Natural Language Processing for Digital Games Section 2: Deep Learning and NLP

May 31, 2023 3:00 PM

- Deep Learning and Natural Language Processing

These days, the term "AI" in general usage seems to refer mostly to technologies based on "Deep Learning (DL)." DL trains a neural network with many layers (hence "deep") on a vast amount of data so that it automatically finds patterns in that data. It has outperformed conventional methods in many areas, such as audio signal processing, computer vision, and natural language processing.

As discussed in the previous article, Natural Language Processing (NLP) has advanced enormously over the last decade. This progress relies primarily on DL, which the figure below shows as the overlap between NLP and DL.

- The 2010s: The dawn of natural language processing using deep learning

Word2vec [1-3], which appeared ten years ago, is one of the breakthroughs that led to today's NLP. The fact that neural network-based methods could efficiently learn the meanings of words was important for the advancement of NLP with DL (Note 1). Progress then centered on machine translation, where representative works such as Sequence-to-sequence [4] in 2014 and the Transformer [5] in 2017 were published. After that, the emergence of "large-scale pre-trained models" based on the Transformer, such as BERT [6] and GPT-2 [7], marked a significant turning point. These models showed that high performance on various tasks could be achieved by "fine-tuning" a model that had been "pre-trained" on large amounts of text data.
Until then, it was common to train a dedicated model from scratch for each task to be solved, such as translation or summarization (Note 2). This changed drastically: once one large model has been created, it can be reused for various tasks. Expectations grew that "if we build a larger model, it will be more versatile and perform better in a variety of tasks," and various research organizations began to compete in scaling up their models.

At the beginning of this section, I mentioned that DL has achieved significant performance improvements in various fields, but this is not the only reason why DL has attracted so much attention. Conventional machine learning methods require humans to design the "features" that the computer should pay attention to, whereas DL lets the computer learn such features by itself during training. This has dramatically lowered the barrier to entry into AI research and development, which is another attraction of DL and a reason why so many people now work on AI using DL.
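
As a rough illustration of this difference, the Python sketch below contrasts a hand-designed feature extractor with the raw token IDs that a neural network would typically receive. The word lists, the function names, and the toy vocabulary are all assumptions made up for this example; they do not come from any real system.

```python
# Hypothetical word lists picked by a human designer (illustrative only).
POSITIVE_WORDS = {"great", "fun", "love"}
NEGATIVE_WORDS = {"boring", "bad", "hate"}


def handcrafted_features(text):
    """Conventional ML: a human decides what the model is allowed to look at."""
    tokens = text.lower().split()
    return [
        float(sum(t in POSITIVE_WORDS for t in tokens)),  # number of positive words
        float(sum(t in NEGATIVE_WORDS for t in tokens)),  # number of negative words
        float(len(tokens)),                               # length in tokens
    ]


def token_ids(text, vocab):
    """DL-style input: raw tokens as IDs; the network learns its own features."""
    unknown_id = len(vocab)  # one extra ID reserved for out-of-vocabulary words
    return [vocab.get(t, unknown_id) for t in text.lower().split()]


# Toy vocabulary, assumed for this example.
vocab = {"this": 0, "game": 1, "is": 2, "great": 3, "fun": 4}

features = handcrafted_features("This game is great fun")
ids = token_ids("This game is great fun", vocab)
print(features)  # [2.0, 0.0, 5.0]
print(ids)       # [0, 1, 2, 3, 4]
```

In the first function, everything the model can ever use was chosen by a person in advance. In the second, nothing is chosen beyond the tokenization, and it is left to training to discover which patterns matter.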

The 2010s saw significant changes such as the introduction of DL into NLP and the emergence and subsequent development of the Transformer. So much happened that I hesitate to lump it all together under the single label "the 2010s." And this trend has continued into the 2020s, the decade in which we now live.

- The 2020s: Emergence of the star performer

Following the late-2010s trend toward larger Transformer-based pre-trained models, GPT-3 [8] was introduced in 2020. GPT-3, which is sometimes regarded as the origin of today's so-called "Large Language Models (LLMs)," improved dramatically over its predecessor, GPT-2. It was received with great surprise because it could handle various tasks even without fine-tuning. The size of the model (number of parameters) also grew more than 100-fold compared to GPT-2 (Note 3). Still, size alone cannot explain this "emergent" performance improvement, and many researchers are interested in the phenomenon. And as you know, ChatGPT [9] was announced in 2022 and has become a topic of daily discussion.
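
For reference, the "more than 100 times" figure can be checked with simple arithmetic on the parameter counts given in Note 3. The variable names below are my own.

```python
# Parameter counts as stated in Note 3.
gpt2_params = 1.5e9   # GPT-2: 1.5B parameters
gpt3_params = 175e9   # GPT-3: 175B parameters

scale_factor = gpt3_params / gpt2_params
print(f"{scale_factor:.1f}x")  # prints 116.7x
```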

DL and NLP, which evolved rapidly in the 2010s, are changing even more quickly in 2023. Since the advent of the Transformer, the Recurrent Neural Networks (RNNs) that were used before it have been falling out of use. This year, however, RWKV, a large language model built on an RNN, made a big splash. At this point, we can say that the Transformer-based LLM is the "star performer." But when research and technology from just a few months ago already count as "old," there is no guarantee that things will not change dramatically within a few months, or even between the writing of this article and its publication. And the source of that change may be something that people currently overlook as "old" technology.

- Is Deep Learning All You Need?

Let's look again at the figure in this section: technologies using DL are only one part of AI and NLP, and there are AI and NLP approaches that do not use DL. There was a time when rule-based AI, built from many rules prepared by humans, received particular attention.
While DL boasts high performance, it is difficult for humans to understand "why this result is obtained" because the computation performed by a large network becomes a black box (Note 4). When it is vital to know "why the result is the way it is," as in digital games where the scenario's flag management must behave exactly as the game designer intended, it may be preferable to use a rule-based method whose processing is transparent.
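
To make "transparent" a little more concrete, here is a minimal Python sketch of rule-based scenario flag management. The rule names, flag names, and the `apply_rules` helper are hypothetical and were invented for this illustration; an actual game would structure this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str                  # label shown in the trace
    required_flags: frozenset  # flags that must already be set
    sets_flag: str             # flag that is set when the rule fires


# Hypothetical scenario rules written by a game designer.
RULES = [
    Rule("open_gate", frozenset({"has_key"}), "gate_open"),
    Rule("meet_king", frozenset({"gate_open", "talked_to_guard"}), "quest_started"),
]


def apply_rules(flags, rules):
    """Fire rules until nothing changes; return the names of fired rules in order."""
    fired = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires once, and only when all of its required flags are set.
            if rule.sets_flag not in flags and rule.required_flags <= flags:
                flags.add(rule.sets_flag)
                fired.append(rule.name)
                changed = True
    return fired


flags = {"has_key", "talked_to_guard"}
trace = apply_rules(flags, RULES)
print(trace)  # ['open_gate', 'meet_king']
```

The returned trace answers "why is this flag set?" directly: each entry names a rule that a designer wrote and can inspect, which is exactly what a large neural network cannot offer out of the box.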

In this section, we gave an overview of natural language processing using deep learning.
In the next section, I would like to talk about how much impact Word2vec and other "semantic vector" technologies have had and how they are developing.

[Note 1] Word2vec included two different proposals: skip-gram and CBOW. Hence, I wrote "methods" instead of "method".

[Note 2] Strictly speaking, it should probably be written "[almost] from scratch," since some aids were already in use, such as pre-trained word vectors. In fields such as image recognition, fine-tuning that treated models such as VGGNet and ResNet as "pre-trained" models was already common at that time.

[Note 3] The GPT series includes several sizes in the same generation. Here we refer to the GPT-2 with 1.5B parameters and the GPT-3 with 175B parameters. These are generally intended when referring to GPT-2 and GPT-3, respectively.

[Note 4] In response to the issue of the DL computation process being a black box, there is a research field called "Explainable AI," which aims to make DL processing explainable. In NLP, when BERT became a hot topic, "BERTology" was proposed to clarify "what exactly is happening in BERT." In this way, the attempt to understand the internal processing of AI, which is becoming more complex as its performance increases, and to link that understanding to safe utilization is attracting interest as an important research field.

References
[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient Estimation of Word Representations in Vector Space." In Proceedings of Workshop at ICLR, 2013.
[2] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. "Distributed Representations of Words and Phrases and their Compositionality." In Proceedings of NIPS, 2013.
[3] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. "Linguistic Regularities in Continuous Space Word Representations." In Proceedings of NAACL HLT, 2013.
[4] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. "Sequence to Sequence Learning with Neural Networks." In Proceedings of NIPS, 2014.
[5] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention Is All You Need." In Proceedings of NIPS, 2017.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." In Proceedings of NAACL-HLT, 2019.
[7] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. "Language Models are Unsupervised Multitask Learners." 2019.
[8] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. "Language Models are Few-Shot Learners." In Proceedings of NeurIPS, 2020.
[9] OpenAI. "Introducing ChatGPT." Blog post, 2022.