The Evolution of Natural Language Processing

Mar 30, 2022

In our last article, we laid out a definition for Natural Language Processing (NLP) and touched on ways many of us have benefited, either individually or as brands, from this exciting technology. For this article, we’ll look at how NLP evolved to be integrated into myriad areas of business.

Natural Language Processing has its roots in the early 20th century with the Swiss linguist Ferdinand de Saussure, whose ideas laid the foundation for Structuralism. Boiled down to its simplest form, Structuralism holds that any language (English, Klingon, JavaScript) derives meaning from its interwoven rules. In essence, there is a structure to the language that allows for understanding based on shared social norms. Think about it like this: in English, a common sentence might be "The boy threw the ball" (subject, verb, object) as opposed to "Threw the ball did the boy." In theory, both can be reduced to the same meaning, but the second is so foreign to English-speaking ears that it sounds almost alien.

Language is grounded in structures (grammatical structures like subject, verb, and object, among others) that allow for common or shared understanding.

In the 1950s, researchers such as the mathematician Alan Turing and, later, the linguist Noam Chomsky began to think about how this Structuralist approach to language could be applied to computers, so that computers could learn the structures of language in much the same way humans do, as if they were elementary school students. NLP has been evolving ever since.

The first phase of NLP was Symbolic NLP, which lasted from roughly the 1950s to the 1990s. In its infancy, NLP was applied primarily in academia, with linguists feeding pre-defined rules to computers to see how they would process language. These early machine translation systems were purely symbolic: hand-written rules converted text in one language into another (think of them as proto-versions of Babelfish or the more sophisticated Google Translate).
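To make the idea concrete, here is a deliberately tiny sketch of how a purely rule-based translator works; the English-to-Spanish lexicon and the word-for-word rule are invented for illustration and bear no resemblance to a production system:

```python
# A minimal sketch of symbolic, rule-based translation (illustrative only).
# Every rule and dictionary entry is written by hand ahead of time.

# Hand-written lexicon: English -> Spanish (entries assumed for this example)
LEXICON = {
    "the": "el",
    "boy": "niño",
    "threw": "lanzó",
    "ball": "pelota",
}

def translate(sentence: str) -> str:
    """Translate word by word using only the pre-defined rules.

    Anything the rules do not cover is passed through unchanged, which is
    exactly the brittleness that limited early symbolic systems.
    """
    words = sentence.lower().strip(".").split()
    return " ".join(LEXICON.get(word, word) for word in words)

print(translate("The boy threw the ball."))
# -> "el niño lanzó el pelota" (note the gender error: a human must add a
#    new rule for every exception, e.g. "la pelota")
```

The payoff and the limitation are the same thing: the system only ever knows what a linguist has explicitly told it.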

As computing power advanced through the 90s, a second phase developed, referred to as Statistical NLP and dated from roughly 1990 to 2010. In general, the 90s were a big decade for computing; Windows 95 and JavaScript were developed, DVDs were introduced, and Yahoo, eBay and Amazon all launched. NLP was no different. Exponential increases in computational power allowed new approaches, including machine learning, to take form. Computers became fast enough to derive rules from linguistic statistics without a linguist writing those rules by hand. The discipline transitioned from the realm of academic linguistics to engineering, and NLP began to be adopted in everyday life, where it was employed to automate mundane, simple tasks.
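For contrast, here is an equally small sketch of the statistical idea: the program is given a handful of example sentences (an invented toy corpus) and derives its own next-word predictions by counting, with no linguist writing rules:

```python
# A minimal sketch of statistical NLP: the "rules" (here, next-word
# predictions for an autocomplete-style task) are estimated from corpus
# counts rather than written by hand. The corpus is invented for illustration.
from collections import Counter, defaultdict

corpus = [
    "the boy threw the ball",
    "the boy caught the ball",
    "the girl threw the ball",
]

# Count how often each word follows each other word (a bigram model).
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        bigrams[current][following] += 1

def predict_next(word: str) -> str:
    """Return the most frequent next word observed in the corpus."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))    # -> "ball" (follows "the" most often here)
print(predict_next("threw"))  # -> "the"
```

Swap the three toy sentences for millions of real ones and the same counting idea underpins autocomplete-style suggestions: frequencies learned from data, not hand-written grammar rules, do the work.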

As these models became more sophisticated and able to draw from larger data sets, adoption grew as a means of making language-based tasks more efficient. Common early applications included spell and grammar check; others included autocomplete in search engines and Babelfish.

The current phase, from roughly 2010 through the present day, is dominated by Neural Net NLP models. Artificial Neural Networks (ANNs) are computing systems inspired by the biological neural networks in animal brains. Here, machine learning techniques are applied to NLP at scale, made possible by the sheer volume of existing data available for training neural network models and by more powerful computer systems to do so. Applications include targeted advertising, sophisticated language translation, chatbots, voice assistants like Alexa and Siri, consumer insights and even e-mail spam filtering.
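Finally, to give a feel for what the neural phase looks like in practice, here is a brief sketch using the open-source Hugging Face Transformers library as one assumed example (it must be installed separately, it downloads a default pretrained sentiment model on first use, and the exact model and scores will vary by version):

```python
# A minimal sketch of neural NLP applied to a consumer-insights style task:
# a pretrained neural network scores the sentiment of short customer comments.
# Assumes `pip install transformers` plus a deep learning backend; the default
# model is chosen by the library, not by this article.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

reviews = [
    "The checkout process was quick and painless.",
    "My package arrived two weeks late and damaged.",
]

for review in reviews:
    result = classifier(review)[0]  # e.g. {"label": "POSITIVE", "score": 0.99}
    print(f"{result['label']} ({result['score']:.2f}): {review}")
```

No one wrote a rule saying that "damaged" is bad news; the network learned that association from the data it was trained on.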

As a tool, NLP can be leveraged across a wide array of applications, depending on the industry and its needs. In our next post, we'll talk specifically about how Natural Language Processing can be applied to consumer insights in user-generated content.
