Author – Palash Jain
“Things like chatbots, machine learning tools, natural language processing, or sentiment analysis are applications of artificial intelligence that may one day profoundly change how we think about and transact in travel and local experiences.” Gillian Tans
Think about how easy it is to communicate with anyone. We can express our thoughts in a plethora of ways, the most common of which is words, and it is words that make up what we call languages. The word “language” can be interpreted in a variety of ways, but at its core it can be defined as:
“The method of human communication, either spoken, signed or written, consisting of the use of words or symbols in a structured and conventional way.”
I think the key phrase in that definition is “in a structured and conventional way”. We as humans understand this conventional structure of language automatically, without even having to think about it. Most of the time we even think in this structured, conventional manner. Computers, on the other hand, do not understand any of this. To them, “there” and “their” are just sequences of characters with no meaning attached. While we have long since learnt to communicate with a computer in its own base language, binary, and in a variety of high-level languages like Java and Python, communicating with a computer as if it were a human has only become practical in recent years. Artificial intelligence and advances in technology have made it possible for us to simply write or speak a sentence and have the computer work out what we said, or what we want, almost instantly. The field of artificial intelligence and computer science that came into existence to plug this communication gap between machines and humans is what we call “Natural Language Processing” (NLP).
The advent of NLP has not only revolutionized the way we communicate with computers, it has also found uses in other areas of computer science, such as data analytics and machine learning. NLP makes it possible to mine insights from hordes of unstructured textual data, or even to use that data for predictive modelling.
In this series of blog posts we will cover the basic steps involved in a text analytics pipeline with a prime focus on the following:
- Loading, processing and understanding textual data
  - Transforming the unstructured text data into forms suitable for data mining
- Text parsing and Exploratory data analysis
  - Word clouds, exploratory figures, Bag of words
- Text representation and Feature Engineering
  - Bag of words, TF-IDF, Dimensionality reduction
- Modelling/Pattern Mining
  - Sentiment analysis
  - Named Entity Recognition
  - Topic Modelling
  - Predictive modelling
- Evaluation/Deployment
- BONUS: Web scraping – The most common way to acquire textual data by harnessing the information stored on web pages.
- BONUS: A review of some interesting ways NLP has been used in various domains.
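To give a flavour of what “text representation” means in the list above, here is a minimal plain-Python sketch of bag of words and TF-IDF. It is only an illustration under stated assumptions: the function names (`tokenize`, `bag_of_words`, `tf_idf`) and the three sample sentences are made up for this example, and the weighting is the simple unsmoothed `count * log(N / document frequency)` form. Libraries commonly used for this, such as scikit-learn, apply smoothed and normalised variants, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs):
    """Return one Counter per document mapping each token to its count."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(bags):
    """Weight each token count by log(N / number of documents containing it)."""
    n_docs = len(bags)
    # Iterating over a Counter yields each distinct token once,
    # so this counts documents, not total occurrences.
    doc_freq = Counter(token for bag in bags for token in bag)
    return [
        {
            token: count * math.log(n_docs / doc_freq[token])
            for token, count in bag.items()
        }
        for bag in bags
    ]


# Hypothetical sample documents, invented for illustration only.
docs = [
    "The food was great and the service was great",
    "The food was cold",
    "Great location, terrible service",
]

bags = bag_of_words(docs)
weights = tf_idf(bags)

print(bags[0])     # raw token counts for the first sentence
print(weights[1])  # rarer words such as "cold" get the larger weights
```

The bag of words throws away word order and keeps only counts; TF-IDF then turns down the weight of words that appear in many documents, so that words specific to one document stand out.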
So there we have it. If you thought that a machine crunching numbers was interesting, wait till you see a machine crunching text. NLP is still a very challenging problem in the machine learning space, and improvements over current methods are being made all the time. Having said that, we believe that NLP-based analytics should be a part of any business that collects, processes or stores textual data, and we hope to showcase its power through this series of blog posts.
Part II Coming Soon