Understanding Sentiment Analysis of Kabali & other Movies via Hadoop

 

So the movie “Kabali” has been released, and it is making headlines everywhere. Some companies have already declared a holiday to mark Thalaiva’s popularity, and film pundits have started predicting that the movie will break all records.

 

Have you ever thought about how experts predict a movie’s success, or how a movie creates such massive buzz? Well, let me make some revelations. Social media sites such as Twitter have made youngsters hashtag addicts, and that has created new avenues for businesses of all types. A huge amount of data, called big data, is created every day; it is so massive that it cannot be mined using conventional technology.

 

This is where Apache Hadoop comes in handy. It helps predict trends, look into consumer opinions and make real-time evaluations based on that unstructured data. Sentiment analysis is part of this trend; it helps people understand which movie or TV show is going to be a big hit, or which one is trending as the most popular.

 

Below are the steps that make up this technique:

 

Data collection

Data collection is the initial and most important phase. Apache Flume is well suited to it, as it helps you not just with collection but with the aggregation and movement of large amounts of streaming event data. It enables applications to gather data from a source and transfer it to HDFS for analysis. For tweets, Twitter offers a free streaming API that can be used to retrieve content and send it to HDFS.

 

If you want to know how it works, here is the theory:

 

First of all, a stream is started from the Twitter client, which sends a single unit of data, an “event”, to a source running within a Flume agent. The source that receives this event then offers it to one or more channels, the conduits between the source and the sink. One or more sinks operating within the same agent drain these channels.
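The source–channel–sink wiring described above is usually expressed in an agent configuration file. The sketch below uses Flume’s built-in Twitter source; the agent name, keyword list, HDFS path and credentials are all placeholder assumptions:

```properties
# Hypothetical agent "TwitterAgent"; credentials and paths are placeholders.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumer-key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer-secret>
TwitterAgent.sources.Twitter.accessToken = <access-token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access-token-secret>
TwitterAgent.sources.Twitter.keywords = Kabali, Thalaiva
TwitterAgent.sources.Twitter.channels = MemChannel

# The channel buffers events between the source and the sink.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000

# The sink drains the channel into HDFS for later analysis.
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
```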

 

Data labeling

This is the most business-specific phase. The business-relevant words are identified to build a data dictionary, and each word or expression is attributed a positive, neutral or negative polarity. Hadoop supports customizable catalogues as well as dictionary tables to assist you in this work.

 

Apache HCatalog, a table-management layer that exposes the Hive metadata to other Hadoop applications, is particularly useful here, as it offers a relational view of the data. It arranges unstructured tweets for easy management.
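As a toy illustration of such a data dictionary, word polarities can be kept in a simple lookup table. The words and scores below are invented examples, not a real lexicon:

```python
# A toy sentiment dictionary: +1 positive, -1 negative, 0 neutral.
# Words and polarities are invented examples for illustration only.
POLARITY = {
    "blockbuster": 1,
    "superb": 1,
    "hit": 1,
    "boring": -1,
    "flop": -1,
    "average": 0,
}

def label_word(word):
    """Return the polarity of a word, treating unknown words as neutral."""
    return POLARITY.get(word.lower(), 0)
```

In practice this dictionary would live in a Hive table so the whole cluster can join against it.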

Running the analytics

 

Using Hadoop, compute the sentiment of each tweet as the number of positive words against the number of negative words it contains. The data now sits in HDFS and can be arranged into tables in Hive.
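The per-tweet score described above (positive count minus negative count) can be sketched in Python. The word lists are invented toy lexicons; a real run would join against the dictionary tables in Hive:

```python
import re

# Invented toy lexicons for illustration; a real pipeline would use
# the labeled dictionary tables built in the previous phase.
POSITIVE = {"blockbuster", "superb", "hit", "awesome"}
NEGATIVE = {"boring", "flop", "bad"}

def tweet_sentiment(text):
    """Score a tweet as (#positive words) - (#negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos - neg

# tweet_sentiment("Kabali is a superb blockbuster!") -> 2
```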

 

Train, adapt and update

Once you have arrived at this point, you will have your first results and can proceed to fine-tune the model. Analytic tools that just dig for positive or negative words can generate misleading results when important context is missing. The major hindrances are typos, emoticons, intentional misspellings and jargon.

 

Computers also fail to understand irony, sarcasm and humour. If the tweets contain too many of these, accuracy will be compromised. Hence, fine-tuning your model is necessary.
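One common fine-tuning step for the typos, emoticons and misspellings mentioned above is to normalise noisy tokens before scoring. A minimal sketch, where the mapping table is an illustrative assumption:

```python
# Illustrative normalisation table for emoticons and common misspellings;
# a real system maintains a much larger, curated mapping.
NORMALISE = {
    ":)": "good",
    ":(": "bad",
    "gr8": "great",
    "awsome": "awesome",
}

def normalise(text):
    """Replace known emoticons/misspellings with dictionary words."""
    return " ".join(NORMALISE.get(tok, tok) for tok in text.split())
```

Running tweets through such a step before the dictionary lookup recovers sentiment words that would otherwise be missed.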

 

Now obtain insights

When everything is done, just run some interactive queries in Hive to filter the data and visualise it via a BI tool.
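For example, an interactive Hive query of the kind described above might aggregate per-movie sentiment. The table and column names here are hypothetical:

```sql
-- Hypothetical schema: tweets_sentiment(movie STRING, score INT)
SELECT movie,
       AVG(score) AS avg_sentiment,
       COUNT(*)   AS tweet_count
FROM   tweets_sentiment
GROUP  BY movie
ORDER  BY avg_sentiment DESC;
```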

 

This is just one way of collecting and analyzing social data using Hadoop. There are many other ways to analyze Twitter or other social-media sentiment to predict a film’s business with accuracy.

 

Get Weekly Free Articles

on latest technology from Madrid Software Training