Big Data Hadoop Case Study - How to do Analysis of Retail Wifi Log File with Hadoop

We have been providing Big Data Hadoop training in Delhi for a long time to aspirants seeking a career in this domain, and we publish relevant case studies from time to time.

We have been researching various fields to help you understand how big data analysis takes place in Hadoop, and here our experts walk through one such case study. Case studies can be done in many fields; we chose retail because it gives you a broader idea of the concepts related to tracking users in the online sphere.

Taking a wider perspective, many kinds of sensors produce data. With a real store in mind, we identified the following sensors: customer frequency counters at the doors, free WiFi access points, the cashier system, background music, temperature, video capture, smells and more.

Many of these sensors need additional hardware and software, although solutions already exist for a few of them, for example face recognition or video capture. Our experts found that WiFi access points provide the most interesting sensor data without needing any additional hardware or software. On top of that, many visitors carry WiFi-enabled smartphones. With the access point log files, we can easily answer questions about:

•    Unique visits

•    Total number of visits

•    Average visit duration

•    New and returning visitors
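The four metrics above can be sketched in a few lines of Python. This is a minimal illustration, assuming each visit has already been reduced to a record with a device identifier (for example a smartphone's MAC address), a start time and an end time. The `Visit` type, the `summarize` function and the sample records are hypothetical and not part of the original setup, and a "returning" visitor is assumed here to be a device with more than one visit in the data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Visit:
    device: str        # assumed identifier, e.g. a smartphone's MAC address
    start: datetime    # when the device connected to the access point
    end: datetime      # when the device disconnected


def summarize(visits):
    """Compute the four visit metrics from a list of Visit records."""
    per_device = Counter(v.device for v in visits)
    total_seconds = sum((v.end - v.start).total_seconds() for v in visits)
    # Assumption: a device seen in more than one visit counts as returning.
    returning = sum(1 for n in per_device.values() if n > 1)
    return {
        "unique_visits": len(per_device),   # distinct devices
        "total_visits": len(visits),
        "avg_visit_seconds": total_seconds / len(visits) if visits else 0.0,
        "new_visitors": len(per_device) - returning,
        "returning_visitors": returning,
    }


# Made-up sample records, for illustration only.
sample = [
    Visit("aa:aa:aa:aa:aa:01", datetime(2000, 1, 1, 10, 0), datetime(2000, 1, 1, 10, 20)),
    Visit("aa:aa:aa:aa:aa:02", datetime(2000, 1, 1, 11, 0), datetime(2000, 1, 1, 11, 10)),
    Visit("aa:aa:aa:aa:aa:01", datetime(2000, 1, 2, 9, 0), datetime(2000, 1, 2, 9, 30)),
]
print(summarize(sample))
```

Other definitions of "returning" are possible, for example a device first seen on an earlier day; the choice should match whatever the business question actually means.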

Possible answers that we got were-

Before designing a blueprint, we first asked ourselves these questions:

•    Who would be asked to answer such questions?

•    Who is this person?

•    What tools will this person use?

•    What skill set does this person have?

•    How do they work?

The person who would answer all these questions is an analyst. Analysts look after the data warehouse; they typically have business intelligence (BI), analysis and reporting tools with access to the data warehouse, and they typically answer such questions using SQL.
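As an illustration of the kind of SQL an analyst might write, the sketch below runs one aggregate query against an in-memory SQLite database from Python. SQLite only stands in for the warehouse here; the `visits` table, its columns and its rows are assumptions made for this example, and the same `SELECT` statement would need checking against the real schema.

```python
import sqlite3

# In-memory database standing in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (store TEXT, device TEXT, start_ts INTEGER, end_ts INTEGER)"
)
# Made-up rows: timestamps are plain seconds, for illustration only.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [
        ("store_a", "dev1", 0, 600),
        ("store_a", "dev2", 100, 400),
        ("store_a", "dev1", 5000, 5900),
        ("store_b", "dev3", 200, 500),
    ],
)

# Per-store unique visits, total visits and average visit duration.
per_store = conn.execute(
    """
    SELECT store,
           COUNT(DISTINCT device) AS unique_visits,
           COUNT(*)               AS total_visits,
           AVG(end_ts - start_ts) AS avg_visit_seconds
    FROM visits
    GROUP BY store
    ORDER BY store
    """
).fetchall()

for row in per_store:
    print(row)
```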

Further, we knew that solving this with a Big Data approach would involve a new role: the data scientist.

How to answer these questions as a data scientist?

At a high level of abstraction, the answer is pretty simple: a data management system is required that can ingest, store and process the data.

With a Hadoop architecture, a data scientist should be able to answer these questions without needing a programming environment.

Setup

We used the following ingredients:

•    Two WiFi access points running OpenWRT, a Linux-based firmware for routers, to simulate two different stores

•    A virtual machine serving as the central syslog daemon, collecting all log messages from the WiFi routers

•    Flume to transfer all log messages to HDFS, with no manual intervention (no transformation, no filtering)

•    A four-node CDH4 cluster running on virtual machines (CentOS, 100 GB HDD, 2 GB RAM), installed and monitored with Cloudera Manager

•    Pentaho Data Integration's graphical designer for data parsing, transformation, filtering and loading into the warehouse (Hive)

•    Hive as the data warehouse system on top of Hadoop to project structure onto the data

•    Impala for querying data from Hive in real time

•    Microsoft Excel to visualize the results
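To make the parsing step more concrete, here is a minimal Python sketch that extracts association events from syslog lines. The line format is an assumption modelled on typical hostapd messages from an OpenWRT router (this article does not show its log files), and the `parse_line` helper and the sample lines are hypothetical. In the setup listed above, this work was done in Pentaho Data Integration rather than in Python.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape (not taken from the article), e.g.:
#   Jan  5 10:00:01 store1 daemon.info hostapd: wlan0: STA aa:bb:cc:dd:ee:01 IEEE 802.11: associated (aid 1)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) .*hostapd: \S+ STA "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) "
    r"IEEE 802\.11: (?P<event>associated|disassociated)\b",
    re.IGNORECASE,
)


class WifiEvent(NamedTuple):
    timestamp: str   # raw syslog timestamp; this format carries no year
    router: str      # hostname of the access point, used here as the store id
    mac: str         # client MAC address, lower-cased
    event: str       # "associated" or "disassociated"


def parse_line(line: str) -> Optional[WifiEvent]:
    """Return a WifiEvent for matching lines, or None for everything else."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return WifiEvent(m["ts"], m["host"], m["mac"].lower(), m["event"].lower())


# Made-up sample lines, for illustration only.
sample_lines = [
    "Jan  5 10:00:01 store1 daemon.info hostapd: wlan0: STA aa:bb:cc:dd:ee:01 IEEE 802.11: associated (aid 1)",
    "Jan  5 10:00:02 store1 kern.info kernel: unrelated message",
    "Jan  5 10:25:40 store1 daemon.info hostapd: wlan0: STA aa:bb:cc:dd:ee:01 IEEE 802.11: disassociated",
]
events = [e for e in map(parse_line, sample_lines) if e is not None]
for e in events:
    print(e)
```

Pairing each "associated" event with the next "disassociated" event for the same MAC address would then give one visit with a start and an end time.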

We fired up the two WiFi routers and collected data for a period of four days.

Note: since Impala is still in beta, it supports only SELECT statements, so creating new tables in the Hive warehouse from query results via CREATE statements isn't possible. Given this limitation, we decided to copy and paste the query results into MS Excel for analysis and visualization. Once Impala can run CREATE TABLE queries, a data scientist will be able to access all the data from their analysis, BI and reporting tools.
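One lightweight alternative to copying and pasting is to save query results as a CSV file that Excel can open. The sketch below uses only Python's standard `csv` module; the `rows` list stands in for whatever a query returned, and the column names, values and file name are hypothetical.

```python
import csv

# Placeholder rows standing in for a query result (made-up values).
header = ["store", "unique_visits", "total_visits"]
rows = [
    ("store_a", 2, 3),
    ("store_b", 1, 1),
]


def write_results(path, header, rows):
    """Write query results to a CSV file that a spreadsheet can open."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


write_results("wifi_visits.csv", header, rows)
```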

You can get a broader idea of Hadoop's functionality by enrolling in a reputed Big Data Hadoop institute in Delhi, and you can go through various case studies to understand big data management in Hadoop.

Hadoop is a much-talked-about solution right now, and sound knowledge of it means greater opportunities knocking at your door for the career you want. Enroll, analyse case studies and hone your skills!

 

 

 

 
