Introduction to Big Data and Predictive Analytics
What is Big Data?
a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Wikipedia
First Big Data Problem
Information Overload
- The 1880 US Census took 8 years to tabulate
- Estimated that the 1890 Census would take 14 years!
|
Hollerith Tabulating Machine
- Using punch cards to record information
- Used electromechanical relays to increment mechanical counters
- Could count, sort, and set off a bell when card had been read
- it "tamed" the big data
- Census was completed in one year
- ...finishing months ahead of schedule and far under budget.
|
About Big Data
- It's about generating value from very large data sets
- The amount of data is growing exponentially for many reasons:
- retailers building databases of customer activity
- logistics, financial, government, & health are capturing more data
- social media
Provide opportunities to find insights into new and emerging types of data.IBM
Holds the promise of giving enterprises deeper insights into your customers, partners, and business.Oracle
Characteristics
Four V's of Big Data
- Volume => The amount of data determines value and potential of data.
- Velocity => Speed at which data is generated and processed.
- Variety => Different types of data is widely available.
- Value => How much is it worth?
- Variability / Veracity: How good is the data? Various levels of uncertainty and reliability
- Complexity: Data comes from multiple sources, needs to be linked, connected and correlated to make sense of the data.
Challenges with Big Data
Technical Challenges:
- Capturing, Curating, & Analysis of data
- Storage, & Sharing/Transferring
- Searching, Visualizing
Companies are ignoring large chunks of incoming data:
- Technology is primitive, resources are scarce
- Limited capacity for scanning and interpreting all the data coming in.
- As result, data is simply not getting processed. Leading to losses in opportunity
Leveraging Big Data
- Get a 360 degree view of the customer
- Build new applications or improve effectiveness of existing applications
- Realize new sources of competitive advantage
- Increase customer loyalty
Sectors benefiting from Big Data technologies:
- Government: predict and plan for civil unrest
- Health Care: data collection on patients, predict illness and help caregivers make better decisions
- Farming: accurately forecast bad weather and crop failures
- Science & Research: drive innovation
- Enterprises and Businesses: make better bussiness and marketing decisions
Big Data Technologies
Techniques:
- A/B Testing
- Crowdsourcing
- Data fusion and integration
- Genetic Algorithms
- Machine Learning
- Natural Language Processing
- Signal Processing
- Simulation
- Time series analysis
- Visualisation
Solutions:
- MapReduce
- Column-oriented databases
- Schema-less database (NoSQL databses)
- Hadoop
- Hive
- PIG
Hadoop
Leading Platform for Big Data Analytics
http://hadoop.apache.org/
Open Source software for Reliable, Scalable, Distributed Computing.
- Distributes storage and processing of large data sets across clusters of server computers.
- Detects and compensates for hardware, or other system problems, at the application level.
Consists of:
- Distributed File System - High bandwidth storage
- MapReduce - Distributes or maps data across multiple servers
- Each server summarizes the data it receives
- Aggregated in reduced state, allowing meaning to be derived from raw data
What is Predictive Analytics?
The practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends.
Webopedia
About Predictive Analytics
- Encompasses statistical techniques for:
- Data Modeling
- Machine Learning
- Data Mining
- Analyzing current and historial facts to make predictons about the future
Improving Business Intelligence
Driving Deeper into the Individuals
Technology that learns from experience to predict the future behaviour of individuals in order to drive better decisions.
Imagine understanding the:
- Potential needs
- Habits
- Purchases
- Responses
...of each individual customer.
It can allow you to deliver a personalized experience based on that knowledge.
Working with Predictive Analytics
Implementing Predictive Analytics
Predictive Analytics - Models & Use Cases
Models can predict customer attributes and behaviour.
Using probability, models can anipate outcomes.
Very Powerful and very Profitable.
Examples of Uses:
- Direct Marketing
- Customer Retention
- Fraud Detection
- Risk Management
- Clinical Decision Support Systems
Building Adaptive Apps
Optimizing service to the user based on what you know about the customer/user.
- Adaptive apps anticipates custmomers
- Become their Digital Butler
Design Principles
- Learning who the customer really is
- Detect customer's intent in the moment
- Morph functionality and match content
- Optimize for device and delivery
Age of the customer
To treat like Royality = To treat like individuals
Treat like Royality to get their Loyality
Why Use Predictive Analytics?
It's better than guessing
Don't predict the future.
Influence it!
/