Introduction to Big Data and Predictive Analytics

What is Big Data?

a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Wikipedia

First Big Data Problem

Information Overload

  • The 1880 US Census took 8 years to tabulate
  • Estimated that the 1890 Census would take 14 years!

Hollerith Tabulating Machine

  • Used punch cards to record information
  • Used electromechanical relays to increment mechanical counters
  • Could count, sort, and ring a bell when a card had been read
  • It "tamed" the big data
  • The census was completed in one year
  • ...finishing months ahead of schedule and far under budget.

Big Data Timeline

For more history on Big Data check out:

http://www.winshuttle.com/big-data-timeline/

About Big Data

Provide opportunities to find insights into new and emerging types of data.

IBM

Holds the promise of giving enterprises deeper insights into your customers, partners, and business.

Oracle

Characteristics

Four V's of Big Data

  1. Volume => The amount of data determines its value and potential.
  2. Velocity => The speed at which data is generated and processed.
  3. Variety => Different types of data are widely available.
  4. Value => How much is the data worth?
  • Variability / Veracity: How good is the data? It carries various levels of uncertainty and reliability.
  • Complexity: Data comes from multiple sources and needs to be linked, connected, and correlated to make sense.

Challenges with Big Data

Technical Challenges:

  • Capturing, curating, and analyzing data
  • Storing, sharing, and transferring data
  • Searching and visualizing data

Personal Challenges

  • What is the right approach?
  • How is what you are doing being perceived?
  • Data privacy and regulations
  • Be careful about who you share your data with
  • Understanding the challenges
    Challenges are always opportunities!

Companies are ignoring large chunks of incoming data:

  • Technology is primitive and resources are scarce
  • Limited capacity for scanning and interpreting all the incoming data
  • As a result, data is simply not getting processed, leading to lost opportunities

Leveraging Big Data

Sectors benefiting from Big Data technologies:

  • Government: predict and plan for civil unrest
  • Health Care: collect data on patients, predict illness, and help caregivers make better decisions
  • Farming: accurately forecast bad weather and crop failures
  • Science & Research: drive innovation
  • Enterprises and Businesses: make better business and marketing decisions

Big Data Technologies

Techniques:

  • A/B Testing
  • Crowdsourcing
  • Data fusion and integration
  • Genetic Algorithms
  • Machine Learning
  • Natural Language Processing
  • Signal Processing
  • Simulation
  • Time series analysis
  • Visualisation
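As a rough illustration of the first technique in the list, A/B testing, here is a minimal sketch of a two-proportion z-test in Python. The function name `ab_test_z` and the visitor and conversion counts are made up for this example; a real experiment would also need a planned sample size and a significance level chosen up front.

```python
import math


def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B (hypothetical helper).

    Returns the z statistic and a two-sided p-value under the
    normal approximation with a pooled conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate assumes both variants convert equally (null hypothesis).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Invented numbers: 100 of 1000 visitors convert on A, 150 of 1000 on B.
    z, p_value = ab_test_z(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be chance alone.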

Solutions:

  • MapReduce
  • Column-oriented databases
  • Schema-less databases (NoSQL databases)
  • Hadoop
  • Hive
  • Pig
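To give a feel for the MapReduce idea named in the list above, here is a small single-machine sketch in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions for this example. A real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]  # invented sample input
    print(word_count(docs))
```

Each document can be mapped independently, and each key can be reduced independently, which is why the model spreads well across many machines.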

Hadoop

Leading Platform for Big Data Analytics

http://hadoop.apache.org/

Open Source software for Reliable, Scalable, Distributed Computing.

Consists of:

  • Hadoop Common
  • Hadoop Distributed File System (HDFS)
  • Hadoop YARN
  • Hadoop MapReduce

Cloud Based Infrastructures

Cost effective, scalable cloud based technologies:

Amazon Web Services - Big Data

http://aws.amazon.com/big-data/

Rackspace - Big Data powered by Apache Hadoop

http://www.rackspace.com/big-data/

Google Cloud Platform - BigQuery

https://cloud.google.com/bigquery/

What is Predictive Analytics?

The practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends.

Webopedia

About Predictive Analytics

Improving Business Intelligence

Driving Deeper into the Individuals

Technology that learns from experience to predict the future behaviour of individuals in order to drive better decisions.

Imagine understanding the:

  • Potential needs
  • Habits
  • Purchases
  • Responses

...of each individual customer.

It can allow you to deliver a personalized experience based on that knowledge.

Working with Predictive Analytics

Implementing Predictive Analytics

Predictive Analytics - Models & Use Cases

Models can predict customer attributes and behaviour.

Using probability, models can anticipate outcomes.

Very Powerful and very Profitable.

Examples of Uses:

  • Direct Marketing
  • Customer Retention
  • Fraud Detection
  • Risk Management
  • Clinical Decision Support Systems
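Taking customer retention as an example, the simplest possible "model" just learns from past records how often each kind of customer left, then uses that rate as the probability for a new customer. The sketch below assumes a made-up history of (plan, churned) records; the function names `fit_churn_rates` and `predict_churn` are hypothetical, and production models would use many more attributes and a proper learning algorithm.

```python
from collections import defaultdict


def fit_churn_rates(history):
    """Learn P(churned | segment) from past (segment, churned) records."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, did_churn in history:
        totals[segment] += 1
        if did_churn:
            churned[segment] += 1
    return {segment: churned[segment] / totals[segment] for segment in totals}


def predict_churn(rates, segment, default=0.5):
    """Return the learned churn probability, or a default for unseen segments."""
    return rates.get(segment, default)


if __name__ == "__main__":
    # Invented historical data: subscription plan and whether the customer left.
    history = [
        ("monthly", True), ("monthly", True),
        ("monthly", False), ("monthly", False),
        ("annual", False), ("annual", False),
        ("annual", False), ("annual", True),
    ]
    rates = fit_churn_rates(history)
    for plan in ("monthly", "annual"):
        print(plan, predict_churn(rates, plan))
```

Customers with a high predicted probability could then be targeted with a retention offer before they leave.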

Building Adaptive Apps

Optimizing service to the user based on what you know about the customer/user.

Design Principles

  • Learn who the customer really is
  • Detect the customer's intent in the moment
  • Morph functionality and match content
  • Optimize for device and delivery

Age of the customer

To treat customers like Royalty = To treat them like individuals

Treat them like Royalty to earn their Loyalty

Why Use Predictive Analytics?

It's better than guessing

Don't predict the future.

Influence it!

Thank You
