Big Data
What is Big Data?
• https://www.youtube.com/watch?v=c4BwefH5Ve8
• Big Data Analytics: 11 Case Histories and Success Stories
• https://www.youtube.com/watch?annotation_id=annotation_3535169775&feature=iv&src_vid=c4BwefH5Ve8&v=t4wtzIuoY0w
Big Data
• Data Size:
– Gigabyte
– Terabyte: Terabyte USB
– Petabyte: Wal-Mart handles more than 1 million
customer transactions every hour, feeding databases
estimated at more than 2.5 petabytes
– Exabyte: the traffic flowing over the internet
amounts to about 700 exabytes annually
– Zettabyte
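To make the scale of these units concrete, here is a minimal Python sketch that converts between them. It assumes decimal (SI) units, where each step is a factor of 1,000; the names UNITS and convert are illustrative only and do not come from the slides.

```python
# Decimal (SI) byte units; each one is 1,000 times the previous one.
# Assumption: the slides mean SI units, not binary units (GiB, TiB, ...).
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity from one unit in UNITS to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    # The 2.5 PB figure from the slide, expressed in terabytes.
    print(convert(2.5, "PB", "TB"))
    # The 700 EB annual traffic figure, expressed in zettabytes.
    print(convert(700, "EB", "ZB"))
```

Seen this way, the 2.5 petabytes on the slide is 2,500 terabytes, and 700 exabytes is a little under one zettabyte.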
Big Data: Some Facts
• World’s information is doubling every two years
• World generated 1.8 ZB of information in 2011
• Cisco predicts that by 2016 global IP traffic will reach 1.3
zettabytes
• There will be 19 billion networked devices by 2016
• 70% of this data is being generated by individuals as opposed
to enterprises & organizations
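The "doubling every two years" claim can be turned into a rough projection. The sketch below simply assumes the trend continues unchanged from the 1.8 ZB figure for 2011; the function name and its parameters are hypothetical, and the output is an extrapolation, not a measured value.

```python
def projected_volume_zb(year, base_year=2011, base_zb=1.8, doubling_years=2):
    """Extrapolate world data volume in zettabytes.

    Assumes the volume doubles every `doubling_years` years,
    starting from `base_zb` in `base_year`.
    """
    periods = (year - base_year) / doubling_years
    return base_zb * 2 ** periods


if __name__ == "__main__":
    for year in (2011, 2013, 2015):
        print(year, round(projected_volume_zb(year), 2), "ZB")
```

Under this assumption, two doublings (four years) multiply the starting volume by four.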
Big Data Sources
• Web sites
• Social media
• Machine generated
• RFID
• Image, video, and audio
• Etc.
Big Data Challenges
• Big Data are high-volume, high-velocity,
and/or high-variety information assets that
require new forms of processing to enable
enhanced decision making, insight discovery
and process optimization.
• "3Vs":
– Volume: Size >= 30-50 TB
– Velocity: Processing speed
– Variety:
• Structured: able to fit in a database table
• Unstructured: data that does not fit in a database table
Do Companies care about Data?
• Not really. What they care about are Key
Performance Indicators (KPIs)
• Some examples of KPIs are:
– Revenue
– Profit
– Revenue per customer/employee
– Customer Attrition: the loss of clients or customers
• Big Data is only useful if it helps drive KPIs
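The KPIs listed above are simple ratios once the underlying numbers are known. The sketch below uses common textbook definitions (profit as revenue minus costs, attrition as customers lost divided by customers at the start of the period); these definitions, the function names, and the figures are assumptions for illustration, not taken from the slides.

```python
def profit(revenue, costs):
    """Profit as revenue minus costs."""
    return revenue - costs


def revenue_per_customer(revenue, customers):
    """Average revenue generated by each customer."""
    return revenue / customers


def attrition_rate(customers_at_start, customers_lost):
    """Share of customers lost during the period (0.0 to 1.0)."""
    return customers_lost / customers_at_start


if __name__ == "__main__":
    # Hypothetical figures for one quarter.
    revenue, costs = 500_000.0, 350_000.0
    customers_at_start, customers_lost = 2_000, 100

    print("Profit:", profit(revenue, costs))
    print("Revenue per customer:", revenue_per_customer(revenue, customers_at_start))
    print("Attrition rate:", attrition_rate(customers_at_start, customers_lost))
```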
Big Data to KPIs
Applications
• Text mining: deriving high-quality information
from text.
– text categorization, text clustering, concept/entity
extraction, sentiment analysis, etc.
• Web mining:
– Web usage mining
– Web content mining
• Social media mining
– Salesforce Radian6 Social Marketing Cloud
• http://www.youtube.com/watch?v=EH1dcFh_-I4
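As a small illustration of the sentiment analysis mentioned under text mining, the sketch below scores a text by counting words from two tiny hand-made word lists. The word lists and function names are made up for this example; real tools use far larger lexicons or trained models.

```python
import re

# Toy lexicons, invented for this example only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment_score(text):
    """Return (# positive words) - (# negative words) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for w in words if w in POSITIVE)
    negatives = sum(1 for w in words if w in NEGATIVE)
    return positives - negatives


def sentiment_label(text):
    """Map the score to 'positive', 'negative', or 'neutral'."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ("I love this phone, great battery",
                 "Terrible support and slow delivery",
                 "It arrived on Tuesday"):
        print(sentiment_label(post), "<-", post)
```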
Hadoop HDFS: Hadoop Distributed File System
• "Imagine you had a file that was larger than your PC's
capacity. You could not store that file, right? Hadoop lets you
store files bigger than what can be stored on one particular
node or server. So you can store very, very large files. It also
lets you store many, many files."
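The idea of storing a file that is larger than any single node can be sketched as splitting the file into fixed-size blocks and spreading copies of each block over several nodes. The Python below is a toy model only: the node names, the tiny block size, and the round-robin placement are assumptions for illustration, and real HDFS uses much larger blocks and a more elaborate placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Returns a dict: block index -> list of node names holding a copy.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


if __name__ == "__main__":
    file_bytes = bytes(range(10))            # stand-in for a very large file
    blocks = split_into_blocks(file_bytes, block_size=4)
    nodes = ["node1", "node2", "node3"]      # hypothetical cluster
    for block, holders in place_blocks(len(blocks), nodes, replication=2).items():
        print("block", block, "->", holders)
```

No single node holds the whole file, yet the file can be reassembled by reading the blocks in order, and each block survives the loss of one node.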
Hadoop: MapReduce
• “rather than take the conventional step of moving data over a
network to be processed by software, MapReduce uses a
smarter approach tailor made for big data sets.”
• “…rather than move the data to the software,
MapReduce moves the processing software to the data.”
(InfoWeek)
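The map and reduce steps themselves can be shown with the classic word-count example. The sketch below runs in a single Python process, so it illustrates only the programming model (map, group by key, reduce), not the distribution of work to the nodes that hold the data; all function names are illustrative and not part of the Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = []
    for document in documents:
        # In a real cluster, each map call would run where its document is stored.
        pairs.extend(map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big ideas"]))
```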
NoSQL Database
• NoSQL ("Not Only SQL") is a broad class of database management systems
identified by non-adherence to the widely used relational
database management system model.
• They are useful for working with huge quantities of data
whose nature does not require a relational model.
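To show what "does not require a relational model" can look like, here is a toy document store in Python. Each record is a free-form dictionary stored under a key, so two records need not share the same fields. The DocumentStore class is invented for this illustration and does not mirror the API of any particular NoSQL product.

```python
class DocumentStore:
    """A toy key -> document store with no fixed schema."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        """Store a copy of the document under the given key."""
        self._docs[key] = dict(document)

    def get(self, key):
        """Return the document for a key, or None if it is missing."""
        return self._docs.get(key)

    def find(self, field, value):
        """Return the keys of all documents where field == value."""
        return [key for key, doc in self._docs.items() if doc.get(field) == value]


if __name__ == "__main__":
    store = DocumentStore()
    # The two documents have different fields; no table schema is required.
    store.put("u1", {"name": "Ana", "city": "Lisbon"})
    store.put("u2", {"name": "Ben", "city": "Lisbon", "tags": ["rfid", "video"]})
    print(store.find("city", "Lisbon"))
    print(store.get("u2"))
```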
In-Memory Database
• An in-memory database is a database
management system that primarily relies on
main memory for computer data storage. It is
contrasted with database management systems
that employ a disk storage mechanism.
• Main memory databases are faster than disk-optimized databases.
• Good for Big Data analytics.
• Some use non-volatile memory modules that retain
data even when electrical power is removed.
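A quick way to try the in-memory idea is Python's built-in sqlite3 module, which keeps the whole database in RAM when connected to ":memory:". This is only a small-scale illustration, not a Big Data system; the table and the figures are hypothetical, and the data disappears when the connection is closed.

```python
import sqlite3


def build_sales_db():
    """Create an in-memory SQLite database holding a small sales table."""
    conn = sqlite3.connect(":memory:")   # nothing is written to disk
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 50.0)],
    )
    return conn


def total_by_region(conn):
    """Return a dict mapping each region to its total sales amount."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    connection = build_sales_db()
    print(total_by_region(connection))
    connection.close()
```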
SAP HANA
• SAP HANA (High-Speed Analytical Appliance) uses sophisticated
data compression to store data in random access memory.
HANA's performance is reported to be 10,000 times faster than
that of standard disks, which allows companies to analyze data
in a matter of seconds instead of long hours. (Techopedia)
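The slide does not say which compression scheme is meant. As one illustration of how compression helps fit data into memory, the sketch below applies dictionary encoding, a technique commonly used for columns with many repeated values. It is an assumed example, not a description of HANA's actual implementation.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code.

    Returns (dictionary, codes), where dictionary[code] gives back the value.
    """
    dictionary = []   # code -> value
    index = {}        # value -> code
    codes = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    # Hypothetical column with many repeated values.
    countries = ["Germany", "France", "Germany", "Germany", "France"]
    dictionary, codes = dictionary_encode(countries)
    print(dictionary, codes)
    print(dictionary_decode(dictionary, codes) == countries)
```

Each distinct string is stored once, and the column itself shrinks to a list of small integers.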