Large Data Sets Examples, Challenges, and Models

Big Data,
Future of Computing,
Parting Thoughts
Slobodan Vucetic
Associate Professor
Department of Computer and Information Sciences
Temple University
Slides and pictures borrowed from:
rio.ecs.umass.edu/~lgao/ece697_11/01Overview.ppt
http://www.nsf.gov/attachments/124212/public/BIG-Data-Webinar-Honavar-Final-May8with508.pdf
Google Images
Name (Symbol)     Value (bytes)
kilobyte (kB)     10^3
megabyte (MB)     10^6
gigabyte (GB)     10^9
terabyte (TB)     10^12
petabyte (PB)     10^15
exabyte (EB)      10^18
zettabyte (ZB)    10^21
yottabyte (YB)    10^24
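As a rough illustration of how these decimal prefixes are applied, the sketch below converts a raw byte count into the largest unit from the table that keeps the value at or above 1. The function name `format_bytes` and the one-decimal formatting are assumptions made for this example, not something taken from the slides.

```python
# SI (decimal) prefixes from the table: each step up is a factor of 1000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(n_bytes):
    """Return n_bytes as a string in the largest unit that keeps the value >= 1."""
    index = 0
    # Step up one unit at a time while the count reaches the next power of 1000.
    while index < len(UNITS) - 1 and n_bytes >= 1000 ** (index + 1):
        index += 1
    return f"{n_bytes / 1000 ** index:.1f} {UNITS[index]}"


print(format_bytes(6.5e12))   # 6.5 TB
print(format_bytes(10 ** 24)) # 1.0 YB
```

Note that these are powers of 1000, as in the table; binary units (powers of 1024) would give slightly different values.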
Big Data
Reasons for the Emergence of Large Data Sets:
Better technology
• Storage & disks
– Cheaper
– More volume
– Physically smaller
– More efficient
Large data sets are affordable
Reasons for the Emergence of Large Data Sets:
Better networking
• High speed Internet
• Cellular phones
• Wireless LAN
More data consumers
More data producers
Reasons for the Emergence of Large Data Sets:
Better IT
• More processes are automatic
– E-commerce and V-commerce
– Online and telephone banking
– Online and telephone customer service
– E-learning
– Chats, news, blogs
– Online journals
– Digital libraries
• More enterprises are computerized
– Companies
– Banks
– Governmental institutions
– Universities
More data is available in digital form
Reasons for the Emergence of Large Data Sets:
Growing needs
• Science
– Astronomy
– Earth and environmental studies
– Meteorology
– Genetics
• Business
– Billing
– Mining customer data
• Intelligence
– Emails
– Web sites
– Phone calls
• Search
– Web pages
– Images
– Audio & Video
More incentive to construct large data sets
Big Data – Opportunities
• Big Data presents unprecedented opportunities to
– Accelerate scientific discovery and innovation
– Lead to new fields of inquiry that would not otherwise be possible
– Improve decision making
– Understand human and social processes
– Promote economic growth
– Improve health and quality of life
Big Data – Science
Remote Sensing
• Air, Land, Ocean
• 100s of GB/day
Astronomy
• Sky surveys
• 120 GB/week, 6.5 TB/year
Genomics
• 25K genes, 3B base pairs
• 8B humans
• Thousands of organisms
Drug Discovery
• 2M compounds
• > 100M interactions
Participatory Sensing
Big Data – Internet
Web
• 8 billion pages
• 10 kB/page
• 8 TB of indexed text
Internet Traffic
Typical router:
• 42 kB/second
• 3.5 gigabytes/day
Social Networks
Mobile Apps
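The router figures can be sanity-checked with a short calculation that turns a sustained data rate into a daily volume. This is only a sketch: the function name `bytes_per_day` is made up for the example, and the 42 kB/second rate is an assumption, chosen because it is roughly consistent with the quoted 3.5 gigabytes/day (a rate of 42 bytes/second would give only a few megabytes per day).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_day(bytes_per_second):
    """Convert a sustained rate in bytes/second into bytes accumulated per day."""
    return bytes_per_second * SECONDS_PER_DAY


# Assumed rate: 42 kB/second, using decimal units (1 kB = 1000 bytes).
daily = bytes_per_day(42_000)
print(f"{daily / 1e9:.2f} GB/day")  # 3.63 GB/day

# For comparison, 42 bytes/second accumulates far less.
print(f"{bytes_per_day(42) / 1e6:.2f} MB/day")  # 3.63 MB/day
```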
Big Data – Intelligent Transportation Systems
The future lies in integration, mining and analytics of BIG DATA
From the sky or space
From the ground
From the vehicles
Big Data and CIS – Specific Challenges
• Data management, collection and storage
– New data storage, I/O, architectures
– Efficient archiving, storing, indexing, retrieving, and recovery
– Privacy and security
– Cloud computing
– Languages, tools, methodologies and programming environments
• Data Analytics
– Data analytics under processing, memory, storage, energy constraints
– Scalable and interactive data visualization
– Extraction and integration of knowledge from massive, complex, multimodal, or dynamic data
– New algorithms, languages, data structures for data analytics
Big Data and Future of Computing
• Google web search and Google news search
– “big data”
– “big data computer science”
– “big data jobs”
– “big data future computing”
– “big data cloud computing”
Parting thoughts
• Take-home messages from CIS 1001
– CIS is a growing field entering its golden age
• The physical world will be increasingly driven by computers and information technology
• Virtual and augmented reality will become increasingly important
– There is no free lunch
• Work hard
– Get a good GPA
– Broaden your skills and perspective
• Make smart choices
– Get internships
– Do undergraduate research
– Open a startup
• Use the resources available to you
– Temple advising and help desks
– Professors, TAs, colleagues
– Family and friends
– Web
Parting thoughts
• The Web is the biggest knowledge source
– It contains all known wisdom and is there for the taking; use it to your advantage
– It can also be a great time sink; there are many dead ends
• Some Pointers:
– http://tech.mit.edu/V132/N34/education.html (free high quality courses!!!)
– https://students.cis.uab.edu/wiki/index.php/Main_Page (survival guide with many good pointers)
– http://www.topuniversities.com/student-survival/student-life/gettingthrough-your-first-year-uni (student survival guide)
– http://csugsac.eng.utah.edu/survival_guide/professors.html (student survival guide)
– https://sites.google.com/site/princetoncsmajors/jobs/interviewing (interviewing)
– http://oedb.org/fast-track-careers-computer-science (careers in CS)
– http://emmaus.patch.com/articles/five-things-to-know-about-the-future-ofcomputer-science (CS future)
– http://www.cs.umd.edu/~oleary/gradstudy/gradstudy.pdf (graduate school)
Parting thoughts
• Please fill out the class evaluations (e-SFF evaluations)
• Last homework: exit survey
=> You need to complete both to get a grade in this course!!