Big Data & 21st Century Analytics in the Classroom

Thinking Big in Small Spaces
One Hadoop
Two Hadoop
(Big Data & 21st Century Analytics in the Classroom)
Nathan Kohn
Stanislav Seltser
BU MET
BU MET
Big Data is Everywhere
6 Billion Flickr Photos
28 Million Wikipedia Pages
900 Million Facebook Users
72 Hours a Minute, YouTube
“…growing at 50 percent a year…”
“… data a new class of economic asset,
like currency or gold.”
Mar 7, 2014
Big Learning
How will we design and implement Big Learning systems?
GPUs
Multicore
Clusters
Clouds
Supercomputers
Graphs are Everywhere
Social Network: users
Collaborative Filtering (Netflix): users and movies
Text Analysis (Wiki): docs and words
Probabilistic Analysis
Big Data & Linear Regression
Stochastic Gradient Descent
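A minimal sketch of stochastic gradient descent applied to linear regression, assuming a single input feature, a squared-error loss, and a fixed learning rate. The function name `sgd_linear_regression`, the synthetic data, and every parameter value are illustrative assumptions and are not taken from the slides.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=50, seed=0):
    """Fit y ~ w*x + b with stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b,
            # computed from this single example only
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + 2.0 + data_rng.gauss(0.0, 0.1) for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(f"estimated slope={w:.2f}, intercept={b:.2f}")
```

Each update uses one example at a time, which is why the method scales to data sets too large to process in a single batch.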
Serial vs Parallel SGD
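One simple way to contrast the two is parameter averaging: each worker runs ordinary SGD on its own shard of the data, and the resulting weights are averaged. The sketch below assumes that scheme, which may differ from the one shown on the original slide. The names `sgd` and `parallel_sgd`, the shard count, and the synthetic data are hypothetical. Threads are used only to show the structure; in CPython they would not speed up pure-Python arithmetic.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sgd(points, lr=0.05, epochs=50, seed=0):
    """Serial SGD for y ~ w*x + b over a list of (x, y) pairs."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = points[i]
            error = (w * x + b) - y
            w -= lr * error * x
            b -= lr * error
    return w, b


def parallel_sgd(points, n_workers=4):
    """Run SGD independently on disjoint shards, then average the weights."""
    shards = [points[k::n_workers] for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(sgd, shards))
    w = sum(r[0] for r in results) / n_workers
    b = sum(r[1] for r in results) / n_workers
    return w, b


# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise
data_rng = random.Random(1)
points = []
for _ in range(400):
    x = data_rng.uniform(-1.0, 1.0)
    points.append((x, 3.0 * x + 2.0 + data_rng.gauss(0.0, 0.1)))

serial_w, serial_b = sgd(points)
parallel_w, parallel_b = parallel_sgd(points)
print(f"serial:   w={serial_w:.2f}, b={serial_b:.2f}")
print(f"parallel: w={parallel_w:.2f}, b={parallel_b:.2f}")
```

The serial version sees every example in one sequence of updates, while the averaged version trades some statistical efficiency for workers that never need to communicate until the end.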
Big Data Landscape: Apps, Infrastructure, Data Semantics
Landscape
Grad Student Response #1
How Big is Big? How is BigData measured?
As per my understanding, the term big data doesn’t refer directly to the size of the
data itself. What the term might mean is that the demand on data
(storage/transfer/analysis) has surpassed what relational
databases can control (or handle): the data is too big to handle.
How it is measured, I really don’t know. Server storage keeps increasing and
increasing (5TB, 10TB, 50TB, 100TB…) and RDBMSs like Oracle seem to be
keeping up with it, but then again I don’t know exactly what measure is being used.
Is Big Data relevant to you professionally?
Indeed it is, even though I am not using it or practicing it daily.
I am really interested in learning it.
Is Big Data relevant to you personally?
Very relevant, and it is a topic that drove me into pursuing a master’s degree.
Grad Student Response #2
How Big is Big? How is BigData measured?
Big data is a term for large data sets that are too complex to process with traditional
data management processes and tools. Its data points and data types depend on, and are
measured by, the parameters set forth by each organization.
Where does BigData come from?
Big data can come from various sources that can be categorized as internal or external
contributors.
What is BigData good for?
BigData is good for complex and large data sets that exist within relational databases
and may require object-oriented programming.
Would you like to see Big Data incorporated in your courses?
Yes, I think that we exist in a period in which we are inundated by social media,
numbers, photographs and other forms of data, which requires us to be well versed in
storage, maintenance, and interface design so that we are better able to parse
through the Big Data that we encounter on a daily basis.
Undergrad Student #1
Is Big Data relevant to you personally?
Yes. As my current major is Business Application Development, I can see myself
gaining a lot of opportunities to deal with not only the technologies for building
user interfaces in the future but also the technologies for storing user information.
The techniques used to understand that data could be another opportunity for
the business.
Would you like to see Big Data incorporated in your courses?
Yes. I would like to see our courses include some of the techniques that
corporations use nowadays to understand the relation between their data and the
problems they need to address, such as how they decide which part of their big
data provides them with the most helpful information for their problem, and how they
explain the meaning of their data analysis based on the result, such as how they can
decide whether the result is accurate and meaningful enough to allow them to take action.
Do you have any questions about Big Data?
Big data is a pretty interesting and useful topic. It would be nice to have more
background information to help our understanding.
Undergrad Student #2
How Big is Big? How is BigData measured?
The survey is asking rather easy conceptual questions about big data. Big data is easy to understand at
that level: we finally have the technology to store, retrieve (cheap memory), and analyze (with proper
languages) data on magnitudes that were impossible before. Instead of just a phone book type of data,
people can gather every relevant or even possibly relevant piece of information about anything (often
but not limited to customers of a business). I have read articles about how some companies (credit card
mostly, if I remember correctly) can tell if a woman is pregnant before she even knows it herself.
Or they can predict divorce rates a year in advance quite reliably. All this from their spending habits and
deviations from those habits.
While all this is fascinating, I don't have any real interest in learning the conceptual level like this. If big
data is to be relevant in a class, it needs to show HOW all this is done. Teach the language, teach the
search and statistical algorithms, or even the methods people use to collect big data (the petabytes
aren't being entered by hand).
Students in classes or lectures on big data should come away with some practical knowledge on the subject,
otherwise we're just applying a name to something people generally understand: organizations collect
and analyze as much data as they can, and recent technology has made that amount of data
staggeringly large. The key- and buzz-words are nice for sounding like an expert, but the how-to is
generally more important.
Student Response #4
How Big is Big? How is BigData measured?
Big data is a term developed recently to describe the trend of exponentially
increasing amounts of data stored by organizations for business uses. Very often
this data might be extremely big, such as 16 petabytes. The data is measured
by the memory space it occupies. Thus, 16 petabytes of big data occupies
approximately 16 × 10^15 bytes of memory.
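As a quick check of the arithmetic behind sizes like these, the sketch below converts storage units to bytes. It assumes decimal (SI) prefixes, under which 1 petabyte is 10^15 bytes and 16 petabytes is 1.6 × 10^16 bytes. The helper name `to_bytes` and the `SI_UNITS` table are hypothetical.

```python
# Decimal (SI) prefixes: each step up is a factor of 1000
SI_UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
}


def to_bytes(amount, unit):
    """Convert an amount in the given SI unit to a number of bytes."""
    return amount * SI_UNITS[unit]


one_pb = to_bytes(1, "PB")
sixteen_pb = to_bytes(16, "PB")
print(f"1 PB  = {one_pb:.1e} bytes")
print(f"16 PB = {sixteen_pb:.1e} bytes")
```

Binary prefixes (1 PiB = 2^50 bytes) give slightly larger figures, so it helps to state which convention is in use.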
Where does BigData come from?
Big Data could come from different sources, such as emails, social-networking sites,
sensors on the web, sensors installed on other tracking devices, or line-of-business
applications.
Is Big Data relevant to you professionally?
Yes. In my previous work as a market researcher, we always needed to gather
information and analyze it for business decision making. The technologies
for gathering big data and the techniques used to analyze and filter data are also
extremely helpful for my career.
Data Warehouse Course
Student Comments:
Very informative, content-rich course that covers the latest technologies, trends, and
skills of data warehousing, data management, and data analysis. I would
recommend including this course in the required courses for the MS in CIS
with concentration in Database Management and BI Program.
Relevance to job opportunities and cutting-edge technologies.
This is probably the most useful course I have taken at Boston University. I have
used every bit of what this professor taught every night at work. I have made
contributions to my employer, a data mining company, in ways that had
never been done before as a result of this course. I have for the first time in
my 8-year career planned, designed, and augmented a Data Warehouse from
scratch. I have configured an analysis server and reported using MDX queries.
This professor has been helpful in many ways. He has guided me through
some Data Warehouse design projects at work. Moreover, he has been
available to work with me and others after class and on weekdays.
Road map
A
Archeology
Professor Mark Eramian et al. have been awarded
$548,000 through the Digging into Data
Challenge, National Endowment for the Humanities,
to help archaeologists find answers to questions hidden in
thousands of images and text files generated from field
sites around the world.
B
Biology
Recently, a researcher wanted to ascertain whether a
search against GQ-Pat could provide novel insight into
his work related to a specific gene, the cAMP
Responsive Element Modulator.
Reporting to the VP of R&D:
Apply data mining and machine learning techniques to
develop better search and content discovery in the field of patents.
Invent new ways to index tens of millions of
documents with semantic information.
Z
Zymurgy
(hint: beer)
QUIZ ?
Quiz:
Stanislav Seltser
BU MET
Nathan Kohn
BU MET