Cloud computing for big data in education


BIG DATA IN EDUCATION
Tainan City Government, Education Bureau (臺南市政府教育局)
Director of the Information Center: 高誌健
• Technology trends
• Data analytics trends
• Cloud computing for big data in education
• Big data analytics in the cloud
• Educational practices
TECHNOLOGY TRENDS
TOP TECHNOLOGY TRENDS FOR 2014
1. Emergence of the Mobile Cloud: the mobile distributed computing paradigm will lead to an explosion of new services.
2. From Internet of Things to Web of Things: connectivity and internetworking are needed to link the physical and the digital.
3. From Big Data to Extreme Data: simpler analytics tools are needed to leverage the data deluge.
4. The Revolution Will Be 3D: new tools and techniques bring 3D printing power to the masses.
5. Supporting New Learning Styles: online courses demand a seamless, ubiquitous approach.
6. Next-Generation Mobile Networks: mobile infrastructure must catch up with user needs.
7. Balancing Identity and Privacy: risks and concerns about social networks are growing.
8. Smart and Connected Healthcare: intelligent systems and assistive devices will improve health.
9. E-Government: interoperability is a big challenge to delivering information.
10. Scientific Cloud Computing: key to solving grand challenges and pursuing breakthroughs.
http://www.computer.org/portal/web/membership/Top-10-Tech-Trends-in-2014
BIG DATA: WHY NOW?
• 90% of the data in the world was created in the last 2 years.
• The average person today processes more data in a day than a person in the 1500s did in an entire lifetime.
• The LAPD is piloting a big data scheme to predict crime: an algorithm predicts where crime is likely to take place. After police teams in LA's Foothill area were given the scheme, property crime fell by 12% and burglary by 26%. Predictive policing is now being rolled out in 150 cities across America.
• The algorithm was initially developed to predict earthquakes.
• 43% of the data gathered on people comes from social media.
• Twitter sees 100,000 tweets every minute and Facebook 650,000 shares every minute: 144,000,000 tweets and 936,000,000 Facebook shares every day.
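The daily totals above are simply the per-minute rates multiplied by the 1,440 minutes in a day. A minimal Python check, assuming the rates stay constant around the clock (an idealization; real traffic varies by hour), with an illustrative function name:

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day


def per_day(per_minute: int) -> int:
    """Scale a per-minute rate to a per-day total, assuming a constant rate."""
    return per_minute * MINUTES_PER_DAY


tweets_per_day = per_day(100_000)           # Twitter: 100,000 tweets per minute
facebook_shares_per_day = per_day(650_000)  # Facebook: 650,000 shares per minute

print(f"{tweets_per_day:,} tweets per day")           # 144,000,000 tweets per day
print(f"{facebook_shares_per_day:,} shares per day")  # 936,000,000 shares per day
```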
• Netflix records 30 million user ‘plays’ a day. It analyses when users pause, rewind, fast-forward, and search, and it also knows what users like.
• But we’re just getting started: augmented reality, the quantified self, and the internet of things will all become ubiquitous.
• Data production will be 44X greater in 2020 than it was in 2009.
• Every day, the data mountain grows by 2.5 billion gigabytes.
• In 2013, all human knowledge was estimated at 12 exabytes.
• 1,000 exabytes = 1 zettabyte
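The volumes here use decimal (SI) prefixes, each 1,000 times the previous one. Assuming those decimal units (binary units such as GiB or EiB would give slightly different numbers), a short Python sketch shows that 2.5 billion gigabytes a day is 2.5 exabytes, i.e. 2.5 quintillion bytes, and that a zettabyte is 1,000 exabytes:

```python
# Decimal (SI) byte units: each prefix is 1,000 times the previous one.
GB = 10**9   # gigabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte

daily_growth_bytes = 2_500_000_000 * GB  # 2.5 billion gigabytes per day

print(daily_growth_bytes / EB)            # 2.5  (exabytes per day)
print(daily_growth_bytes == 25 * 10**17)  # True (2.5 quintillion bytes)
print(ZB // EB)                           # 1000 (exabytes per zettabyte)
```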
Information is the oil of the 21st century, and analytics is the combustion engine.
-Peter Sondergaard, senior vice president at Gartner
https://www.youtube.com/watch?v=2D8oji5EKbM
BIG DATA CHARACTERISTICS
Volume: Unfathomable. 2.7 trillion gigabytes of data was created or replicated in 2012.
Velocity: Record-breaking. Every day 2.5 quintillion bytes of data are created, and the total amount of data doubles every two years.
Variety: Vast. In 2011 there were 9 billion connected devices, and that is expected to grow to 24 billion in 2020.
Value: Untapped. Analytics has the potential to unlock productivity growth and innovation.
http://www.fico.com/en/Communities/Pages/BigData.aspx &
https://www.youtube.com/watch?v=7D1CQ_LOizA
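Two growth figures in this section can be cross-checked under an assumption of steady exponential growth (an idealization, not a claim made by the sources). If data doubles every two years, the 11 years from 2009 to 2020 give a factor of about 2^(11/2), roughly 45, close to the 44X quoted earlier; and going from 9 to 24 billion connected devices between 2011 and 2020 implies about 11.5% growth per year. A minimal Python sketch (function names are illustrative):

```python
def growth_factor(years: float, doubling_period: float) -> float:
    """Total growth after `years` if the quantity doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def annual_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Data doubling every two years, 2009 -> 2020 (11 years)
print(round(growth_factor(2020 - 2009, 2), 1))       # 45.3

# Connected devices: 9 billion (2011) -> 24 billion (2020)
print(f"{annual_rate(9e9, 24e9, 2020 - 2011):.1%}")  # 11.5%
```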
TOP 10 MOST FUNDED BIG DATA STARTUPS
Last update: March 31, 2014

Company          | Funding ($ million) | Business
Cloudera         | 1,040               | Hadoop-based software, services and training
Palantir         | 650                 | Analytics applications
Domo             | 250                 | Business intelligence platform
MongoDB          | 231                 | Document-oriented database
Mu Sigma         | 208                 | Data-Science-as-a-Service
Hortonworks      | 198                 | Hadoop-based software, services and training
Opera Solutions  | 114                 | Data-Science-as-a-Service
Talend           | 102                 | Application and business process integration platform
Guavus           | 89                  | Big data analytics solution
DataStax         | 83.7                | Cassandra-based big data platform
http://www.forbes.com/sites/gilpress/2013/10/30/top-10-most-funded-big-data-startups-updated/
DATA ANALYTICS TRENDS
OPPORTUNITY IN TYPES OF BIG DATA
• Sentiment: understand how your students feel about your teaching and feedback, right now
• Clickstream: capture and analyze website visitors’ data trails and optimize your website
• Sensor/machine: discover patterns in data streaming automatically from remote sensors and machines
• Geographic: analyze location-based data to manage operations where they occur
• Server logs: research logs to diagnose process failures and prevent security breaches
• Unstructured (text, video, pictures, etc.): understand patterns in files across millions of web pages, emails, and documents
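To make the sentiment item concrete, the sketch below scores student feedback with a tiny word list: positive words add one, negative words subtract one. The word lists and the function name are hypothetical, chosen only for illustration; practical sentiment analysis usually relies on curated lexicons or trained models and handles negation, which this does not.

```python
import re

# Hypothetical mini-lexicons; a real system would use a curated lexicon or a trained model.
POSITIVE = {"clear", "helpful", "interesting", "enjoyed", "great", "useful"}
NEGATIVE = {"confusing", "boring", "hard", "fast", "unclear", "lost"}


def sentiment_score(comment: str) -> int:
    """Return (#positive words - #negative words) for one piece of feedback."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


feedback = [
    "The examples were clear and really helpful",
    "Too fast, I got lost in the second half",
    "Interesting topic but the slides were confusing",
]
for comment in feedback:
    print(sentiment_score(comment), comment)  # scores: 2, -2, 0
```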
DATA ANALYTICS CHALLENGES
• Data capture at the user interaction level, in contrast to the client transaction level in the enterprise context
• Summative to formative analysis
• As a consequence, the amount of data increases significantly
• Greater need to analyze such data to understand user behaviors
EDBT 2011 Tutorial
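Capturing data at the interaction level means logging each action (a video view, a quiz attempt, a pause) rather than only a final grade, which is what makes formative analysis possible and also why data volumes grow. A minimal Python sketch, assuming a hypothetical in-memory log of (student, action, seconds) records, rolls such events up into per-student indicators that could be reviewed while a course is still running:

```python
from collections import defaultdict

# Hypothetical interaction-level log: (student_id, action, seconds_spent)
events = [
    ("s1", "view_video", 300),
    ("s1", "quiz_attempt", 120),
    ("s2", "view_video", 60),
    ("s1", "quiz_attempt", 90),
    ("s2", "pause_video", 0),
]


def summarize(log):
    """Aggregate raw events into per-student event counts, time on task, and quiz attempts."""
    summary = defaultdict(lambda: {"events": 0, "seconds": 0, "quiz_attempts": 0})
    for student, action, seconds in log:
        stats = summary[student]
        stats["events"] += 1
        stats["seconds"] += seconds
        if action == "quiz_attempt":
            stats["quiz_attempts"] += 1
    return dict(summary)


for student, stats in sorted(summarize(events).items()):
    print(student, stats)
```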
CUSTOMER (CONSUMER) ANALYTICS
• Propensity and Best Next Action
• Sentiment analysis: https://www.youtube.com/watch?v=Ga2jMY5nzzY&feature=player_embedded
• Behavior scoring models
http://www.statsoft.com/Solutions/Cross-Industry/Customer-Analytics
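Propensity and behavior scoring models are commonly built as logistic regressions that map behavioral features to a probability. The sketch below assumes hand-picked, hypothetical weights and feature names and a hypothetical outcome (completing a course); in practice the weights would be fitted to historical data.

```python
import math

# Hypothetical weights and bias; a real model would be fitted, e.g. by logistic regression.
WEIGHTS = {"logins_per_week": 0.4, "assignments_late": -0.8, "forum_posts": 0.2}
BIAS = -1.0


def propensity(features: dict) -> float:
    """Logistic score in (0, 1): estimated propensity for the (hypothetical) outcome."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-z))


engaged = {"logins_per_week": 5, "assignments_late": 0, "forum_posts": 3}
at_risk = {"logins_per_week": 1, "assignments_late": 3, "forum_posts": 0}
print(round(propensity(engaged), 2))  # 0.83
print(round(propensity(at_risk), 2))  # 0.05
```

Ranking students by such a score is one way to decide where a "best next action", such as a reminder or tutoring offer, would be targeted first.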
CLOUD COMPUTING FOR BIG DATA IN EDUCATION
PARADIGM SHIFT IN COMPUTING
EDBT 2011 Tutorial
THE NIST DEFINITION OF CLOUD COMPUTING
Essential Characteristics:
• On-demand self-service.
• Broad network access.
• Resource pooling.
• Rapid elasticity.
• Measured service.
Service Models:
• Software as a Service (SaaS).
• Platform as a Service (PaaS).
• Infrastructure as a Service (IaaS).
Deployment Models:
• Private cloud.
• Community cloud.
• Public cloud.
• Hybrid cloud.
http://www.nist.gov/itl/cloud/
Cloud computing is a model for enabling
convenient, on-demand network access to a
shared pool of configurable computing
resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly
provisioned and released with minimal
management effort or service provider
interaction. This cloud model promotes
availability and is composed of five essential
characteristics (On-demand self-service, Broad
network access, Resource pooling, Rapid
elasticity, Measured Service); three service
models (Cloud Software as a Service (SaaS),
Cloud Platform as a Service (PaaS), Cloud
Infrastructure as a Service (IaaS)); and, four
deployment models (Private cloud, Community
cloud, Public cloud, Hybrid cloud). Key
enabling technologies include: (1) fast wide-area networks, (2) powerful, inexpensive server
computers, and (3) high-performance
virtualization for commodity hardware.
CLOUD COMPUTING: WHY NOW?
• Experience with very large datacenters
  - Unprecedented economies of scale
  - Transfer of risk
• Technology factors
  - Pervasive broadband Internet
  - Maturity in virtualization technology
• Business factors
  - Minimal capital expenditure
  - Pay-as-you-go billing model
EDBT 2011 Tutorial
ECONOMICS OF CLOUD USERS
• Pay by use instead of provisioning for peak
[Figure: Resources vs. Time for a static data center and for a data center in the cloud, showing capacity, demand, and unused resources]
EDBT 2011 Tutorial
Slide Credits: Berkeley RAD Lab
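The pay-by-use argument can be put in numbers. Assuming a hypothetical 24-hour demand profile and the same price per server-hour in both settings (both invented for illustration), a static data center is sized for the peak all day, while pay-by-use is billed only for what the demand curve actually needs:

```python
# Hypothetical demand (servers needed) for each hour of one day.
demand = [20] * 8 + [60] * 4 + [100] * 2 + [60] * 4 + [30] * 6  # 24 hours
PRICE = 1.0  # assumed cost per server-hour, identical in both settings

static_cost = max(demand) * len(demand) * PRICE  # provisioned for peak all day
cloud_cost = sum(demand) * PRICE                 # pay only for what is used
unused = static_cost - cloud_cost                # idle server-hours in the static case

print(f"static: {static_cost:.0f}, cloud: {cloud_cost:.0f}, unused: {unused:.0f}")
# static: 2400, cloud: 1020, unused: 1380
```

In practice cloud and on-premises unit prices differ, so the saving depends on how far peak demand sits above average demand.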
ECONOMICS OF CLOUD USERS
• Heavy penalty for under-provisioning
[Figure: three panels of Resources vs. Time (days 1, 2, 3) comparing capacity with demand, annotated "Lost revenue" and "Lost users"]
EDBT 2011 Tutorial
Slide Credits: Berkeley RAD Lab
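Under-provisioning can be sketched the same way: with fixed capacity, any demand above it goes unserved, which corresponds to the lost revenue in the figure. The capacity and demand values below are hypothetical, and the falling demand on later days only mimics users leaving; the sketch does not model why they leave.

```python
CAPACITY = 100  # hypothetical fixed capacity (requests per interval)

# Hypothetical demand samples (requests per interval) for three days.
demand_by_day = {
    1: [80, 120, 150, 90],
    2: [70, 110, 130, 60],
    3: [50, 90, 100, 40],
}


def unserved(samples, capacity):
    """Demand above a fixed capacity, which a static data center cannot serve."""
    return sum(max(0, d - capacity) for d in samples)


for day, samples in demand_by_day.items():
    print(f"day {day}: {unserved(samples, CAPACITY)} requests unserved")
# day 1: 70, day 2: 40, day 3: 0
```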
CLOUD COMPUTING MODALITIES
“Can we outsource our IT software and hardware infrastructure?”
• Hosted applications and services
• Pay-as-you-go model
• Scalability, fault-tolerance, elasticity, and self-manageability
“We have terabytes of click-stream data – what can we do with it?”
• Very large data repositories
• Complex analysis
• Distributed and parallel data processing
EDBT 2011 Tutorial
BIG DATA ANALYTICS IN THE CLOUD
CHALLENGES
• Scalability to large data volumes:
  - Scan 100 TB on 1 node @ 50 MB/sec = 23 days
  - Scan on 1000-node cluster = 33 minutes
  - Divide-and-conquer (i.e., data partitioning)
• Cost-efficiency:
  - Commodity nodes (cheap, but unreliable)
  - Commodity network
  - Automatic fault-tolerance (fewer administrators)
  - Easy to use (fewer programmers)
EDBT 2011 Tutorial
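The scan times follow from dividing data volume by aggregate read throughput. A short Python check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and an ideal, even split of the data across nodes with no coordination overhead:

```python
DATA_BYTES = 100 * 10**12  # 100 TB
NODE_RATE = 50 * 10**6     # 50 MB/sec read throughput per node


def scan_seconds(nodes: int) -> float:
    """Time to scan the data when it is partitioned evenly over `nodes` nodes."""
    return DATA_BYTES / (NODE_RATE * nodes)


print(f"1 node: {scan_seconds(1) / 86_400:.1f} days")        # 1 node: 23.1 days
print(f"1000 nodes: {scan_seconds(1000) / 60:.1f} minutes")  # 1000 nodes: 33.3 minutes
```

Real clusters fall short of this linear speed-up because of stragglers, network transfer, and job start-up costs.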
PLATFORMS FOR BIG DATA ANALYSIS
• Parallel DBMS technologies
  - Proposed in the late eighties
  - Matured over the last two decades
  - Multi-billion dollar industry: proprietary DBMS engines intended as data warehousing solutions for very large enterprises
• MapReduce
  - Pioneered by Google
  - Popularized by Yahoo! (Hadoop)
EDBT 2011 Tutorial
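MapReduce expresses a job as a map function that emits key/value pairs and a reduce function that combines all values sharing a key, while the framework partitions the input and groups (shuffles) the pairs by key in between. The classic word-count example can be sketched in single-process Python; this only imitates the programming model and is not Hadoop's API:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


# Each string stands in for one input split that a separate mapper would process.
splits = ["big data in education", "cloud computing for big data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
# {'big': 2, 'data': 2, 'in': 1, 'education': 1, 'cloud': 1, 'computing': 1, 'for': 1}
```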
DATA ARCHITECTURE EXAMPLE 1
http://hortonworks.com/hadoop-modern-data-architecture/
DATA ARCHITECTURE EXAMPLE 2
ENTERPRISE PREDICTIVE ANALYTICS PLATFORMS
• FICO: http://www.fico.com/
• IBM SPSS: http://www-01.ibm.com/software/analytics/spss/
• KXEN: http://www.kxen.com/
• Oracle Advanced Analytics: http://www.oracle.com/us/products/database/options/advanced-analytics/overview/index.html
• Revolution Analytics: http://www.revolutionanalytics.com/
• Salford Systems: http://www.salford-systems.com/
• SAP: https://www54.sap.com/pc/analytics/business-intelligence/software/predictiveanalysis/index.html
• SAS: http://www.sas.com/
• Statsoft: http://www.statsoft.com/
EXCEL DATA MINING ADD-INS
• 11Ants Model Builder: http://www.11antsanalytics.com/
• Alyuda ForecasterXL: http://www.alyuda.com/forecasting-excelsoftware-with-neural-network.htm
• DataMinerXL: http://www.dataminerxl.com/
• Predixion Enterprise Insight: http://www.predixionsoftware.com/predixion/
• XLMiner: http://www.solver.com/xlminer-data-mining
OPEN SOURCE AND FREE DATA MINING TOOLS
• Knime: http://www.knime.org/
• R: http://www.r-project.org/
• Orange: http://orange.biolab.si/
• Rapid Miner: http://rapid-i.com/
• WEKA: http://www.cs.waikato.ac.nz/~ml/
  https://weka.waikato.ac.nz/ (Course)
• http://www.youtube.com/watch?v=wCvnO96d8h4
LEARNING R
• 中華R軟體學會 (Chinese R Software Association): https://sites.google.com/site/zhonghuarruantixuehui/home
• Introducing R: http://data.princeton.edu/R/default.html
• Try R: http://tryr.codeschool.com/levels/1/challenges/2
• Data mining with R: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/
• UCLA IDRE: http://www.ats.ucla.edu/stat/r/
4 MACHINE LEARNING STARTUPS
• Alpine Data Labs: http://www.alpinedatalabs.com/
• BigML: https://bigml.com/
• SkyTree: http://www.skytree.net/
• Wise.io: http://about.wise.io/
EDUCATIONAL PRACTICES
BIG DATA IN EDUCATION
CLOUD COMPUTING IN EDUCATIONAL PRACTICES
Two issues: educational resources and necessary applications
Examples:
• Providing lower-level cloud services (such as data storage)
• Producing, researching, collecting, and sharing open educational resources
• Hosting learning management systems (LMSs) in the cloud
• Providing individual bundled applications in the cloud (e.g., Google Apps for Education or Microsoft Live@edu with Office 365) that combine tools for communication and collaboration, office tools for working with documents, and space to store and synchronize data on demand
CLOUD SERVICE NEEDS AND USES
Cloud Computing in Education and Student's Needs by E. Krelja
Kurelović, S. Rako, and J. Tomljanović

Cloud services and computing in Tainan:
• Single sign-on for 150,000 teachers and students: completed
• IaaS, PaaS, and SaaS: completed
• Over 168 applications and data resources in total
THANKS FOR YOUR ATTENTION