Big data in the Philippine context

Big Data and Official Statistics:
Philippine Context
Erniel B. Barrios
• Concepts and Definitions
• Coverage of Big Data
• Big Data and Official Statistics: Preliminary Framework
• Current Practices (Some Models)
• Possible Big Data in the Philippines
• Next Steps
Frequency of Documents Containing "Big Data" in ProQuest Research Library
Basis of Definitions
• Stakeholders may define Big Data differently
• Data storage and data analysis
• Intertwined technical and socio-technical issues
• Multiple, ambiguous and often contradictory definitions
• “Big” => significance, complexity, challenge
• Five V’s
Volume (size)
Velocity (rate of production)
Variety (format, representations)
IBM: V is Veracity (trust and uncertainty)
SAS: Variability (complexity)
• Intel: generating a median of 300 TB
Basis of Definitions
• Size: volume of the dataset
• Complexity: structure, behavior and permutations of the dataset
• Technologies: tools and techniques which are used to process a
sizable or complex dataset
• Appropriate description, integration, and sustainability of very large
datasets generated by high throughput experiments
• Large collection of small, disparate, unstructured datasets (which, taken
together, can be analyzed to find unusual trends).
• Emergence of the digital enterprise: the ability of an organization to take full
advantage of its digital assets, collectively a large amount of data
• Oracle: Inclusion of additional data sources to augment current
• Microsoft: process of applying serious computing power (machine
learning, AI) to seriously massive and often highly complex sets of
• Big Data describes the storage and analysis of large and/or complex data
sets using a series of techniques.
• High-volume, high-velocity, and high-variety information assets that
demand cost-effective, innovative forms of information processing for
enhanced insight and decision-making.
• Describes large volumes of high velocity, complex and variable data that
require advanced techniques and technologies to enable the capture,
storage, distribution, management, and analysis of information.
• UNECE: Big Data is data that is difficult to collect, store or process within
the conventional systems of statistical organizations. Either their
volume, velocity, structure or variety
Online Survey of 154 Global Executives (April 2012)
• Big Data
Not only in size (though volume can be part of it)
Varying Sources, Several Variables (Indicators)
Differing data collection methods (compilation)
Frequency (possibly irregular)
• Issues
Data extraction
Data Mining
New Data Sources
• Consumer Usage Database
• Blogs
• Social Media
• Sensor Networks
• Image Data
• May vary in
• Size
• Structure
• Format
Coverage of Big Data
• Basic research data
• Electronic health records
• Consumer Usage Database
• Proposals submitted
• Administrative data
• Censuses and Surveys
Types of Big Data (Classification)
• Social Network: Human-sourced information
• Social networks, Blogs, Personal Documents, Pictures, Videos, Internet
Searches, Mobile Data, User-generated maps, E-mail
• Traditional Business Systems: Process-mediated data
• Produced by public agencies (including medical records) and by businesses
(commercial transactions, banking/stock records, e-commerce, credit cards)
• Internet: machine-generated
• Fixed sensors: home automation, weather/pollution sensor, traffic, scientific,
• Mobile sensors: mobile phone, cars, satellite images
• Computer systems: logs, web logs
Current Practices
• Analysis of Traffic Loop Detection Data (Statistics Netherlands)
• Traffic loop detection data: measurements of traffic intensity
• Create maps that indicate the number of vehicles for each measurement
location for each time point by means of color coding.
• Number of vehicles in various length categories
• Predictive models need to be developed ⇒ estimated aggregates and
variance estimates reflecting the uncertainty of the estimation procedure.
• Analysis of Social Media Messages (Statistics Netherlands)
• 70% of the Dutch population actively posts messages on social media.
• Sentiment = Consumer Confidence
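The traffic loop example calls for estimated aggregates together with variance estimates. A minimal sketch of that idea follows. It is not the Statistics Netherlands method: it assumes the readings that survive (missing ones are marked None) behave like a simple random sample of the periods in the window, and it uses the textbook expansion estimator of a total with a finite population correction. The function name and the sample readings are hypothetical.

```python
from statistics import mean, variance


def estimate_total(counts, n_periods):
    """Estimate a total vehicle count over n_periods from partial readings.

    counts    : per-period vehicle counts, with None for missing readings
    n_periods : number of periods in the window (e.g. minutes in an hour)

    Assumption (illustrative only): the observed periods are treated as a
    simple random sample, without replacement, of all periods in the window.
    """
    observed = [c for c in counts if c is not None]
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two observed readings")

    # Expansion estimator: scale the observed mean up to the full window.
    total_hat = n_periods * mean(observed)

    # Variance of the estimated total, with finite population correction.
    fpc = 1 - n / n_periods
    var_hat = n_periods ** 2 * fpc * variance(observed) / n
    return total_hat, var_hat


# Hypothetical per-minute counts from one loop detector; two readings missing.
readings = [12, 15, None, 11, 14, None, 13, 15]
total, var = estimate_total(readings, n_periods=len(readings))
print(f"estimated total: {total:.1f}, standard error: {var ** 0.5:.2f}")
```

If every period is observed, the correction factor is zero and the estimate carries no sampling variance, which matches the intuition that the variance reflects only the uncertainty introduced by estimation.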
Big Data and Official Statistics
• Location data for mobile phones
• used for instantaneous daytime population and tourism statistics
• proxy indicators for demand
• Social media messages
• Process into early indicators of consumer confidence
• Price information on the web, from loyalty cards
• Inflation level
• Google search
• Prevalence rate of Influenza
• Tweets
• Stock market prices
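As an illustration of how price information collected from the web could feed an inflation measure, here is a minimal sketch. It assumes prices have already been collected and matched by item between a base period and a current period, and it uses the Jevons index (the geometric mean of price relatives), a common elementary price index formula. The slides do not name a formula, so this choice, the function name, the item names and the prices are all assumptions made for the example.

```python
import math


def jevons_index(base_prices, current_prices):
    """Jevons elementary price index (base period = 100).

    base_prices, current_prices : dicts mapping item name -> price.
    Only items present in both periods are used (matched-model approach).
    """
    matched = [item for item in base_prices if item in current_prices]
    if not matched:
        raise ValueError("no items are priced in both periods")

    # Geometric mean of price relatives, computed on the log scale.
    log_sum = sum(
        math.log(current_prices[item] / base_prices[item]) for item in matched
    )
    return 100 * math.exp(log_sum / len(matched))


# Made-up prices for illustration only.
base = {"rice": 50.0, "eggs": 8.0, "cooking oil": 100.0}
current = {"rice": 55.0, "eggs": 8.0, "cooking oil": 110.0, "sugar": 60.0}

index = jevons_index(base, current)
print(f"price index: {index:.2f}")
```

Items that appear in only one period (here, sugar) are dropped from the calculation. Deciding how to handle such items is one of the practical issues in using web prices.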
Big Data and Official Statistics: Preliminary Framework
Administrative Reports
Collaboration (PPP)
Human Resources
Official Statistics, SDG
Possible Big Data in the Philippines
• Censuses
• Survey
• Administrative Reports
• Regulation, Licensing and Compliance
• Monitoring (e.g., MFO, Budgeting, Intervention (4Ps, RSBSA, etc.))
• Registers (BIR, COMELEC, UMID, GSIS/SSS, Philhealth, Pag-Ibig, etc.)
• Private/Commercial
Credit cards
Loyalty Cards
Social Media, Google, etc.
Next Steps
• What is available?
Big data sources
Data that can be shared, frequency, timeliness
Data security, confidentiality issues
Big Data and Official Statistics: Is it feasible? Is it worthwhile?
• What is needed for collaboration, data-sharing?
Thank you.