Slide 1 - CBS


Directors General of the National Statistical Institutes Meeting
25-27 September 2013 / The Hague, Netherlands
Big Data vs. Official Statistics
Yu gyung Kang
Director, Statistical Information Portal Division
Statistics Korea
Contents
Technology Assessment (TA) in Korea
Big Data Use in Private Sector
• Market Analysis
• Suicide Warning System
On-going Projects by KOSTAT
• Pilot Project for Mining and Manufacturing Survey
• E-household Account System
• Pilot Project for Price Statistics
Future Challenges
Technology Assessment (1)
…Conducted by MSIP (Ministry of Science, ICT and Future Planning) of Korea in 2012, under Article 14
of the Framework Act on Science and Technology
• What is big data?
– Data with 3Vs characteristics + Data Management Technology
* Gartner’s 3Vs : Volume, Variety and Velocity
[Figure: Gartner’s 3Vs]
Volume: GB/TB, PB, EB, ZB
Variety: Structured Data, Unstructured Data
Velocity: Low speed (hours to weeks), High speed (mins. to seconds)
Example data: Customer Data, Sale Data, Stock Data, Finance Data, Video, Music, Messages, SNS, GPS, BBS, …….
Technology Assessment (2)
• Expected Impact
Private Sector
– Positive
• Source of new value creation
• Supporting efficient decision-making
• Providing business chances and jobs
– Negative
• Aggravating economic inequality
• Possibility of wasting money due to careless massive investment
• Social problems caused by unethical use of data
Public Sector
– Positive
• Improving public service and its efficiency
• Real-time response to social issues
• Creating new industry and job opportunities
– Negative
• Increasing risk of leaking gov’t’s secrets
• ‘Big Brother’
• Misuse of big data with error and its negative impact on gov’t policies
Individuals
– Positive
• Improving quality of life with individually tailored service
• Increasing trust in public policies and service
– Negative
• Increase of privacy and security issues
Technology Assessment (3)
• Policy Recommendations
a. Localize Core Technologies related to big data through gov’t-led R&D
b. Establish Legal and Institutional Basis for standardization of
managing, sharing and trading big data
c. Foster pool of Big Data Analysts and Experts through
interdisciplinary undergraduate and graduate programs
d. Take a Step-By-Step Approach by Setting Priorities in the
sectors where benefits to the public will be visible.
e. Make Strategies to Protect Privacy
Big Data Use in Private Sector
Case 1 : Market Analysis by
Which Business would you like to open?
Big Data Use in Private Sector
Case 1 : Market Analysis by
[Figure: items shown in the diagram]
Real Estate 411, Real Estate, …, Sales Information, Consumer Type, Floating Population, Business Cycle, Korean Statistical Information Service
Big Data Use in Private Sector
Case 2 : Suicide Warning System
Weather Forecast: Why not Suicide forecast?
• social factors
• weather factors
• Werther Effect
• personal emotion
Source: OECD (2012), OECD Health Statistics
Big Data Use in Private Sector
Case 2 : Suicide Warning System
• Training Set (2008-2009) & Test Set (2010)
– Total number of suicide incidents
– Economic and weather data
• CPI, unemployment rate, KOSPI (Korea Composite Stock Price Index), daylight hours and temperature
– 150 million posts from about 5 million blogs on NAVER (incl. SNS posts)
• Var1 (# of posts including “suicide”),
• Var2 (# of posts including “dysphoria”, “be tired”, “be painful”, or “be exhausted”)
• Model
– Dependent Variable : No. of suicides in a given period (3 days)
– Independent Variables
• CPI, unemployment rate, KOSPI, daylight hours, temperature
• Two variables obtained from the posts
• Celebrity suicide (control variable)
• No. of suicides from the previous period
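The variables listed above can be assembled into model rows roughly as follows. This is a minimal sketch, not the authors' implementation: the record layout, the function names `count_post_variables`, `build_rows` and `split_train_test`, and the dictionary keys are all hypothetical. The slide does not say which estimator was fitted, so the sketch stops at building the training and test sets.

```python
# Keywords as translated on the slide; the original Korean terms are not given.
VAR1_KEYWORDS = ("suicide",)
VAR2_KEYWORDS = ("dysphoria", "be tired", "be painful", "be exhausted")


def count_post_variables(posts):
    """Return (Var1, Var2): number of posts containing any keyword of each group."""
    var1 = sum(any(k in p.lower() for k in VAR1_KEYWORDS) for p in posts)
    var2 = sum(any(k in p.lower() for k in VAR2_KEYWORDS) for p in posts)
    return var1, var2


def build_rows(periods):
    """Turn chronologically ordered 3-day periods into (start, features, target) rows.

    Each period is a dict with hypothetical keys: start (a datetime.date),
    suicides, cpi, unemployment, kospi, daylight_hours, temperature,
    posts (list of str) and celebrity_suicide (bool).
    The first period is dropped because it has no previous-period count.
    """
    rows = []
    for prev, cur in zip(periods, periods[1:]):
        var1, var2 = count_post_variables(cur["posts"])
        features = [
            cur["cpi"], cur["unemployment"], cur["kospi"],
            cur["daylight_hours"], cur["temperature"],
            var1, var2,
            int(cur["celebrity_suicide"]),  # control variable
            prev["suicides"],               # no. of suicides in the previous period
        ]
        rows.append((cur["start"], features, cur["suicides"]))
    return rows


def split_train_test(rows):
    """Training set: 2008-2009. Test set: 2010 (as stated on the slide)."""
    train = [r for r in rows if r[0].year in (2008, 2009)]
    test = [r for r in rows if r[0].year == 2010]
    return train, test
```

Splitting by calendar year after the lag has been attached keeps the previous-period count available for the first test period of 2010.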
What should NSOs do?
Challenge!
• Scientifically collected data vs. huge amount of data
• Established theoretical basis vs. “quantity beats quality”
• Representativeness of target population vs. lack of representativeness of target population
• Relatively slow vs. MORE TIMELY
• Expensive data collection vs. data already there
KOSTAT tried…
October 2012~March 2013
• Organized seminars once or twice a month, inviting outside big data experts
• Aimed to raise awareness of big data and its impact on producing official statistics
December 2012~April 2013
• A pilot project on the use of big data in the process of editing existing national statistics
• Used media data for examining outliers when producing the Index of Industrial Production (IIP)
KOSTAT is doing…
1. E-Diary System (Household Account System)
• Currently about 48.5% of sample households have adopted the e-Diary system
• Respondents can import their expenditure information through online transactions from banks, credit card companies and major retail stores.
Using big data for the convenience of respondents
KOSTAT is doing…
2. Pilot Project of Price Index
Prof. Roberto Rigobon: “Please select specific domains (or items) that can clearly show the difference between big data and existing statistics”
e.g. TV or electronic products
KOSTAT is currently preparing for a pilot
project on compiling price index using big
data for a specific manufacturing product.
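The slide does not say which index formula the pilot would use. As one illustration only, here is a minimal sketch of a Jevons elementary index (the geometric mean of price relatives over matched items), a formula commonly used for elementary price aggregates. The function name, item IDs and prices below are hypothetical.

```python
import math


def jevons_index(base_prices, current_prices):
    """Jevons elementary index (base = 100) over items priced in both periods.

    Both arguments map an item ID to a positive price. Items missing from
    either period are ignored (a simple matched-model approach).
    """
    matched = [k for k in base_prices if k in current_prices]
    if not matched:
        raise ValueError("no items are priced in both periods")
    log_sum = sum(math.log(current_prices[k] / base_prices[k]) for k in matched)
    return 100.0 * math.exp(log_sum / len(matched))


# Hypothetical online price quotes for TV models (illustrative values only).
base = {"tv_a": 1000.0, "tv_b": 500.0, "tv_c": 800.0}
current = {"tv_a": 900.0, "tv_b": 500.0, "tv_d": 700.0}
index = jevons_index(base, current)  # uses only tv_a and tv_b
```

Matching items across periods matters for products such as TVs, where models enter and leave the market quickly. How to handle that turnover is one of the questions a pilot would need to settle.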
Future Challenges
Can we ignore big data just because of its representativeness
issue, despite strengths such as timeliness?
Can KOSTAT forbid more than 380 statistical agencies from producing
official statistics with big data?
Maybe Not!
We will have to make use of big data in producing statistics at some point in the
future, as was the case with the transition from survey data to
administrative data.
We need to identify the limitations of big data through pilot projects and
learn the techniques and know-how to refine big-data-based statistics into
official statistics.
Thank you very much!