Big Data in the National Accounts
Experience in the United States
Brent Moulton
Advisory Expert Group on National Accounts
Washington, DC
9 September 2014
www.bea.gov
What are big data?
▪ Wikipedia: “Any collection of data sets so large and
complex that it becomes difficult to process using…
traditional data processing applications.”
▪ IBM: “Every day we create 2.5 quintillion bytes of
data… This data comes from everywhere… This is big
data.”
▪ Forbes: “12 big data definitions: what’s yours?”
 # 11 – “The belief that the more data you have, the more
insights and answers will arise automatically from the pool”
 # 12 – “A new attitude… that combining data from multiple
sources could lead to better decisions.”
Big data and official statistics
▪ Statistical agencies as producers of big data
 Consistency in format and presentation
 Catalogued in common, machine-readable format
 Accessible in bulk
 Desirable to make government data available on a single
platform
▪ Big data as source data for national accounts
 Administrative data, especially micro-data
 Data from private sources
 Web scraping
Concerns about using big data
▪ Do the concepts match those needed for
national accounts?
▪ How representative are the data?
 Selection biases
▪ Is it possible to fill the gaps in coverage?
▪ Do the data provide consistent time series and
classifications?
▪ How timely are the data?
▪ How cost effective?
Defined-benefit pension funds
▪ For the SNA’s new treatment of defined-benefit
pensions, BEA found it useful to work with
administrative micro-data filed by pension funds
 “Form 5500” data from Pension Benefit
Guaranty Corporation
 ~ 45,000 records per year covering 98% of
private pension funds
 BEA had to edit data to remove data errors
and anomalies
Private source data for early estimates
▪ For “advance” GDP estimate (release about 30
days after the end of the quarter), official
monthly/quarterly indicators are not always
available
▪ Examples of private source data used by BEA:
 Ward’s/J.D. Power/Polk (auto sales/price/registrations)
 American Petroleum Institute (oil drilling)
 Air Transport Association of America (airlines)
 Variety magazine (motion picture admissions)
 Smith Travel Research (hotels and motels)
 Investment Company Institute (mutual fund sales)
Health care satellite account
▪ Schultze Commission (At What Price? 2002)
recommended that health care price indexes
should be based on cost of treating a specific
diagnosis
▪ BEA is preparing a health care satellite account
(http://www.bea.gov/national/health_care_satellite_account.htm)
 One approach uses insurance claims data for several
million insured individuals
 Claims grouped in disease episodes
 Allows comparison of change in cost for treating
particular diseases
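The episode-based approach above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the claim records are invented, an "episode" is simplified to all claims for one patient, one disease, and one year, and the function names are hypothetical. It is not BEA's method, which would rely on real claims data and a proper episode grouper.

```python
from collections import defaultdict

# Hypothetical claim records: (patient_id, disease, year, cost in dollars).
CLAIMS = [
    ("p1", "diabetes", 2010, 400.0),
    ("p1", "diabetes", 2010, 600.0),
    ("p2", "diabetes", 2010, 1000.0),
    ("p1", "diabetes", 2011, 1100.0),
    ("p2", "diabetes", 2011, 1100.0),
    ("p3", "asthma", 2010, 500.0),
    ("p3", "asthma", 2011, 450.0),
]


def episode_costs(claims):
    """Sum claims into one episode per (patient, disease, year).

    Assumption: a real grouper would use service dates and diagnosis codes.
    """
    totals = defaultdict(float)
    for patient, disease, year, cost in claims:
        totals[(patient, disease, year)] += cost
    return dict(totals)


def average_cost_per_episode(claims):
    """Average episode cost for each (disease, year)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_, disease, year), cost in episode_costs(claims).items():
        sums[(disease, year)] += cost
        counts[(disease, year)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def cost_index(claims, disease, base_year, year):
    """Cost of treating a disease in `year`, with base_year = 100."""
    avg = average_cost_per_episode(claims)
    return 100.0 * avg[(disease, year)] / avg[(disease, base_year)]


if __name__ == "__main__":
    for disease in ("diabetes", "asthma"):
        print(disease, cost_index(CLAIMS, disease, 2010, 2011))
```

With the invented records, the index moves by disease rather than by type of service, which is the comparison the episode grouping is meant to allow.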
Local area tracking system
▪ Used by BEA’s regional accounts staff for
independent data on regional economies
▪ Used to vet official statistics before publishing
▪ Types of data
 Employment data: largest employers, principal
industries, recent layoffs
 Natural events affecting the economy
 Local real estate and financial trends
▪ Automated using web scraping methods
 Identifying key word searches
 Archiving relevant articles
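The keyword search and archiving steps above might look roughly like the following sketch. It is illustrative only: the keyword list, the article fields (`region`, `title`, `body`), and the function names are assumptions, and the fetching of pages is left out. The slides do not describe how BEA's system is actually implemented.

```python
import re

# Hypothetical key words; the actual search terms are not given in the slides.
KEYWORDS = ["layoff", "plant closing", "hurricane", "foreclosure"]


def matched_keywords(text, keywords=KEYWORDS):
    """Return the key words that start a word or phrase in the text."""
    lowered = text.lower()
    return [
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw), lowered)
    ]


def archive_relevant(articles, keywords=KEYWORDS):
    """Keep only articles that match at least one key word.

    Each article is assumed to be a dict with 'region', 'title', 'body'.
    """
    archive = []
    for article in articles:
        hits = matched_keywords(article["title"] + " " + article["body"], keywords)
        if hits:
            archive.append({
                "region": article["region"],
                "title": article["title"],
                "keywords": hits,
            })
    return archive


if __name__ == "__main__":
    sample = [
        {"region": "Toledo, OH", "title": "Auto plant announces layoffs",
         "body": "The largest employer in the county will cut 300 jobs."},
        {"region": "Toledo, OH", "title": "County fair opens",
         "body": "Record attendance is expected this weekend."},
    ]
    for entry in archive_relevant(sample):
        print(entry)
```

Matching on word prefixes lets "layoff" also catch "layoffs"; a production system would likely need stemming and de-duplication, which are omitted here.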
BEA research on depreciation
▪ Identifying depreciation in the presence of
obsolescence is a long-standing issue
▪ BEA research on motor vehicle depreciation
proposes to address this problem using data
on “build dates,” which can differ from model
years
▪ Data scraping – VIN-level data from
decodethis.com combined with auction data
from NADA and data from other auto websites
▪ Goal is improved estimates of depreciation
Conclusions
▪ Big data will become increasingly important
▪ Priority to improving data quality, filling gaps,
and keeping up with changing economy
▪ Big data especially useful for research projects
▪ Big data may allow for more timely or higher
frequency estimates
▪ Attention must continue to be paid to
traditional data quality issues