Providing Help with Statistical Concepts and Terms
Providing Help with Statistical Concepts and Terms: Enhanced Glossary and Ontology
Stephanie W. Haas
Ron Brown
Cristina Pattuelli
Development of Enhanced Glossary
[Diagram labels: ontology, term, content, context specificity, presentations, format, user control]
Terms
• Include terms that users frequently encounter on
agency sites, not comprehensive dictionary
• Basic level of statistical literacy, not highly
technical resource
• Strategies for term identification
– examination of frequently-visited pages
– anecdotal evidence from agency and non-agency
consultants
– metadata user study
– webcrawl of agency sites
Content
• Provide basic level of explanation
• May include:
– definition
– example
– brief tutorial
– demonstration
– interactive simulation
– combination
• May incorporate related terms and concepts
• Give pointers to more complete and/or more
technical explanations
Context specificity
• Explanations provided at varying levels of
specificity
– General, context-free, “universal”
– Agency or concept-specific, incorporating
entities from agriculture, labor, science R&D,
energy, etc.
– Table- or statistic-specific, based on a single
row, column, or statistic, e.g., CPI, national
death rate, gasoline prices in NY state, etc.
• Provide explanations of term or concept that are
as relevant to user’s current context as possible.
• When user invokes help on a term, the most
specific explanations available are offered.
• If there is no explanation for that specific statistic
or table, more general (e.g., agency-specific)
ones are offered. Default is “universal” level.
• Path from specific to general is based on the
ontology.
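The specific-to-general fallback described above can be sketched as a walk up a parent chain. This is a minimal illustration, not the project's implementation: the `PARENT` and `EXPLANATIONS` tables, the context names, and the function `find_explanation` are all hypothetical, and the explanation texts are placeholders.

```python
# Hypothetical fragment of the ontology: each context maps to its more
# general parent. Contexts not listed fall back directly to "universal".
PARENT = {
    "NY weekly gasoline prices": "EIA weekly gasoline prices",
    "EIA weekly gasoline prices": "universal",
}

# Hypothetical store of explanations, keyed by (term, context).
EXPLANATIONS = {
    ("sample", "universal"): "A sample is a subset of a population.",
    ("sample", "EIA weekly gasoline prices"):
        "About 900 retail gasoline outlets are surveyed each Monday.",
}


def find_explanation(term, context):
    """Return (context_used, text) for the most specific explanation
    available, or None if the term has no explanation at any level."""
    current = context
    while True:
        text = EXPLANATIONS.get((term, current))
        if text is not None:
            return current, text
        if current == "universal":
            return None  # nothing found even at the default level
        # Move one step toward the general end of the path.
        current = PARENT.get(current, "universal")


# No NY-specific entry exists, so the agency-level one is offered.
print(find_explanation("sample", "NY weekly gasoline prices"))
```

Keeping the path in a separate table means new explanations can be added at any level without changing the lookup logic.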
Format
• User can choose desired format of
explanation, based on interest, learning
style, reading level, hardware/software
limitations
– text
– text plus audio (narration)
– graphic
– animation
– interactive
User Control
• Make glossary help attractive and accessible
• Help users understand the statistics they find
without interrupting their information-seeking
task
• Let users know when help is available
• Let users choose the format and specificity they
desire
• Control mechanisms, e.g., means of invocation
and termination, pop-up windows, mouse-overs,
etc.
Creating the Ontology
• Select ontology editor to meet our needs
• Include terms and concepts to support
glossary.
– May need “connecting nodes” that aren’t in
glossary
• Relationships
– standard – isa, instance, etc.
– domain-specific – predicts, smoothes, etc.
• Visualization tools for end users (future
work)
Ontology support for glossary
Relationships support design and display of
term explanations
• Specificity of explanations
– inheritance of more general explanations
• Explanation templates
– sample: samples for specific surveys
– index: CPI, Antiknock Index
• Related terms – incorporation into tutorial
– population, sample
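One way to picture an explanation template is as a sentence with slots that are filled in per survey. The sketch below is only an assumption about what such a template might look like; the wording of `SAMPLE_TEMPLATE`, its slot names, and the helper `explain_sample` are hypothetical. The filled-in values come from the gasoline price mock-up in this presentation.

```python
from string import Template

# Hypothetical template for the term "sample"; slot names are illustrative.
SAMPLE_TEMPLATE = Template(
    "In $survey, the population is $population "
    "and the sample is $sample."
)


def explain_sample(survey, population, sample):
    """Fill the sample template for one specific survey."""
    return SAMPLE_TEMPLATE.substitute(
        survey=survey, population=population, sample=sample
    )


text = explain_sample(
    survey="the EIA weekly gasoline price survey",
    population="all retail gasoline outlets",
    sample="approximately 900 retail gasoline outlets",
)
print(text)
```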
Current Coverage
• adjustment
– universal
– age adjustment - FL death rates
– seasonal adjustment - NY unemployment rate
• index
– universal, CPI, Antiknock index
• population, parameter, sample, statistic
– universal, weekly gasoline prices, NY state
weekly gasoline prices, height & weight of
U.S. adult residents
Mock-ups
population & sample (1)
[Figure: population of 50 people, each marked “Likes dogs” or “Dislikes dogs”; p = 10/50 = .2 = 20%]
Suppose this picture represents the population of people in the entire country.
In this population, a certain percentage (p) of people like dogs. In this example,
10 of the 50 people like dogs. p is the parameter that measures this view of the population.
It is the value that you would get if you could survey the entire population.
20% of the people in this population like dogs.
population & sample (2)
[Figure: the same population of 50 people with a sample of 10 drawn from it; p = 10/50 = .2 = 20%, P* = 3/10 = .3 = 30%]
In real life it is difficult to survey the entire population, so we take a sample.
We can then count the number of people in the sample who like dogs
and calculate a statistic (P*) that is an estimate of the value of p.
In this case, P* overestimates the value of the parameter p.
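The arithmetic in the two mock-ups can be reproduced in a few lines. This is a sketch under the mock-up's numbers (50 people, 10 of whom like dogs; a sample of 10 containing 3 who like dogs). Which individuals fall into the sample is an assumption made only for illustration, and the names `proportion`, `p`, and `p_star` are hypothetical.

```python
from fractions import Fraction


def proportion(likes_dogs):
    """Share of True values in a list of booleans, as an exact fraction."""
    return Fraction(sum(likes_dogs), len(likes_dogs))


# Population from the mock-up: 50 people, 10 of whom like dogs.
population = [True] * 10 + [False] * 40

# One possible sample of 10 people that happens to contain 3 dog lovers.
sample = [True] * 3 + [False] * 7

p = proportion(population)   # the parameter
p_star = proportion(sample)  # the statistic that estimates it

print(float(p), float(p_star))  # 0.2 0.3
print(p_star > p)               # True: this sample overestimates p
```

A different sample could just as easily underestimate p, which is why P* is described as an estimate.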
EIA weekly gasoline prices
Every Monday, retail prices for all three grades of gasoline are
collected by telephone from a sample of approximately 900 retail
gasoline outlets.
Reported in:
Weekly U.S. Retail Gasoline Prices, Regular Grade
Dollars per gallon, including all taxes
http://www.eia.doe.gov/oil_gas/petroleum/data_publications/wrgp/mogas_home_page.html
• text example of population and sample for this table
• graphical example of population and sample for this table
graphical example of population & sample, gasoline prices
population: all retail gasoline outlets
sample: 900 retail gasoline outlets
regular gasoline, mean price/gallon, 9/30/02 = $1.413
• text example of sample for NY
• graphical example of sample for NY
graphical example of sample, NY gasoline prices, 9/30/02
sample of New York retail gasoline outlets
mean cost = $1.529 per gallon
• graphical example of population and sample for body measurements
graphical example of population and sample for body measurements
5,000 individuals are surveyed annually
each participant represents approximately 50,000 other U.S. residents
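The two figures in the body-measurements example imply the rough size of the population that the sample stands for. The short calculation below treats both numbers as the round approximations given on the slide; the variable names are hypothetical.

```python
# Figures from the slide, both approximate.
participants_per_year = 5_000
residents_per_participant = 50_000

# Rough number of U.S. residents the annual sample stands for.
represented = participants_per_year * residents_per_participant
print(f"{represented:,}")  # 250,000,000
```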
[Ontology diagram]
Population is_described_by Parameter (mean, standard_deviation)
Sample is_described_by Statistic (sample_mean, sample_standard_deviation)
Sample is part of Population
[Ontology diagram, instances]
Population, instance of: U.S. residents; U.S. retail gasoline outlets; NY State retail gasoline outlets; U.S. R&D companies
Sample, instance of: 5,000 U.S. residents/yr; 900 U.S. retail gasoline outlets; n NY State retail gasoline outlets; n U.S. R&D companies
Sample is part of Population
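The relationships in the ontology diagrams can be sketched as subject-relation-object triples. This is one reading of the diagrams, not the project's actual representation: the pairing of each sample with its population, the relation spellings, and the helper functions `objects` and `subjects` are assumptions.

```python
# Hypothetical encoding of part of the diagrams as (subject, relation, object).
TRIPLES = [
    ("Population", "is_described_by", "Parameter"),
    ("Sample", "is_described_by", "Statistic"),
    ("Sample", "is_part_of", "Population"),
    ("U.S. retail gasoline outlets", "instance_of", "Population"),
    ("900 U.S. retail gasoline outlets", "instance_of", "Sample"),
    ("900 U.S. retail gasoline outlets", "is_part_of",
     "U.S. retail gasoline outlets"),
]


def objects(subject, relation):
    """Everything that `subject` points to through `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """Everything that points to `obj` through `relation`."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]


print(objects("Sample", "is_described_by"))   # ['Statistic']
print(subjects("instance_of", "Population"))  # ['U.S. retail gasoline outlets']
```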
Index
An index combines numbers measuring different things into a single number. The
single number represents all the different measures in a compact, easy-to-use
form. Values for an index can be compared to each other, for example, over time.
[Figure: input values 6, 24.7, 59, 103, 42, 10.1 enter a combiner; index = 12.3]
[Figure: the combiner applied in Jan., Apr., Jul., and Oct. gives index values 12.3, 13.1, 13.9, 14.3, plotted by month]
The index has increased this year.
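The slides do not say how the combiner works, so the sketch below uses a weighted average purely as an assumed stand-in; it is not claimed to reproduce the value 12.3 from the six inputs shown. The comparison over time uses the four index values from the slide. The names `combine` and `percent_change` and the example weights are hypothetical.

```python
def combine(values, weights):
    """Hypothetical combiner: a weighted average of the input measures."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Assumed weights for illustration only.
print(combine([10, 20], [3, 1]))  # 12.5

# Index values read from the slide.
index_by_month = {"Jan": 12.3, "Apr": 13.1, "Jul": 13.9, "Oct": 14.3}
change = percent_change(index_by_month["Jan"], index_by_month["Oct"])
print(round(change, 1))  # 16.3
```

Because every period is reduced to one number by the same combiner, the values can be compared directly, which is the point the slide makes.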
Consumer Price Index (CPI)
The Consumer Price Index (CPI) represents changes in
prices of all goods and services purchased for consumption by
urban households. It combines prices into a single number
that can be compared over time.
Items are classified into 8 major groups:
• Food and Beverages
• Housing
• Apparel
• Transportation
• Medical Care
• Recreation
• Education and Communication
• Other
[Figure: inputs labeled food & beverage, education & communication, transportation, telephone, other, recreation, housing, medical care, and apparel enter the CPI combiner, which outputs the Consumer Price Index]
[Figure: the CPI combiner applied for each year from 1997 to 2001, with the resulting CPI plotted by year on an axis from 160 to 180]
The Consumer Price Index has increased since 1995.
Antiknock Index, also known as Octane Rating
A number used to indicate gasoline’s antiknock
performance in motor vehicle engines. The two
recognized laboratory engine test methods for
determining the antiknock rating, i.e., octane
rating, of gasolines are the Research method and
the Motor method. In the United States, to provide
a single number as guidance to the consumer, the
antiknock index (R+M)/2, which is the average of
the Research and Motor octane numbers, was
developed.
http://www.eia.doe.gov/glossary/glossary_a.htm
[Figure: the Research method and Motor method results enter the Antiknock combiner, (R + M)/2]
Antiknock Index, also known as Octane Rating
Regular: 85 - 88
Midrange: 88 - 90
Premium: 90 or above
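The antiknock combiner is the one case here where the formula is stated, (R + M)/2, so it can be sketched directly. The grade lookup rests on an assumption: the ranges on the slide share the boundary values 88 and 90, and this sketch assigns 88 to Midrange and 90 to Premium. The function names and the example octane numbers are hypothetical.

```python
def antiknock_index(research_octane, motor_octane):
    """Average of the Research and Motor octane numbers, (R + M)/2."""
    return (research_octane + motor_octane) / 2


def grade(index):
    """Map an antiknock index to a grade using the ranges on the slide.

    Assumption: a value on a shared boundary (88 or 90) goes to the
    higher grade. Values below 85 have no grade on the slide.
    """
    if index >= 90:
        return "Premium"
    if index >= 88:
        return "Midrange"
    if index >= 85:
        return "Regular"
    return None


# Hypothetical test results for one gasoline.
aki = antiknock_index(91, 83)
print(aki, grade(aki))  # 87.0 Regular
```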
Next Steps
• expand coverage of core terms
– webcrawl indicates measures of central
tendency are next: average, mean, median,
mode
• expand coverage of ontology
• expand presentation examples
– animations, simulations
• explore user controls
• user study of effectiveness