Powerpoint slides from the SIGCSE 2004 talk
Core Empirical Concepts and Skills for
Computer Science
Grant Braught
Dickinson College
[email protected]
Craig Miller
DePaul University
[email protected]
David Reed
Creighton University
[email protected]
Talk outline:
• Overview of the project
• List of core competencies
• Sample exercises/laboratories
• Web repository
1
Overview of the project
CS demands basic competencies in experimentation and data analysis
e.g., choosing a data structure/algorithm, optimizing a server, comparing CPUs, GUI design
But little attention has been paid to empirical aspects in CS curricula
can find examples of experimentation in specific courses, but no systematic approach
see SIGCSE 2001 paper by the authors for more details/motivation
In 2003, we began work on an NSF CCLI project (NSF DUE-0230950) to:
1. Formalize a set of core empirical concepts and skills needed by today's computer science graduate.
2. Develop and evaluate resources for integrating the presentation and development of those concepts and skills throughout the computer science curriculum.
3. Disseminate our approach for adoption at other institutions.
2
Core empirical competencies
KEY POINT 1: any list of core concepts & skills must be incremental
as with programming, must develop understanding over time
introduce foundational concepts early, revisit often, build on existing knowledge as new
applications arise
applications become more open-ended as skills develop
KEY POINT 2: it must also complement traditional CS curricula
CS curricula are already "full", faculty are stretched
to integrate concepts/skills with minimal effort, organize around traditional practice
Introductory level: augment programming with simulation/experimentation/analysis
instill basic understanding that programs are used to solve problems, provide informal tools
Intermediate level: provide formal measures/techniques for core CS, software testing
transition from following prescribed experiments to successfully designing experiments
Advanced level: build upon foundations, add domain-specific concepts where needed
students are expected to design, conduct, and analyze experiments in various settings
3
Introductory
Concepts: mean vs. median; informal definition of probability; chance error; expected values; Law of Large Numbers; consistency vs. accuracy; benefits/limitations of models
Skills:
Be able to conduct a well-defined experiment, summarize the results, and compare with the expected results.
Be able to evaluate the persuasiveness of experimental conclusions, focusing on issues such as the clarity of the hypothesis, the sample size, and data consistency.

Intermediate
Concepts: uniform vs. normal distribution; standard deviation; sampling large populations; methods for obtaining a good sample; curve-fitting (e.g., linear regression); data presentation (e.g., tables, scatterplots, histograms); software testing & debugging strategies; validity threats (e.g., confounding variables, non-generalizability)
Skills:
Be able to plot and describe the relation between two variables, such as problem size vs. efficiency when analyzing algorithms.
Be able to use a sample to make predictions about a population.
Be able to evaluate the persuasiveness of experimental conclusions, focusing on issues such as appropriate statistical measures, relevance of a model, and generality of the conclusions.
Be able to design a test suite for a software project and conduct systematic debugging.

Advanced
Concepts: standard error; confidence intervals; measures of goodness of fit (e.g., correlation coefficient); observational study vs. controlled experiment; correlation vs. causality; significance tests (e.g., t-test, z-score)
Skills:
Be able to formulate testable hypotheses and design experiments for supporting or refuting those hypotheses.
Be able to apply experimental methods to the analysis of complex systems in different domains.
4
Introductory level examples
Three Monte Carlo simulations
Estimating the number of 4 letter words recognized by the spell checker
Estimating the value of π
Comparing country drunkards to city drunkards
These simulation assignments:
Incrementally develop consistency and accuracy concepts
Emphasize computer simulation as a tool for scientific investigation
Can be easily adapted to reinforce traditional programming topics
5
Estimating the number of 4 letter words
The question
How many English words can be formed using only the letters “etaoinshrd”?
The experiment
Generate 1000 random 4 letter sequences using the ten letters “etaoinshrd”
Count the number of sequences recognized as English words
Estimate the total number of 4 letter words
(# of recognized sequences) / 1000 ≈ (total # of 4 letter words) / 10^4
The analysis
Compare results from multiple trials for “consistency”
Intuitive binary definition of consistency: Within 10% of the mean
Compare results to actual value using percent error
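A minimal sketch of this simulation; the word set here is a small hypothetical stand-in, since a real lab would check sequences against an actual spell checker or dictionary file:

```python
import random

# Hypothetical stand-in for a spell checker's word list (illustrative only).
WORDS = {"dash", "shot", "rest", "near", "rain", "dirt", "toad"}

LETTERS = "etaoinshrd"

def estimate_word_count(trials=1000, seed=0):
    """Estimate the total number of 4 letter words over LETTERS.

    Generate `trials` random 4 letter sequences, count how many are
    recognized, and scale by the 10**4 possible sequences:
        recognized / trials  ≈  total_words / 10**4
    """
    rng = random.Random(seed)
    recognized = sum(
        "".join(rng.choice(LETTERS) for _ in range(4)) in WORDS
        for _ in range(trials)
    )
    return recognized * 10**4 // trials  # equals recognized * 10 when trials == 1000

print(estimate_word_count())
```

Running the function with several seeds gives the multiple trials needed for the consistency comparison in the analysis step.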
6
Refining consistency and accuracy
Estimating π by throwing darts
The experiment
Estimate π using 100, 1000 and 10,000 darts
(darts in circle) / (total darts) ≈ (area of circle) / (area of square) = π / 4
The analysis
Relative consistency: Range as a percentage of the mean
Relationship between consistency and accuracy
Effects of systematic error
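A sketch of the dart-throwing estimate; it uses a quarter circle inscribed in the unit square, which gives the same π/4 area ratio as a full circle in a square of side 2:

```python
import math
import random

def estimate_pi(darts, seed=0):
    """Throw `darts` random points at the unit square [0,1)x[0,1);
    the fraction landing inside the quarter circle of radius 1
    approximates (area of circle) / (area of square) = pi / 4."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                 for _ in range(darts))
    return 4.0 * inside / darts

# Estimates tighten around pi as the number of darts grows.
for n in (100, 1000, 10_000):
    print(n, estimate_pi(n))
```

Repeating each dart count with different seeds lets students compute the range as a percentage of the mean (relative consistency) at each sample size.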
7
Further refining consistency
Country drunkards vs. city drunkards
Which has a greater expected net distance, a fully random walk or a Manhattan
random walk?
The experiment
(diagrams of a fully random walk and a Manhattan walk)
Simulate fully random and Manhattan random walks
The analysis
Compare the mean net distances
Compare the distribution and consistency of net distances
Formal definition of consistency using standard deviation
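A sketch of the two walk simulations; the step count and number of trials are illustrative choices:

```python
import math
import random

def net_distance(steps, manhattan, rng):
    """Take `steps` unit steps and return the net (straight-line)
    distance from the start. Manhattan walkers move only N/S/E/W;
    fully random walkers pick any angle."""
    x = y = 0.0
    for _ in range(steps):
        if manhattan:
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        else:
            a = rng.uniform(0.0, 2.0 * math.pi)
            dx, dy = math.cos(a), math.sin(a)
        x += dx
        y += dy
    return math.hypot(x, y)

def simulate(steps=50, trials=2000, manhattan=False, seed=0):
    """Return the mean and standard deviation of net distance over many walks."""
    rng = random.Random(seed)
    dists = [net_distance(steps, manhattan, rng) for _ in range(trials)]
    mean = sum(dists) / trials
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / trials)
    return mean, sd

print("fully random:", simulate(manhattan=False))
print("manhattan:   ", simulate(manhattan=True))
```

The standard deviation returned here is exactly the formal measure of consistency the analysis step introduces.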
8
Intermediate level example
Optimization of the average case for quick sort
At what array size should quick sort switch to insertion sort?
(Chart: "Quick Sort vs. Insertion Sort" - running time in ms vs. array size n, plotting Q(n) for quick sort and I(n) for insertion sort; insertion sort is faster at small n.)
Select the switch point (M) to maximize the expected time saved by using insertion sort:
Expected Time Savings = (1/M) * Σ_{i=1..M} [Q(i) - I(i)]
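A rough timing sketch for measuring Q(n) and I(n); absolute times and the resulting switch point depend heavily on the language, implementation, and hardware:

```python
import random
import time

def insertion_sort(a):
    """In-place insertion sort."""
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def quick_sort(a, lo=0, hi=None):
    """In-place quicksort (Lomuto partition), recursing to size 1."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        quick_sort(a, lo, i - 1)
        quick_sort(a, i + 1, hi)
    return a

def mean_time(sort, n, trials=2000, seed=0):
    """Average wall-clock time (ms) to sort a random array of size n."""
    rng = random.Random(seed)
    arrays = [[rng.random() for _ in range(n)] for _ in range(trials)]
    start = time.perf_counter()
    for a in arrays:
        sort(list(a))  # sort a copy so every trial sees the same input
    return (time.perf_counter() - start) * 1000 / trials

for n in (5, 10, 20, 40):
    q, ins = mean_time(quick_sort, n), mean_time(insertion_sort, n)
    print(f"n={n:3d}  Q(n)={q:.5f} ms  I(n)={ins:.5f} ms")
```

Students can tabulate Q(i) and I(i) from such runs and then evaluate the expected-savings formula for candidate values of M.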
9
(Core competencies table from slide 4 shown again.)
10
Advanced level example: theoretical vs. empirical
For many domains, performance can be estimated in two ways:
Theoretically: analyze and sum component steps
Empirically: time a sample of actual performances
Example: Measuring the usability of an interface in terms of the time
needed to complete the task
Theoretically: analyze user actions needed to complete the task
Empirically: time users performing the task
11
Estimating task completion times
Theoretical
Keystroke-level model (Card, Moran and Newell)
Decompose task into the following operations:
Keying: key presses and mouse clicks
Pointing: moving mouse pointer to position on screen
Homing: moving hand from keyboard to mouse or mouse to keyboard
Model provides method for adding mental operations
Sum time costs
Empirical
Create task instructions
Request participants to perform task after providing task instructions
Time participants performing task
Calculate average and confidence intervals to estimate mean task completion time for a
user population
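A sketch of both estimates. The operator times are commonly cited keystroke-level model values (K varies with typing skill), the operator sequence is an example, and the measured times are hypothetical data:

```python
import math

# Keystroke-level model operator times in seconds (commonly cited values;
# K = keystroke/click, P = point, H = home, M = mental preparation).
KLM_TIMES = {"K": 0.28, "P": 1.1, "H": 0.4, "M": 1.35}

def klm_estimate(ops):
    """Theoretical task time: sum the cost of each operator in the
    sequence, e.g. "MPKK" = think, point, then two clicks/keypresses."""
    return sum(KLM_TIMES[op] for op in ops)

def mean_ci95(times):
    """Empirical estimate: sample mean and a 95% confidence interval
    (normal approximation) for the population mean task time."""
    n = len(times)
    mean = sum(times) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in times) / (n - 1))
    half = 1.96 * sd / math.sqrt(n)
    return mean, (mean - half, mean + half)

# Hypothetical measured completion times (seconds) for ten participants.
measured = [3.1, 2.8, 3.6, 3.0, 3.3, 2.9, 3.4, 3.2, 2.7, 3.5]
print("theoretical:", klm_estimate("MPKK"))
print("empirical:  ", mean_ci95(measured))
```

Comparing the two numbers, and asking whether the theoretical estimate falls inside the empirical confidence interval, makes the limitations on the next slide concrete.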
12
Comparing analytical and empirical methods
Limitations of analytical approach
Parameter assumptions may be poor estimates
Some analysis decisions require expert judgment
Some sequences of actions may have hidden efficiencies or costs
Limitations of empirical approach
Sample of trials may be too small to estimate mean for entire population
Participants may not be representative of actual users
Test environment may not simulate natural usage
13
Domain (HCI) concepts and skills
Running a simple usability test
Scripting natural instructions
Creating natural test environment
Learning relative time costs of user controls
Pointing actions are an order of magnitude more costly than keying actions
Most interactions can be decomposed into a small set of physical actions
Variation in human behavior
Aggregate human behavior is predictable
14
(Core competencies table from slide 4 shown again.)
15
Web repository
Hopefully, this list of core empirical concepts & skills will
raise awareness
initiate discussion
lead to greater integration of empirical methods
The list and example labs can be found at the project Web site:
http://empirical.cs.creighton.edu
A variety of labs are accessible, along with instructor advice & assessment tools
More labs will be added over the course of the project
Feedback, ideas, and classroom examples & labs are always welcome
(see the feedback form at the Web site)
16
(Core competencies table from slide 4 shown again.)
17