On-Line Testing Center
Database Laboratories
Root Questions
Automating Homeworks
The Story
 We have developed tools, centered on the
database course sequence, for increasing
the efficiency of teaching:
1. Laboratories that give immediate, accurate
feedback for teaching SQL, etc.
2. Automated homeworks that simulate the
effect of carefully graded “long-answer”
homework.
Productivity in Education
The education industry has a terrible
productivity-improvement record.
Not only are database systems
essential for delivering improvements,
but the DBMS courses serve as a
wonderful example of how new
technology can impact education.
Comparison: Versus Telecom

        Tuition    3-min LD call   Ratio
1959    $1,200     $3.00           400
2004    $30,000    $0.15           200,000

In 45 years, high-end college tuition has gotten
500 times more expensive relative to a
long-distance phone call (the ratio grew from
400 to 200,000)!
But Isn’t … ?
The telecom industry is arguably the
best example of the use of technology
to reduce costs.
How about the much-maligned US Post
Office?
Comparison: Versus Post Office

        Tuition    Airmail Stamp   Ratio
1959    $1,200     $0.08           15,000
2004    $30,000    $0.37           81,000

In 45 years, high-end college tuition has gotten
5.4 times more expensive relative to a stamp!
Our Assumptions
Pure on-line education is failing.
4-year, residential education has great
value.
We can make instructors and TA’s more
efficient through automation.
We can give course staff the time to do
what they do best: individual, informal
instruction.
The Database Course as an Example
 How a DB course could be automated:
1. Lectures --- extended textbook.
2. On-line, instant-help desk.
3. Programming laboratory.
4. Homework and exams.
 Similar requirements for any
programming course; 1, 2, 4 for any
course.
Lectures
We have PowerPoint slides with
voiceover for an introductory DB
course.
Intended use: play for 50-60% of the
lecture; use the rest of the time for
discussion.
Pace is critical --- stop for class thought
after each slide.
Solution

SELECT beer, AVG(price)
FROM Sells
GROUP BY beer
HAVING COUNT(bar) >= 3 OR
       beer IN (SELECT name
                FROM Beers
                WHERE manf = 'Pete''s');

The HAVING clause keeps beer groups with at
least 3 non-NULL bars and also beer groups
where the manufacturer is Pete's; the subquery
finds the beers manufactured by Pete's.
Help Desk
“To be done.”
Scale is important. With 1500
students/year, we can afford a 24/7 TA
answering queries by email.
Vital for any course involving
programming --- so students don’t get
stuck on “Oracle says ‘semicolon
missing’; what does that mean?”
Help Desk --- (2)
Technology boost: customer-support
software.
Need to accumulate knowledge about
tricky points and errors in assignments
and material.
Need fast keyword search:
 Find on-line guides to assist TA in
responding to a specific question.
Laboratory Assignments
Conventional SQL homework: “Here is a
database; write these queries in SQL.”
TA’s look at SQL answers and try to
figure out whether the queries do what
they’re supposed to do.
Rate of regrades tells me this task is
too hard to get right.
OTC Laboratory
 OTC (On-line Testing Center) solves
this problem by:
1. Giving students a description of a
database schema.
2. Asking them to enter certain SQL queries.
3. Telling them whether their query is
syntactically wrong, syntactically correct but
gives the wrong result, or correct.
Behind the Scenes
OTC uses an Oracle DBMS, in which the
schema for each assignment is held,
along with carefully selected tuples to
populate the relations.
Relations must expose common errors --- for
example, tuples with a NULL bar value make a
query that uses COUNT(*) where COUNT(bar) is
needed return a visibly different answer.
Behind the Scenes --- (2)
Queries are passed to Oracle via JDBC,
and checked for syntax errors.
We create an instance of the generic
query interface (sqlplus) only if the
student requests help locating syntax
errors.
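
A minimal sketch of that checking step, assuming
an already-opened Oracle JDBC connection; the
class and method names are illustrative, not
OTC's actual code:

import java.sql.*;

public class SyntaxCheck {
    // Returns null if Oracle accepts the query; otherwise the error
    // text, which OTC can relay when the student asks for help.
    public static String check(Connection conn, String studentQuery) {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(studentQuery)) {
            return null;  // parsed and ran; results are judged separately
        } catch (SQLException e) {
            return e.getMessage();  // e.g. "ORA-00936: missing expression"
        }
    }
}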
Behind the Scenes --- (3)
Syntactically correct queries are
executed on the sample database.
Answers are checked for the presence
of certain strings and the absence of
others.
Almost certainly catches wrong
answers; always accepts right answers.
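
A sketch of that string test, assuming the query
result has been rendered to a single string; the
required and forbidden strings come from the lab
designer (names here are illustrative):

import java.util.List;

public class AnswerCheck {
    // Accept iff every required string appears and no forbidden one does.
    public static boolean looksCorrect(String result,
                                       List<String> required,
                                       List<String> forbidden) {
        for (String s : required)
            if (!result.contains(s)) return false;  // correct tuple missing
        for (String s : forbidden)
            if (result.contains(s)) return false;   // wrong tuple present
        return true;
    }
}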
Creating a Lab
 Interface allows designer to specify:
1. The stem (informal description of the
schema and the queries to be written).
2. Reference queries used to generate the
correct tuples for each query.
3. INSERT statements to initialize the
database.
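
To make the shape of a lab specification
concrete, here is a hypothetical sketch; the
field names and the tiny bars-and-beers example
are assumptions, not OTC's real input format:

public class LabSpecDemo {
    // The three parts the lab designer supplies.
    record LabSpec(String stem, String[] referenceQueries, String[] inserts) {}

    public static void main(String[] args) {
        LabSpec lab = new LabSpec(
            "Sells(bar, beer, price) lists beer prices. Write a query " +
            "for the beers Joe's Bar sells for less than $5.",
            new String[] {   // generate the correct tuples for the query
                "SELECT beer FROM Sells " +
                "WHERE bar = 'Joe''s Bar' AND price < 5.00"
            },
            new String[] {   // initialize the test database
                "INSERT INTO Sells VALUES ('Joe''s Bar', 'Bud', 4.00)",
                "INSERT INTO Sells VALUES ('Joe''s Bar', 'Miller', 6.50)"
            });
        System.out.println(lab.stem());
    }
}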
Other Labs
 Recently added: similar lab-creation
facilities for:
1. Relational algebra.
2. JDBC.
3. XQUERY.
Policy Issues
The lab is set up so students may
submit a query as many times as they
like.
Once correct, a query can be stored
and the next one worked on.
Feedback on Labs
An unsolved problem is how to give
students advice when their query is
syntactically correct but semantically
incorrect.
Showing them the test database is a
bad idea, because they can then tailor
their query to the data.
Probable Architecture
1. Create “shadow database” isomorphic
to the test database, but with other
values.
2. Use “lineage tracing” to determine
where an incorrect tuple or missing
correct tuple comes from.
3. Express the problem in terms of the
shadow DB.
Why It’s Not That Simple
Queries involve particular constants.
Changing the constants in your
explanation doesn’t explain anything.
Example: “find all the bars in Boston.”
 The shadow DB better not change ’Boston’
in tuples or you’ll be explaining: “if the DB
contains (’Joe’’s Bar’, ’Miami’) you need to
produce ’Joe’’s Bar’ in your answer.”
A Harder Example
Consider query: “find all the beers Joe’s
Bar sells for less than $5.”
You can’t change prices in tuples like
(’Joe’’s Bar’, ’Bud’, 4.00) randomly, or
you’ll give advice like “if the DB
contains (’Joe’’s Bar’, ’Coors’, 6.50), you
need to produce ’Coors’.”
Example --- Continued
You need a “less-than-$5”-preserving
transformation.
Example: p -> 2*p - 5.
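
A quick check that this map preserves the
predicate (and, being increasing, preserves
price comparisons too):

p < 5  if and only if  2*p < 10  if and only if  2*p - 5 < 5

For instance, the prices 4.00 and 6.50 map to
3.00 and 8.00, each staying on its own side of
$5.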
Automating Homework
The heart of OTC is a system for
automating homeworks and exams.
Goal 1: Encourage students to work
“long-answer” problems for themselves.
Goal 2: Inhibit cheating.
Goal 3: Eliminate the drudgery of
grading, while still giving students
feedback.
Modeling “Long-Answer”
Questions with Multiple-Choice
Here is a typical “long-answer” question
we might ask in a DB course:
Relation R consists of the following tuples,
and relation S has the following tuples.
Compute the join of R and S.
Root Questions
A root question is a multiple-choice
question with several right and many
wrong answers.
Example:
Relation R consists of the following tuples,
and relation S has the following tuples. Which
of these tuples is in the join of R and S?
Writing a Root Question
The question-designer provides several
correct answers.
 In our example, each tuple of the join
could be one correct answer.
Many wrong answers are also provided.
 Here, any tuple of the correct length that is
not in the join could be used.
Assigning Root Questions
The instructor develops an assignment
consisting of several root questions.
 4-6 seems to be the right number --- we’ll
see why.
Students take the assignment as many
times as they like and are encouraged
to get a perfect score.
Only the final score counts.
Assigning Root Questions --- (2)
Each time the student opens the
assignment, they are given the same
questions, but with a different choice of
one correct and three incorrect
answers, in random order.
To prevent rapid-fire guessing, the
student may open an assignment only
once per 15 minutes.
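
A sketch of that per-opening selection, assuming
each root question stores pools of correct and
incorrect answers; the names are illustrative:

import java.util.*;

public class Instantiate {
    // Pick one correct and three incorrect answers, in random order.
    public static List<String> choices(List<String> correct,
                                       List<String> incorrect,
                                       Random rng) {
        List<String> shown = new ArrayList<>();
        shown.add(correct.get(rng.nextInt(correct.size())));
        List<String> wrong = new ArrayList<>(incorrect);
        Collections.shuffle(wrong, rng);
        shown.addAll(wrong.subList(0, 3));
        Collections.shuffle(shown, rng);   // randomize display order
        return shown;
    }
}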
Student Responses
Ideally, students open the assignment
and see if they can work their particular
instances of the root questions.
If they can work a question instance,
they probably understand it.
If not, they need to study the subject.
Student Responses --- (2)
Each root question suggests a
conventional “long-answer” question
that the student should work.
Example: for the join question, they
may as well compute the entire join.
 With the join tuples listed on scratch paper,
they can quickly solve any instance of the
root question.
How Many Questions?
We recommend 4-6 questions per
assignment.
Fewer than 4 encourages students to
guess; too many runs the risk that a
student will miss one through carelessness.
 With 4 questions of 4 choices each, blind
guessing yields a perfect score only about
once in 4^4 = 256 attempts; when first given
at Stanford with no 15-minute rule, some
students tried hundreds of times.
Comparison
There is a simpler scheme used in
courses like physics, where questions
are parametrized, and the correct
answer computed by a formula.
A weight of $w kilograms is dropped
from height $h. How long does it
take the weight to reach the ground?
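
(For the record, the computed answer here is
t = sqrt(2*h/g), with g about 9.8 m/s^2 and air
resistance ignored; note that the weight w does
not actually affect t.)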
Comparison --- (2)
The question is generated by choosing
random values of the parameters, and
the answer is checked against the result
of the formula.
Root questions simulate this question
type by selecting many parameter
values and asking for a correct pairing
of parameters and result.
Comparison --- (3)
Example:
A weight of w kilograms is dropped from
height h. For which of the following
triples (w, h, t) is t the time it takes the
weight to reach the ground?
Comparison --- (4)
In the database domain, many kinds of
questions cannot have their answers
computed by an arithmetic formula:
 “Which of these functional dependencies
follows from the given FD’s?”
 “Which of these schedules is serializable?”
 “For which relation sizes is query plan A
better than plan B?”
Comparison --- (5)
If you are willing to write a program to
(say) test serializability, you can write a
program that generates a root question
with lots of serializable and lots of
unserializable schedules.
The output of this program can be
input automatically to OTC.
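
As a sketch of that idea, here is a small
generator, assuming “serializable” means
conflict-serializable (tested via an acyclic
precedence graph); the schedule encoding and
output format are invented for illustration:

import java.util.*;

public class ScheduleGenerator {
    // One schedule step: transaction id, 'r'ead or 'w'rite, element.
    record Action(int txn, char op, char elem) {}

    // Conflict-serializable iff the precedence graph has no cycle.
    static boolean serializable(List<Action> sched) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        for (int i = 0; i < sched.size(); i++)
            for (int j = i + 1; j < sched.size(); j++) {
                Action a = sched.get(i), b = sched.get(j);
                if (a.txn() != b.txn() && a.elem() == b.elem()
                        && (a.op() == 'w' || b.op() == 'w'))
                    graph.computeIfAbsent(a.txn(), k -> new HashSet<>())
                         .add(b.txn());
            }
        Set<Integer> done = new HashSet<>(), path = new HashSet<>();
        for (int v : graph.keySet())
            if (hasCycle(v, graph, done, path)) return false;
        return true;
    }

    static boolean hasCycle(int v, Map<Integer, Set<Integer>> g,
                            Set<Integer> done, Set<Integer> path) {
        if (path.contains(v)) return true;
        if (!done.add(v)) return false;
        path.add(v);
        for (int w : g.getOrDefault(v, Set.of()))
            if (hasCycle(w, g, done, path)) return true;
        path.remove(v);
        return false;
    }

    // Generate random 2-transaction schedules; serializable ones become
    // correct answers of the root question, the rest wrong answers.
    public static void main(String[] args) {
        Random rng = new Random();
        for (int n = 0; n < 20; n++) {
            List<Action> s = new ArrayList<>();
            for (int i = 0; i < 6; i++)
                s.add(new Action(1 + rng.nextInt(2),
                                 rng.nextBoolean() ? 'r' : 'w',
                                 (char) ('A' + rng.nextInt(2))));
            System.out.println((serializable(s) ? "correct: " : "wrong:   ") + s);
        }
    }
}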
OTC Status
About 300 root questions, mostly on
databases, developed.
 Let’s face it: writing a root question
correctly is hard.
 But once done and debugged, it can be
used in many courses.
OTC History --- Spring, Fall, 2002
One assignment in Stanford CS347
(Transaction-Processing and Distributed
Databases) supported, Spring 2002.
CS145 (Intro. DB course at Stanford)
supported in Fall, 2002.
 2 Lab assignments, 11 root-question
assignments, midterm (not root questions).
OTC --- Winter, 2003
Supported CS245 (DB Implementation,
Hector Garcia-Molina) at Stanford.
Supported a CS145/245-like course at
North Carolina State (Rada Chirkova).
OTC Status --- Spring 2003
Supported CS145, CS347, and CS345
(DB Theory) at Stanford.
Continued support at NC State.
Supported CS145-like courses at UC
Santa Cruz (Arthur Keller) and Univ. of
Leipzig (Erhard Rahm).
Supported a Discrete Math course at
NTU Athens (Foto Afrati).
OTC Development Team
The core software was developed by
Murty Valiveti and his team at Gautami
Software.
Alan Beck and Ramana Yerneni adapted
the OTC core for database instruction
and implemented a number of
important features.
Content Creators
Alan Beck: SQL, JDBC, and XQUERY
labs.
Austin Shoemaker: relational algebra lab.
Root-question developers: Foto Afrati,
Rada Chirkova, Mayur Datar, Prasanna
Ganesan, Wang Lam, Anand Rajaraman,
Jeff Ullman, Jennifer Widom, Ramana
Yerneni.
Find Out More
A tutorial for instructors is at
www-db.stanford.edu/~ullman/pub/otc.pdf
Demo site:
bess.stanford.edu:8181/CS145-demo/index.html