ChemistryKnowledgeSpace


Knowledge Space Map for Organic Reactions
Knowledge Space Theory
Existing Rule Set Basis for Chemistry
Knowledge Space Model
Data Model Proposal
Constructing and Learning the Map

Knowledge Space Map



Isolate atomic knowledge units / nodes / elements
Determine dependency graph of knowledge units
(defines a learning order by topological sort)
Enables targeted and purposeful lesson plans based on
the “fringes” of student’s current knowledge state
[Figure: example knowledge space map with nodes Addition, Subtraction, Multiplication, Division, Fractions, Exponents, Logarithms, Spelling, Vocabulary, Grammar]
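The two steps above (a dependency graph, then a learning order and a "fringe") can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the prerequisite edges in `PREREQS` are assumptions, since the figure's arrows did not survive extraction, and `outer_fringe` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed prerequisite map: unit -> set of units that must be learned first.
PREREQS = {
    "Addition": set(),
    "Subtraction": {"Addition"},
    "Multiplication": {"Addition"},
    "Division": {"Multiplication", "Subtraction"},
    "Fractions": {"Division"},
    "Exponents": {"Multiplication"},
    "Logarithms": {"Exponents"},
    "Spelling": set(),
    "Vocabulary": {"Spelling"},
    "Grammar": {"Vocabulary"},
}


def learning_order(prereqs):
    """Return one valid learning order (a topological sort of the graph)."""
    return list(TopologicalSorter(prereqs).static_order())


def outer_fringe(prereqs, known):
    """Units the student does not know yet but whose prerequisites are all known."""
    return {u for u, deps in prereqs.items() if u not in known and deps <= known}


order = learning_order(PREREQS)
fringe = outer_fringe(PREREQS, {"Addition", "Multiplication", "Spelling"})
```

A lesson planner would pick the next problem from `fringe`, since those are the units the student is ready to learn.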
Chemistry Knowledge Space?



Current system has user-driven selection of
which chapter(s) to work on, then the system
randomly generates a problem
Idealized approach: Assess student’s current
knowledge state and auto-generate next
problem to target next most useful subject
Existing tutorial based on predictive power of
80+ reagents, which are based on 1500+
elemental rules. These could be interpreted as
1500+ knowledge units
Rule Clustering

Many rules are just variants of the same concept
/ knowledge unit









Alkene, Protic Acid Addition, Alkoxy
Alkene, Protic Acid Addition, Benzyl
Alkene, Protic Acid Addition, Allyl
Alkene, Protic Acid Addition, Tertiary
Alkene, Protic Acid Addition, Secondary
Alkene, Protic Acid Addition, Generic
…
Some rules will always be used in conjunction
with another (like the letters “q” and “u”)
There is then no real learning dependency order
between these rules: you essentially know one of
the rules IFF (if and only if) you know the others
Data Model Proposal



Want general framework for representing relationships
Each reaction rule represents an elementary knowledge
unit node
Weighted, directed edge between each node
represents learning dependency relationship
A → B (90%)




Given that a student “knows” rule B, there is a 90% probability
that they “know” rule A
Conversely, if they do NOT know rule A, there is a 90% probability
that they do NOT know rule B.
Define “know”: a student should consistently answer correctly any problem
that is based only on rules that they “know”
Define rule similarity measure as average of reciprocal dependency
relationships
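As a rough sketch of this data model, the edge weights can be stored in a dictionary keyed by ordered rule pairs. The class name `KnowledgeMap`, its method names, and the 0.5 default for unlisted pairs are illustrative assumptions; only the edge semantics and the similarity definition come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeMap:
    # edges[(a, b)] holds the weight of "a -> b":
    # the probability that a student knows a, given that they know b.
    edges: dict = field(default_factory=dict)
    default: float = 0.5  # assumed weight for pairs with no recorded relation

    def set_dependency(self, a, b, p):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.edges[(a, b)] = p

    def dependency(self, a, b):
        return self.edges.get((a, b), self.default)

    def similarity(self, a, b):
        """Average of the two reciprocal dependency weights."""
        return (self.dependency(a, b) + self.dependency(b, a)) / 2


km = KnowledgeMap()
km.set_dependency("A", "B", 0.90)  # A -> B (90%)
sim = km.similarity("A", "B")      # averages 0.90 with the 0.5 default
```

Keeping the two directions as separate entries lets one structure express both one-way dependencies and mutual similarity.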
Major Relationship Cases

Strong learning dependency
A → B (99%)
B → A (50%)

Strong similarity / mutual dependency
A → B (99%)
B → A (99%)

No relation (random correlation)
A → B (50%)
B → A (50%)
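The three cases can be told apart from the pair of reciprocal weights alone. The sketch below is only illustrative: the function name `classify` and the cut-offs `high` and `low` are assumptions, chosen so that the slide's 99% and 50% examples fall into the expected cases.

```python
def classify(p_ab, p_ba, high=0.9, low=0.6):
    """Label a rule pair from its two reciprocal dependency weights.

    high and low are assumed thresholds, not values from the slides.
    """
    strong_ab, strong_ba = p_ab >= high, p_ba >= high
    weak_ab, weak_ba = p_ab <= low, p_ba <= low
    if strong_ab and strong_ba:
        return "strong similarity"
    if (strong_ab and weak_ba) or (strong_ba and weak_ab):
        return "strong dependency"
    if weak_ab and weak_ba:
        return "no relation"
    return "intermediate"


cases = [classify(0.99, 0.50), classify(0.99, 0.99), classify(0.50, 0.50)]
```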
Additional Enhancements

Add baseline probability of “knowing” each
node, instead of assuming uniform 50%


Analogous to using background weights for
amino acid distribution in protein sequence
Add a confidence number for each of
these probability weights to reflect how
trustworthy our prior data is

Analogous (maybe equal) to n, the number of
data points that were used to arrive at the
current estimate
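One simple way to realize a probability with an attached confidence is a running average in which n counts the data points seen so far. This is a sketch under that assumption; the class name `Estimate` and the `weight` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    p: float = 0.5  # current probability estimate (baseline or edge weight)
    n: float = 0.0  # confidence: number of data points behind the estimate

    def update(self, observed, weight=1.0):
        """Fold one observation (1.0 = knew it, 0.0 = did not) into the estimate."""
        if weight <= 0:
            raise ValueError("weight must be positive")
        total = self.n + weight
        self.p = (self.p * self.n + observed * weight) / total
        self.n = total


e = Estimate(p=0.5, n=2.0)  # prior worth two data points
e.update(1.0)               # one new observation that the student knows the rule
```

With a large n the prior dominates and a single exam barely moves the estimate; with a small n the estimate follows new data quickly.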
Learning Relationship Map


Give students assessment exams based on the
rule sets with criteria to distinguish problems
that students get “right” vs. “wrong”
Defines sets of rules



R: All rules used in problems students got right
W: All rules used in problems students got wrong
(that are not in R)
Adjust rule relation values



Decrease relations between Ri and Wj
Increase relations between Ri and Rk
Scale adjustment based on confidence in prior
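The update step might look like the sketch below. Several details are assumptions rather than anything stated on the slide: weights are keyed as `(a, b)` meaning "probability of knowing a given knowing b", each weight starts at 0.5 with a prior count of 1, and for a right/wrong pair only the direction directly contradicted by the exam (knows Ri, does not know Wj) is decreased.

```python
from itertools import permutations


def split_rules(problems):
    """problems: iterable of (rules_used, answered_correctly) pairs."""
    right, wrong = set(), set()
    for rules, correct in problems:
        (right if correct else wrong).update(rules)
    return right, wrong - right  # W excludes anything already in R


def update_relations(weights, counts, right, wrong):
    """Nudge weights toward the exam evidence, scaled by the prior count n."""

    def nudge(pair, target):
        p = weights.get(pair, 0.5)  # assumed default weight
        n = counts.get(pair, 1)     # assumed default confidence
        weights[pair] = (p * n + target) / (n + 1)
        counts[pair] = n + 1

    for a, b in permutations(sorted(right), 2):
        nudge((a, b), 1.0)          # both known: increase Ri/Rk relations
    for r in sorted(right):
        for w in sorted(wrong):
            nudge((w, r), 0.0)      # knows r but not w: decrease


weights, counts = {}, {}
R, W = split_rules([({"A", "B"}, True), ({"B", "C"}, False)])
update_relations(weights, counts, R, W)
```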
Learning Propagation




Each assessment exam may only cover a
handful of specific rules in R and W
When updating relation for rule R1 → R2, look
for all rules similar to R1 and all similar to R2
Assume respective updates for all relations
between similar rule pairs, scaled by the
magnitude of similarity to R1 and R2
Technically, all rules are similar to all others to
some degree, but we don’t want to update 1500²
(about 2.25 million) relations every time. Set a similarity
threshold, which effectively defines clusters around rules.
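A thresholded propagation step could be sketched as follows. The function names, the additive `delta` update, the 0.5 default weight, and the 0.8 threshold are all assumptions made for illustration; the slide only specifies that neighbouring updates are scaled by similarity to R1 and R2.

```python
def similar_rules(rule, similarity, threshold):
    """Map each rule at least `threshold`-similar to `rule` onto its similarity.

    similarity: dict keyed by (a, b) pairs, read as unordered.
    The rule itself is always included with similarity 1.0.
    """
    out = {rule: 1.0}
    for (a, b), s in similarity.items():
        if s >= threshold:
            if a == rule:
                out[b] = s
            elif b == rule:
                out[a] = s
    return out


def propagate(weights, similarity, r1, r2, delta, threshold=0.8):
    """Apply an update meant for the r1/r2 relation to similar rule pairs too."""
    for a, sa in similar_rules(r1, similarity, threshold).items():
        for b, sb in similar_rules(r2, similarity, threshold).items():
            if a == b:
                continue
            p = weights.get((a, b), 0.5) + delta * sa * sb
            weights[(a, b)] = min(1.0, max(0.0, p))  # keep a valid probability


weights = {}
similarity = {("R1", "R1x"): 0.9, ("R2", "R2x"): 0.5}
propagate(weights, similarity, "R1", "R2", delta=0.1)
```

Here `R1x` is close enough to `R1` to receive a scaled update, while `R2x` falls below the threshold and is left alone.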
Constructing Relationship Map

Initial pass should be able to automatically find a
lot of “similarity” relationships just based on
existing structured data





Rule names
Combined usage in test examples
Included in common reagents, chapters, etc.
Use book chapters order as initial guess for
dependency orders
Similarity analysis could reduce 1500+ rules to
~100? rule “clusters” which is more tractable to
manually assign major dependencies not
automatically addressed by book chapter order
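As one possible starting point for the name-based pass, rule names can be compared by word overlap and grouped when the overlap passes a threshold. Jaccard similarity and the 0.5 threshold are assumptions swapped in for illustration; the slides do not say which similarity measure was intended.

```python
import re


def tokens(name):
    """Lower-cased word set of a rule name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def name_similarity(a, b):
    """Jaccard overlap of the two names' word sets (an assumed measure)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster(names, threshold=0.5):
    """Group names so that each cluster is linked by above-threshold pairs."""
    clusters = []
    for name in names:
        hits = [c for c in clusters
                if any(name_similarity(name, m) >= threshold for m in c)]
        merged = [name]
        for c in hits:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters


names = [
    "Alkene, Protic Acid Addition, Alkoxy",
    "Alkene, Protic Acid Addition, Benzyl",
    "Alkene, Protic Acid Addition, Generic",
    "Carbocation, Halide Addition",
]
groups = cluster(names)
```

The same scoring could be extended with the other signals listed above, such as co-occurrence in test examples or shared reagents.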
Open Questions


Student knowledge evolves over time,
maybe even with one exam. How to hit
“moving target” of their current
knowledge state?
Baseline probabilities of knowing a rule.
Random sample of all students? Will differ
greatly based on population sample
chosen.
SMILES Extensions

Atom Mapping


Necessary to map reactant to product atoms
Proper transform requires balanced stoichiometry

Hydrogens generally must be explicitly specified
[Figure: carboxylic acid + primary amine → amide + water, drawn with atom map numbers 1-5 and 7-10 on the reactant and product atoms]
[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]
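The balance requirement can be checked mechanically: every atom map number on the reactant side should appear exactly once on the product side. The sketch below does this with a regular expression only; it does not parse SMILES or validate chemistry, and the helper names are hypothetical.

```python
import re

MAP_NUMBER = re.compile(r":(\d+)\]")  # matches the ":n]" part of a mapped atom


def atom_maps(side):
    """Sorted list of atom map numbers found on one side of a reaction."""
    return sorted(int(m) for m in MAP_NUMBER.findall(side))


def is_fully_mapped(reaction):
    """True if each map number is unique and appears on both sides."""
    reactants, products = reaction.split(">>")
    r, p = atom_maps(reactants), atom_maps(products)
    return len(set(r)) == len(r) and r == p


amide_formation = (
    "[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>"
    "[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]"
)
# Same reaction with the water product left out, so atoms 3, 7 and 8 vanish.
unbalanced = amide_formation.rsplit(".", 1)[0]

balanced_ok = is_fully_mapped(amide_formation)
unbalanced_ok = is_fully_mapped(unbalanced)
```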
Transformation Rules



Chemical state machine modeling at
mechanistic level of detail
State information: Molecular structure
State transition: Transformation rules
[Figure: two-step mechanism. Alkene + H-Br → (π-bond protic acid addition) → carbocation + Br⁻ → (carbocation halide addition) → alkyl bromide]
SMIRKS | Description
[C:1]=[C:2].[H:3][Cl,Br,I:4]>>[+0:3][C:1][C+:2].[Cl,Br,I;-:4] | Alkene, Protic Acid Addition
[C+:1].[Cl,Br,I;-:2]>>[C+0:1][+0:2] | Carbocation, Halide Addition
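The state-machine view can be illustrated with a deliberately simplified sketch in which a state is a set of species labels rather than a molecular structure. This is an assumption made to keep the example self-contained: a real system would match the SMIRKS patterns against structures with a cheminformatics toolkit, and the labels and helper names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    consumes: frozenset  # species that must be present for the rule to fire
    produces: frozenset  # species that replace them


RULES = [
    Rule("Alkene, Protic Acid Addition",
         frozenset({"alkene", "H-Br"}), frozenset({"carbocation", "Br-"})),
    Rule("Carbocation, Halide Addition",
         frozenset({"carbocation", "Br-"}), frozenset({"alkyl bromide"})),
]


def step(state, rules):
    """Apply the first rule whose inputs are all present; return (state, rule name)."""
    for rule in rules:
        if rule.consumes <= state:
            return (state - rule.consumes) | rule.produces, rule.name
    return state, None


def run(state, rules):
    """Follow transitions until no rule applies; return final state and path."""
    path = []
    while True:
        state, name = step(state, rules)
        if name is None:
            return state, path
        path.append(name)


final_state, path = run(frozenset({"alkene", "H-Br"}), RULES)
```

The recorded `path` is the list of elementary rules a problem exercises, which is the information the knowledge map needs for each assessment question.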