Comp. Genomics
Recitation 12
Bayesian networks
Taken from Artificial Intelligence course, MIT, 6.034
http://courses.csail.mit.edu/6.034s/handouts/6034-review-sol.pdf
Question 1.1
• Draw a Bayesian network among the following binary
variables that model the outcome of an election:
• I: candidate is Incumbent
• M: has lots of Money for advertising
• A: uses advertisements that focus on Attacking the candidate’s
opponent
• Q: uses advertisements that focus on the candidate’s
Qualifications
• L: candidate is Liked
• D: opponent is Distrusted
• E: candidate is Elected
Question 1.1 – cont’d
• Your network should encode the following beliefs:
• Incumbents tend to raise lots of money.
• Money can be used to buy advertising that either focuses on the
candidate’s qualifications or that attacks the candidate’s
opponent. But if one does the first, there is less money to do
the latter.
• Attack advertisements tend to make voters distrust the
opponent but they also make the voters tend not to like the
candidate.
• Advertisement focusing on qualifications tends to make the
voters like the candidate.
• Candidates that people like tend to get elected.
• Candidates whose opponent people distrust tend to get elected.
Question 1.1 - solution
Question 1.2
• For each of the following, say whether it is or is
not asserted by the network structure you drew
(without assuming anything about the numerical
entries in the CPTs).
1. P(L | A,Q,D) = P(L | A,Q)
2. P(A | M,Q) = P(A | M)
3. P(L,D | A,Q) = P(L | A,Q) P(D | A,Q)
Question 1.2 - solution
1. P(L | A,Q,D) = P(L | A,Q)
Asserted
2. P(A | M,Q) = P(A | M)
Not asserted
3. P(L,D | A,Q) = P(L | A,Q) P(D | A,Q)
Asserted
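The three answers can be checked mechanically with d-separation. The sketch below is only illustrative: the solution figure is not reproduced in this transcript, so the edge list is an assumption read off the beliefs in Question 1.1 (I→M, M→A, M→Q, A→L, Q→L, A→D, L→E, D→E, plus an edge between A and Q for the shared budget). The direction of the A-Q edge is unknown, so both directions are tried. The helper names `d_separated` and `election_network` are hypothetical. Statements 1 and 3 both express the same independence, L ⊥ D | A,Q.

```python
def d_separated(parents, xs, ys, zs):
    """Return True if xs and ys are d-separated given zs.

    parents maps each node to a list of its parents. Uses the standard
    ancestral-moral-graph test.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    # Step 1: keep only the queried nodes and their ancestors.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))
    # Step 2: moralize (link each child to its parents, and co-parents to each other).
    neighbours = {node: set() for node in relevant}
    for child in relevant:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            neighbours[child].add(p)
            neighbours[p].add(child)
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)
    # Step 3: delete the conditioning nodes and test reachability from xs.
    seen, stack = set(), list(xs - zs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(neighbours[node] - zs)
    return not (seen & ys)


def election_network(a_q_edge):
    """Assumed structure; a_q_edge is 'Q->A', 'A->Q', or None (no A-Q edge)."""
    parents = {"I": [], "M": ["I"], "A": ["M"], "Q": ["M"],
               "L": ["A", "Q"], "D": ["A"], "E": ["L", "D"]}
    if a_q_edge == "Q->A":
        parents["A"].append("Q")
    elif a_q_edge == "A->Q":
        parents["Q"].append("A")
    return parents


for edge in ("Q->A", "A->Q"):
    net = election_network(edge)
    # Statements 1 and 3: L is independent of D given A and Q.
    print(edge, "L _|_ D | A,Q:", d_separated(net, {"L"}, {"D"}, {"A", "Q"}))
    # Statement 2: A is independent of Q given M.
    print(edge, "A _|_ Q | M:  ", d_separated(net, {"A"}, {"Q"}, {"M"}))
```

Under this assumed structure, statement 2 fails only because of the direct A-Q edge; calling `election_network(None)` removes that edge and makes A and Q d-separated given M.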
Question 2
• Show a Bayesian network structure that
encodes the following relationships:
• A is independent of B
• A is dependent on B given C
• A is dependent on D
• A is independent of D given C
Question 2 - solution
• Nodes A and B have no parents
• Node C has two parents: A and B
• Node D has one parent: C
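As a sanity check, the structure A → C ← B, C → D can be given concrete CPTs and the four relationships tested numerically on the resulting joint distribution. This is a minimal sketch: the CPT values are hypothetical, picked only for illustration, and the helper names (`bern`, `prob`, `independent`) are not from the original. The two independences hold for any CPTs, whereas the two dependences are shown only for this particular parameterization.

```python
from itertools import product

# Hypothetical CPT values (assumptions, not from the recitation).
P_A = 0.3                                                    # P(A=1)
P_B = 0.6                                                    # P(B=1)
P_C = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9}   # P(C=1 | A, B)
P_D = {0: 0.2, 1: 0.8}                                       # P(D=1 | C)

IDX = {"A": 0, "B": 1, "C": 2, "D": 3}


def bern(p, x):
    """Probability that a binary variable with P(1) = p takes value x."""
    return p if x == 1 else 1.0 - p


# Joint distribution from the factorization P(A) P(B) P(C | A, B) P(D | C).
joint = {
    (a, b, c, d): bern(P_A, a) * bern(P_B, b) * bern(P_C[(a, b)], c) * bern(P_D[c], d)
    for a, b, c, d in product((0, 1), repeat=4)
}


def prob(assign):
    """Marginal probability of a partial assignment such as {"A": 1, "C": 0}."""
    return sum(p for x, p in joint.items()
               if all(x[IDX[v]] == val for v, val in assign.items()))


def independent(x, y, given=(), tol=1e-12):
    """True if P(x, y | given) = P(x | given) P(y | given) for all values."""
    for zvals in product((0, 1), repeat=len(given)):
        z = dict(zip(given, zvals))
        pz = prob(z)
        if pz == 0:
            continue
        for xv, yv in product((0, 1), repeat=2):
            pxy = prob({**z, x: xv, y: yv}) / pz
            px = prob({**z, x: xv}) / pz
            py = prob({**z, y: yv}) / pz
            if abs(pxy - px * py) > tol:
                return False
    return True


print("A indep. of B:          ", independent("A", "B"))
print("A indep. of B given C:  ", independent("A", "B", ("C",)))
print("A indep. of D:          ", independent("A", "D"))
print("A indep. of D given C:  ", independent("A", "D", ("C",)))
```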
Question 3
• Which of the following conditional
independence assumptions are true?
1. A and E are independent
2. A and E are independent given D
3. B and C are independent
4. B and C are independent given A
5. B and C are independent given D
6. A and E are independent given B
7. A and E are independent given F
8. B and C are independent given E
Question 3 - solution
1. A and E are independent
False
2. A and E are independent given D
True
3. B and C are independent
False
4. B and C are independent given A
True
5. B and C are independent given D
False
6. A and E are independent given B
False
7. A and E are independent given F
False
8. B and C are independent given E
False
Question 4
• For each statement, name all of the graph
structures, G1-G4, or “none” that imply it.
Question 4 – cont’d
1. A is conditionally independent of B given C
2. A is conditionally independent of B given D
3. B is conditionally independent of D given A
4. B is conditionally independent of D given C
5. B is independent of C
6. B is conditionally independent of C given A
Question 4 - solution
1. A is conditionally independent of B given C
G2
2. A is conditionally independent of B given D
none
3. B is conditionally independent of D given A
G3, G4
4. B is conditionally independent of D given C
none
5. B is independent of C
G2, G3
6. B is conditionally independent of C given A
G1, G2, G4
HW solution – assignment 2, question 5
• Let G = (G1, … , Gn) be n contiguous DNA
regions representing genes. For each Gi
we define the mRNA concentration of the
gene as Pi, s.t. their sum is equal to 1. P
= (P1, … , Pn) can be interpreted as the
normalized expression levels for the
regions in G.
HW solution – q. 5 – cont’d
• Our model assumes that reads are
generated by randomly picking a region
from G according to the distribution P,
and then copying this region. The copying
process is error-prone. This process is
repeated until we have a set of m reads R
= {r1, … , rm}.
HW solution – q. 5 – cont’d
• For each region Gi and read rj, we have a
probability pij = P(rj | Gi), the probability
of observing rj given that the locus of the
read was gene Gi. In practice, for each
read rj, this probability will be close to
zero for all but a few regions.
Likelihood function
• Write the likelihood of observing the m
reads.
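One way to write this, assuming the m reads are generated independently and using pij = P(rj | Gi) as defined above: each read is produced by first choosing a region Gi with probability Pi and then observing rj with probability pij, so the unknown region is summed out for every read.

```latex
\[
L(P) \;=\; P(r_1,\dots,r_m \mid P)
      \;=\; \prod_{j=1}^{m} P(r_j \mid P)
      \;=\; \prod_{j=1}^{m} \sum_{i=1}^{n} P_i \, p_{ij},
\qquad p_{ij} = P(r_j \mid G_i).
\]
```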
Q function
• Write the Q(P | P(t)) term.
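A sketch of the standard EM form for this mixture model follows. The symbol w_ij^(t) is introduced here for convenience and is not part of the original notation: it denotes the posterior probability, under the current estimate P(t), that read rj originated from region Gi. Q is then the expected complete-data log-likelihood under these posteriors.

```latex
\[
w_{ij}^{(t)} \;=\; P\bigl(G_i \mid r_j, P^{(t)}\bigr)
            \;=\; \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}},
\]
\[
Q\bigl(P \mid P^{(t)}\bigr)
  \;=\; \sum_{j=1}^{m} \sum_{i=1}^{n} w_{ij}^{(t)} \, \log\bigl(P_i \, p_{ij}\bigr).
\]
```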
M-step
• Write the M-step term using argmax
function.
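Written out under the usual EM conventions, the M-step maximizes Q over all valid distributions P. Here w_ij^(t) = P_i^(t) p_ij / Σ_k P_k^(t) p_kj is a shorthand assumed for the posterior probability that read rj came from region Gi. The terms log p_ij do not depend on P, so they drop out of the maximization.

```latex
\[
P^{(t+1)}
  \;=\; \arg\max_{P:\ \sum_i P_i = 1,\ P_i \ge 0} \; Q\bigl(P \mid P^{(t)}\bigr)
  \;=\; \arg\max_{P:\ \sum_i P_i = 1,\ P_i \ge 0} \;
        \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} w_{ij}^{(t)} \Bigr) \log P_i .
\]
```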
Update rule
• Infer from part (c), the M-step, the update step for P.
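Maximizing the M-step objective under the constraint that the Pi sum to 1 (for example with a Lagrange multiplier) gives a closed form: the new Pi is the average, over all m reads, of the posterior probability that the read came from region Gi. The following is a minimal sketch of that update in plain Python. The function names and the toy matrix `toy_p` are hypothetical, and `p[i][j]` stands for pij = P(rj | Gi).

```python
import math


def em_step(P, p):
    """One EM iteration: return the updated expression levels.

    P[i] is the current estimate for region i; p[i][j] = P(r_j | G_i).
    """
    n, m = len(P), len(p[0])
    expected_counts = [0.0] * n
    for j in range(m):
        # E-step: posterior that read j came from each region.
        weights = [P[i] * p[i][j] for i in range(n)]
        total = sum(weights)
        for i in range(n):
            expected_counts[i] += weights[i] / total
    # M-step in closed form: average posterior over the m reads.
    return [count / m for count in expected_counts]


def log_likelihood(P, p):
    """Log of prod_j sum_i P_i * p_ij."""
    n, m = len(P), len(p[0])
    return sum(math.log(sum(P[i] * p[i][j] for i in range(n))) for j in range(m))


def run_em(p, iterations=100):
    """Run EM from a uniform start for a fixed number of iterations."""
    n = len(p)
    P = [1.0 / n] * n
    for _ in range(iterations):
        P = em_step(P, p)
    return P


# Toy input (made-up numbers): 2 regions, 4 reads; the first three reads
# fit region 0 much better, the last read fits region 1.
toy_p = [
    [0.90, 0.80, 0.90, 0.01],
    [0.01, 0.02, 0.01, 0.70],
]
estimate = run_em(toy_p)
print(estimate, log_likelihood(estimate, toy_p))
```

A fixed iteration count keeps the sketch short; in practice one would more likely stop when the log-likelihood, which EM never decreases, stops improving.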