CS 262 Computational Genomics
Lecture 7: Hidden Markov Models Intro
Scribe: Vivian Lin
January 31, 2012
1. Introduction
A Hidden Markov Model (HMM) is a statistical model for sequences of symbols; it is used
in many different fields of computer science for modeling sequential data. In
computational biology, one of the first widely used HMM-based programs, GENSCAN, was built
in 1997. The program was developed by Chris Burge at Stanford University to predict the
locations of genes.
In this lecture, the HMM definition, its properties, and the three main questions (evaluation,
decoding, and learning) are discussed. Throughout the discussion, the example of the
dishonest casino is used to make the material concrete. In addition, the Viterbi algorithm
and the forward algorithm are analyzed as solutions to the decoding and evaluation
questions, respectively.
2. Example: The Dishonest Casino
The lecture started with the dishonest casino example to motivate HMMs. Consider a
casino that has two dice:
Fair die
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
Loaded die
P(1) = P(2) = P(3) = P(4) = P(5) = 1/10
P(6) = 1/2
The casino player switches back and forth between the fair and loaded die roughly once every 20 turns.
The rules of the game are:
1. You bet $1
2. You roll (always with a fair die)
3. Casino player rolls (maybe with fair die, maybe with loaded die)
4. Highest number wins $2
Then, a sequence of rolls by the casino player is given:
1245526462146146136136661664661636616366163616515615115146123562344
Given this sequence, there are three main questions we would like to answer
using an HMM:
1. Evaluation: How likely is this sequence, given our model of how the casino works?
2. Decoding: What portion of the sequence was generated with the fair die, and what
portion with the loaded die?
3. Learning: How “loaded” is the loaded die? How “fair” is the fair die? How often
does the casino player change from fair to loaded, and back?
3. Definition of a Hidden Markov Model
A Markov chain is useful for computing the probability of a sequence of states that we
can observe in the world. In many cases, however, the states we are interested in are not
directly observable. A hidden Markov model allows us to talk about both observed symbols
and hidden states that we think of as the causal factors in the probabilistic model.
Formally, an HMM is defined as follows:
Observation alphabet: Σ = { b_1, b_2, …, b_M }
Set of hidden states: Q = { 1, …, K }
Transition probabilities: a_lk
    a_lk = transition probability from state l to state k = P(π_i = k | π_{i-1} = l).
    The transition probabilities out of a given state (including back to itself)
    must sum to 1: a_k1 + … + a_kK = 1, for all states k = 1…K.
Start probabilities: a_0i
    a_0i = probability of starting in state i.
    Note that the start probabilities must also sum to 1: a_01 + … + a_0K = 1.
Emission probabilities: e_k(b)
    e_k(b) = probability that symbol b is seen when in state k = P(x_i = b | π_i = k).
    Again, the emission probabilities of each state must sum to 1:
    e_k(b_1) + … + e_k(b_M) = 1, for all states k = 1…K.
Note that the end probabilities a_k0 used in Durbin et al. (Biological Sequence Analysis) are not
needed here, since we assume the length of the sequence we are analyzing is given.
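
To make the notation concrete, here is a minimal Python sketch (not from the lecture; the class and field names are my own) of a container holding the parameters defined above:

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class HMM:
        """Parameters of a hidden Markov model, in the notation of this section."""
        states: List[str]                   # Q = {1, ..., K}
        alphabet: List[str]                 # observation alphabet {b_1, ..., b_M}
        start: Dict[str, float]             # a_0i: probability of starting in state i
        trans: Dict[str, Dict[str, float]]  # a_lk = P(pi_i = k | pi_{i-1} = l)
        emit: Dict[str, Dict[str, float]]   # e_k(b) = P(x_i = b | pi_i = k)

        def check(self, tol: float = 1e-9) -> None:
            """Each of the probability distributions above must sum to 1."""
            assert abs(sum(self.start.values()) - 1) < tol
            for k in self.states:
                assert abs(sum(self.trans[k].values()) - 1) < tol
                assert abs(sum(self.emit[k].values()) - 1) < tol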
In the dishonest casino example, we can represent the model with the following graphical
model and settings:
Alphabet: Σ = numbers on the dice = { 1, 2, 3, 4, 5, 6 }
Set of states: Q = { Fair, Loaded }
Transition probabilities:
    a_FairLoaded = 0.05 (probability of switching from fair to loaded)
    a_FairFair = 0.95 (probability of staying with the fair die)
    a_LoadedFair = 0.05 (probability of switching from loaded to fair)
    a_LoadedLoaded = 0.95 (probability of staying with the loaded die)
Start probabilities: the probabilities of starting with the fair die or the loaded die
Emission probabilities: the probabilities of rolling a certain number in the fair or loaded
state. For instance, the probability of rolling a 6 is 1/6 in the fair state but 1/2 in the
loaded state.
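
Continuing the sketch above, the dishonest casino HMM can be written down directly; the 0.95/0.05 transition probabilities are the ones shown in the lecture's diagram, and the equal start probabilities are the ones used in the worked example of Section 6:

    casino = HMM(
        states=["F", "L"],                       # Fair, Loaded
        alphabet=[str(i) for i in range(1, 7)],  # die faces 1..6
        start={"F": 0.5, "L": 0.5},              # start with either die equally often
        trans={"F": {"F": 0.95, "L": 0.05},
               "L": {"F": 0.05, "L": 0.95}},
        emit={"F": {str(i): 1/6 for i in range(1, 7)},                  # fair die
              "L": {**{str(i): 1/10 for i in range(1, 6)}, "6": 1/2}},  # loaded die
    )
    casino.check()  # all distributions sum to 1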
4. An HMM is memory-less
Memory-lessness is an important property of an HMM. It rests on two assumptions: the
Markov assumption and the independence assumption. Because of these two assumptions,
solving problems with HMMs becomes simple. The Markov assumption states that the
current state depends only on the previous state. In other words, at each time step t,
the only thing that affects future states is the current state π_t:
P(π_{t+1} = k | "whatever happened so far")
= P(π_{t+1} = k | π_1, π_2, …, π_t, x_1, x_2, …, x_t)
= P(π_{t+1} = k | π_t)
The second assumption, the independence assumption, states that the output observation at
time t depends only on the current state. In other words, at each time step t, the only
thing that affects x_t is the current state π_t:
P(x_t = b | "whatever happened so far")
= P(x_t = b | π_1, π_2, …, π_t, x_1, x_2, …, x_{t-1})
= P(x_t = b | π_t)
[Figure: the dishonest casino HMM. Two states, FAIR and LOADED, each staying in the same
state with probability 0.95 and switching with probability 0.05. Emission probabilities:
P(1|F) = … = P(6|F) = 1/6; P(1|L) = … = P(5|L) = 1/10, P(6|L) = 1/2.]
5. A parse of a sequence
Given a sequence of symbols x = x_1…x_N and a sequence of states π = π_1…π_N, called a
parse, we can picture the parse as a path through a trellis that has one column of the K
hidden states for each of the N positions.
Given an HMM like the one above, we can generate a sequence of length n as follows: we
begin at a start state, generate an emission, and then move from one state to the next.
Formally, we have
1. Start at state π_1 according to probability a_{0 π_1}
2. Emit letter x_1 according to probability e_{π_1}(x_1)
3. Go to state π_2 according to probability a_{π_1 π_2}
4. … and continue until emitting x_n
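
A minimal sketch of this generative process, assuming the HMM container from Section 3 (the function name sample is my own):

    import random

    def sample(hmm: HMM, n: int, seed: int = 0):
        """Generate n symbols and their hidden states by following the steps above."""
        rng = random.Random(seed)
        def draw(dist):  # sample a key from a {value: probability} dict
            return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        symbols, states = [], []
        state = draw(hmm.start)                    # 1. pick pi_1 according to a_0i
        for _ in range(n):
            symbols.append(draw(hmm.emit[state]))  # 2. emit x_t according to e_{pi_t}
            states.append(state)
            state = draw(hmm.trans[state])         # 3. move to the next state according to a
        return symbols, states

    rolls, dice = sample(casino, 20)  # e.g., 20 rolls from the dishonest casino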
6. Likelihood of a Parse
Given a sequence x = x_1…x_N and a parse π = π_1…π_N, we can compute how likely this
parse is. From the given HMM, we calculate it by multiplying together all the emission and
transition probabilities along the parse:
P(x, π) = P(x_1, …, x_N, π_1, …, π_N)
        = P(x_N | π_N) P(π_N | π_{N-1}) … P(x_2 | π_2) P(π_2 | π_1) P(x_1 | π_1) P(π_1)
        = a_{0 π_1} a_{π_1 π_2} … a_{π_{N-1} π_N} × e_{π_1}(x_1) … e_{π_N}(x_N)
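
Translated directly into code, the product above becomes a short joint-likelihood function (again a sketch building on the HMM container; joint_prob is my own name):

    def joint_prob(hmm: HMM, x, pi):
        """P(x, pi) = a_{0,pi_1} * prod_i e_{pi_i}(x_i) * prod_i a_{pi_{i-1},pi_i}."""
        p = hmm.start[pi[0]]                      # a_{0,pi_1}
        for i in range(len(x)):
            p *= hmm.emit[pi[i]][x[i]]            # e_{pi_i}(x_i)
            if i > 0:
                p *= hmm.trans[pi[i - 1]][pi[i]]  # a_{pi_{i-1},pi_i}
        return p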
Now, the equation can be written in a more compact way by letting each parameter a_ij
or e_i(b) be denoted by some θ_j. In our dishonest casino example, we have 18 parameters in
total: 2 start probabilities (a_0Fair : θ_1, a_0Loaded : θ_2), 4 transition probabilities (a_FairFair : θ_3,
a_FairLoaded : θ_4, a_LoadedFair : θ_5, a_LoadedLoaded : θ_6), and 12 emission probabilities (6 from each of the
two states: θ_7, …, θ_18).
Then, we define the feature count F(j, x, π) to be the number of times parameter θ_j is used
in (x, π). Rewriting the product of all the a_ij and e_i(b) in terms of these feature counts, the
compact equation for the parse likelihood becomes
P(x, π) = ∏_{j=1…n} θ_j^{F(j, x, π)} = exp[ Σ_{j=1…n} log(θ_j) · F(j, x, π) ],
where n is the total number of parameters (18 in the casino example).
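
The feature-count form can be checked numerically: count how often each parameter θ_j is used in (x, π), then exponentiate the weighted sum of log-parameters. A sketch (the parameter indexing by tuples is my own choice):

    import math
    from collections import Counter

    def joint_prob_feature_form(hmm: HMM, x, pi):
        """P(x, pi) = exp( sum_j log(theta_j) * F(j, x, pi) ); agrees with joint_prob above."""
        F = Counter()
        F[("start", pi[0])] += 1                     # a_{0,pi_1} is used once
        for i in range(len(x)):
            F[("emit", pi[i], x[i])] += 1            # e_{pi_i}(x_i)
            if i > 0:
                F[("trans", pi[i - 1], pi[i])] += 1  # a_{pi_{i-1},pi_i}
        def theta(j):                                # look up the parameter value theta_j
            return (hmm.start[j[1]] if j[0] == "start" else
                    hmm.emit[j[1]][j[2]] if j[0] == "emit" else
                    hmm.trans[j[1]][j[2]])
        return math.exp(sum(math.log(theta(j)) * count for j, count in F.items()))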
[Figure: a parse drawn as a path through a trellis with one column of the K states at each
position x_1, x_2, x_3, …, x_n; the example path starts in state 0 and moves to state 2 with
probability a_02, emitting x_1 with probability e_2(x_1).]
Let us practice this equation with our dishonest casino example. Suppose our initial
probabilities are a_0Fair = ½ and a_0Loaded = ½, and we have a sequence of rolls x = 1, 2, 1, 5,
6, 2, 1, 5, 2, 4. Then the likelihood of the parse π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair,
Fair is P(start at Fair) P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) … P(4 | Fair)
= ½ × (1/6)^10 × (0.95)^9 = 0.00000000521158647211 ≈ 5.2 × 10^-9,
where ½ is the start probability, 1/6 is the emission probability of each roll from the fair die, and 0.95
is the transition probability from the Fair state to the Fair state.
With the same sequence of rolls, the likelihood of the parse π = Loaded, Loaded, Loaded, Loaded,
Loaded, Loaded, Loaded, Loaded, Loaded, Loaded is
P(start at Loaded) P(1 | Loaded) P(Loaded | Loaded) … P(4 | Loaded)
= ½ × (1/10)^9 × (1/2)^1 × (0.95)^9 = 0.00000000015756235243 ≈ 1.6 × 10^-10,
where the first ½ is the start probability, 1/10 is the emission probability of each non-six roll
from the loaded die, the 1/2 is the emission probability of the single six from the
loaded die, and 0.95 is the transition probability from the Loaded state to the Loaded state.
Therefore, it is more likely (by a factor of about 30) that all the rolls were made with the fair die
than that they were all made with the loaded die.
Now, change the sequence of rolls to x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6. The likelihood of π = F,
F, …, F is ½ × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9, the same as before. But
the likelihood of π = L, L, …, L is ½ × (1/10)^4 × (1/2)^6 × (0.95)^9 ≈ 5 × 10^-7. This
time, it is about 100 times more likely that the die is loaded.
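
These hand calculations can be reproduced with the joint_prob sketch from above:

    rolls_1 = list("1215621524")   # x = 1,2,1,5,6,2,1,5,2,4
    rolls_2 = list("1665626636")   # x = 1,6,6,5,6,2,6,6,3,6
    all_fair, all_loaded = ["F"] * 10, ["L"] * 10

    print(joint_prob(casino, rolls_1, all_fair))    # ~5.2e-09
    print(joint_prob(casino, rolls_1, all_loaded))  # ~1.6e-10
    print(joint_prob(casino, rolls_2, all_fair))    # ~5.2e-09 (same as above)
    print(joint_prob(casino, rolls_2, all_loaded))  # ~4.9e-07, about 100x the fair parse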
7. Overview of the Three Main Questions on HMMs
We can define the three main questions on HMMs as follows:
Evaluation: What is the probability of the observed sequence?
o Given: an HMM M and a sequence x
o Find: P(x | M)
o Algorithm: Forward
o Example of dishonest casino: How likely is this sequence, given our
model of how the casino works?
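
The forward algorithm itself is developed later in the course; as a preview, here is a minimal sketch of its dynamic-programming recursion f_k(i) = e_k(x_i) · Σ_l f_l(i-1) · a_lk, again on the HMM container and without the log-space or scaling tricks needed for long sequences:

    def forward_prob(hmm: HMM, x):
        """Return P(x | M), summing P(x, pi) over all possible parses pi."""
        f = {k: hmm.start[k] * hmm.emit[k][x[0]] for k in hmm.states}  # f_k(1)
        for i in range(1, len(x)):
            f = {k: hmm.emit[k][x[i]] * sum(f[l] * hmm.trans[l][k] for l in hmm.states)
                 for k in hmm.states}                                  # f_k(i)
        return sum(f.values())                                         # P(x) = sum_k f_k(N)

    print(forward_prob(casino, list("1215621524")))  # probability of the observed rolls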