Machine Learning

Download Report

Transcript Machine Learning

Machine Learning
Tutorial.1
Boolean algebra, probabilities
Boolean Algebra Tutorial
(Duncan Fyfe Gillies slides)
• Computer hardware works with binary numbers,
but binary arithmetic is much older than
computers
– Ancient Chinese Civilisation (3000 BC)
– Ancient Greek Civilisation (1000 BC)
– Boolean Algebra (1850)
Propositional Logic
• The Ancient Greek philosophers created a
system to formalize arguments called
propositional logic.
– A proposition is a statement that could be TRUE or
FALSE
– Propositions could be compounded by means of the
operators AND, OR and NOT
Propositional Calculus Example
• Propositions may be TRUE or FALSE:
– it is raining
– the weather forecast is bad
• A combined proposition:
– it is raining OR the weather forecast is bad
Propositional Calculus Example
• We can assign values to propositions, for example:
I will take an umbrella if and only if
it is raining OR the weather forecast is bad
• In other words the proposition “I will take an umbrella” is
the result of the Boolean combination (OR) between two
conditions:
raining
weather forecast being bad.
• In fact we could write:
I will take an umbrella =
it is raining OR the weather forecast is bad
Diagrammatic representation
• We can think of the umbrella proposition as a result that
we calculate from the weather forecast and the fact that
it is raining by means of a logical or.
Rain
Bad forecast
OR
Take an umbrella
Computer Organization class diagrams
Truth Tables
• Since propositions
can only take two
values, we can
express all possible
outcomes of the
umbrella proposition
by a table:
– True = 1
– False = 0
Rain
Bad
Forecast
Umbrella
False
False
False
False
True
True
True
False
True
True
True
True
Bad
Forecast
Umbrella
0
0
0
0
1
1
1
0
1
1
1
1
Rain
More complex propositions
• We can make our propositions more complex, for
example:
(Take Umbrella ) =
( NOT (Take Car ) ) AND
( (Bad Forecast ) OR (Raining ) )
Rain
Bad forecast
Take a car
OR
AND
NOT
Take an
umbrella
Boolean Algebra
• We could write propositional statements as above but to
perform calculations quickly and efficiently we can use
an equivalent, but more succinct notation.
• We also need a to have a well-defined semantics for all
the “operators”, or connectives that we intend to use.
• The system we will employ is called Boolean Algebra
(introduced by the English mathematician Boole in1850)
and satisfies the criteria above.
Fundamentals of Boolean Algebra
• The truth values are replaced by 1 and 0:
– TRUE = 1
– FALSE = 0
• Propositions are replaced by variables (for example):
– it is raining = R
– The weather forecast is bad = W
• Operators are replaced by symbols
– NOT = 
– OR = 
– AND = 
disjunction
conjunction
Truth Tables
All possible outcomes of the operators can be written as truth tables
NOR A
A OR B
A AND B
A
B
R
A
B
R
A
R
0
0
0
0
0
0
0
1
0
1
0
0
1
1
1
0
1
0
0
1
0
1
1
1
1
1
1
1
For n variables there are 2n possibilities.
Theory of Probability
Olga Kubassova,
School of Computing
http://www.comp.leeds.ac.uk/olga
7th February 2007
Probability Basics
0≤P≤1
• Experiment – a process that yields
outcome.
• Sample space – the set of all possible
outcomes.
• Event – a subset of the sample space.
– E.g. getting heads when tossing a coin.
• Mutually exclusive – if events cannot
occur simultaneously.
Example:
• Rolling a fair six-sided die:
• Sample Space: S={1,2,3,4,5,6}, |S| = 6
– Independent events
• Event: obtaining 4
Probability notation
• P(A) – the prob. of event A occurring
• P(A∪B) – prob. of A or B
– Probability of rolling 1 or 6
• P(A∩B) – prob. of A and B
– Probability or rolling 1 and 2
– Probability of rolling an even number grater than 3
• P(¬A) – prob. of A not occurring (the
complement of A)
Pay attention to the AND/OR notation (set theory)
Even numbers
on both dice?
Even numbers
on both dice
AND
Sum is greater
than 5
Sum is greater
than 5?
Even numbers
on both dice
OR
Sum is greater
than 5
1,1
1,2
1,3
1,4
1,5
1,6
2,1
2,2
2,3
2,4
2,5
2,6
3,1
3,2
3,3
3,4
3,5
3,6
4,1
4,2
4,3
4,4
4,5
4,6
5,1
5,2
5,3
5,4
5,5
5,6
6,1
6,2
6,3
6,4
6,5
6,6
Are the dice identical?
How many different events?
Example:
• Rolling a fair six-sided die:
• Sample Space: S={1,2,3,4,5,6}, |S| = 6
– Independent events
• Event: obtaining 4
|S |
• P(S) =
 p(event )  1
i 1
i
– Sum of probabilities of all (independent!) events
– P(eventi)=1/6 (uniform distribution)
Finite probability
• Experiment that has a finite number of
outcomes.
• Therefore, each event has a finite probability.
• Experiment:
Throw a die. A = number > 4. B = even number.
Find P(A), P(B), P(A∪B) and P(A∩B)
Mutually Exclusive Events
• Events are mutually exclusive if they are disjoint,
or cannot occur simultaneously
A∩B=0
Examples?
Even numbers
on both dice
AND
Sum is less
than 1
Uniform distributions
• Every outcome in the sample space is equally
probable.
• E.g. tossing coins
rolling dice
drawing a card from a deck
Conditional probability
• P(A | B) – the prob. of A happening
given that B has occurred.
P( A  B)
P( A | B) 
P( B)
Example of the Conditional
Probability
Imagine that 5% of people of a given population own at least
one dog. 2% of people own at least one dog and at least one
cat. What is the probability that someone will own a cat, given
that they also have a dog?
Let A = “dog owner”, B = “cat owner”, then: P(A) =
0.05; P(A ∩ B) = 0.02
P(B | A) = P(A ∩B)/P(A)
Independence
• If events A and B do not influence each
other then:
P(A | B) = P(A)
and
P(A∩B) = P(A)P(B)
Example (Independence)
• Let throw the coin 2 times. Probability that
the first time it is heads and the second it
is tail do not influence each other
Chain rule
Chain rule allows computing probabilities of sequences of the events
P( A1, A2 ,, An ) 
P( A1 ) P( A2 | A1 ) P( A3 | A1, A2 )P( An | A1 , An1 )
Chain rule (II)
• Let us compute sequences of events.
• E.g. prob. of sequence of letters, ‘t’, ‘h’
and ‘e’:
P(t,h,e) = P(t)P(h | t)P(e | th)
Bayes’ theorem
Bayes’ theorem (II)
• Meningitis causes stiff neck 50% cases
• Prob. of meningitis is 1/50000
• Prob. of stiff neck is 1/20
• What is the prob. of having meningitis if a patient has a stiff
neck?
Additional resources
• Andy Roberts homepage:
http://andy-roberts.net/teaching/index.html
• Google directory:
http://directory.google.com/Top/Science/Math/Prob
ability
• Wikipedia:
http://en.wikipedia.org/wiki/Probability
Exponentials – What are they?
• Simply: a number (base) raised to a power
(exponent).
• E.g. the area of a square whose sides are of
length 3…
• = 32
• = 3 squared, or 3 to the power 2
• =3×3
Exponentials – A quick explanation
• Easy enough to calculate:
b
•
•
•
•
ab = a × a × a … × a
So there are ‘b’ lots of ‘a’
What is 43?
4 × 4 × 4 = 64
Exponentials in Computer Science
• Have you ever noticed the following numbers
popping up in your computer science studies?
• 1, 2, 4, 8, 16, 32, 64, 128…1024…
• What do they all have in common?
• They are all powers of 2!!
Exponentials in Computer Science
• The powers of 2 should be (very) familiar to you
by now.
• All the computers you have been using work in
binary bits (at the lowest level)…
• 2 values: 0/1, true/false etc
• all ‘data sizes’ must be expressed in powers of
2.
Exponentials in Computer Science
•
•
•
•
•
•
•
1 kilobyte ≠ 1000 bytes (‘standard’ use of kilo)
1 kilobyte = 1024 bytes
Because…
29 = 512
210 = 1024
211 = 2048
So 210 is the closest we can get to 1000
Working with Exponentials
• First, some easy exponentials to
remember:
• a0 = 1
• a1 = a
Logarithms
• This is because log(arithm)s are just ‘reversed’
exponentials
• E.g. if 24 = 16
• log216 = 4
– base is 2
• Logarithms ‘map’ large numbers onto smaller
numbers
Logarithmic Scale
10
9
8
7
6
x
5
2^x
4
log(x,2)
3
2
1
0
1
2
3
4
5
6
7
8
9
10
Logarithms - Bases
• There are several common bases:
– 10: very common base, Richter scale etc.
– e: used by a lot of scientists
– 2: very common in computer science. WHY?
Logarithms – An example
• Imagine a (small) football tournament of 8
teams.
• It is a knockout tournament, so every time
that 2 teams play the losing team is
eliminated from the tournament and the
winning team goes on to the next round.
• How many rounds must be played to
determine an overall winner?
Logarithms – An example
• Imagine a (small) football tournament of 8
teams.
• It is a knockout tournament, so every time
that 2 teams play the losing one is
removed from the tournament and the
winning one goes on to the next round.
• How many rounds must be played to
determine an overall winner?
Logarithms – An example
Round 0: 8 teams
Round 1: 4 teams
Round 2: 2 teams
Round 3: 1 team
3 rounds needed!
Logarithms – Example explained
• 3 rounds are needed to determine the
winner of 8 teams, competing 2 at a time
(i.e. one-on-one)
• This can be easily calculated using logs.
• 2 teams play at a time, so the base is 2.
(i.e. 2x = 8, so we need to use log2)
• So, log2 8 = ….
• 3