Marketing and CS
Download
Report
Transcript Marketing and CS
Forensics and CS
Philip Chan
CSI: Crime Scene Investigation
www.cbs.com/shows/csi/
high tech forensics tools
DNA profiling
Use as evidence in court cases
DNA
Deoxyribonucleic Acid
Each person is unique in DNA (except for
twins)
DNA samples can be collected at crime
scenes
About .1% of human DNA varies from person
to person
Forensics Analysis
Focus on loci (locations) of the DNA
Values at the those loci (DNA profile) are
recorded for comparing DNA samples.
Forensics Analysis
Focus on loci (locations) of the DNA
Values at the those loci (DNA profile) are
recorded for comparing DNA samples.
Two DNA profiles from the same person have
matching values at all loci.
Forensics Analysis
Focus on loci (locations) of the DNA
Values at the those loci (DNA profile) are
recorded for comparing DNA samples.
Two DNA profiles from the same person have
matching values at all loci.
More or fewer loci are more accurate in
identification?
Tradeoffs?
Forensics Analysis
Focus on loci (locations) of the DNA
Values at the those loci (DNA profile) are
recorded for comparing DNA samples.
Two DNA profiles from the same person have
matching values at all loci.
More or fewer loci are more accurate in
identification?
Tradeoffs?
FBI uses 13 core loci
http://www.cstl.nist.gov/biotech/strbase/fbicore.htm
We do not want to wrongly accuse
someone
How can we find out how likely another
person has the same DNA profile?
We do not want to wrongly accuse
someone
How can we find out how likely another
person has the same DNA profile?
How many people are in the world?
We do not want to wrongly accuse
someone
How can we find out how likely another
person has the same DNA profile?
How many people are in the world?
How low the probability needs to be so that a
DNA profile is unique in the world?
We do not want to wrongly accuse
someone
How can we find out how likely another
person has the same DNA profile?
How many people are in the world?
How low the probability needs to be so that a
DNA profile is unique in the world?
Low probability doesn’t mean impossible
Just very unlikely
Review of basic probability
Joint probability of two independent events
P(A,B) = ?
Review of basic probability
Joint probability of two independent events
P(A,B) = P(A) * P(B)
Independent events mean knowing one event
does not provide information about the other
events
P(Die1=1, Die2=1)
= P(Die1=1) * P(Die2=1)
= 1/6 * 1/6 = 1/36.
Enumerating the events
1
1
2
3
1,1
1,2
…
4
2
3
4
5
6
36 events, each is equally likely, so 1/36
5
6
Joint probability
P(Die1=even, Die2=6) = ?
Joint probability
P(Die1=even, Die2=6)
= 1/2 * 1/6 = 1/12
P(Die1=1, Die2=5, Die3=4) = ?
Joint probability
P(Die1=even, Die2=6)
= 1/2 * 1/6 = 1/12
P(Die1=1, Die2=5, Die3=4)
= (1/6)3 = 1/216
DNA profile probability
How to estimate?
DNA profile probability
How to estimate?
Assuming loci are independent
P(Locus1=value1, Locus2=value2, ...)
= P(Locus1=value1) * P(Locus2=value2) * ...
DNA profile probability
How to estimate?
Assuming loci are independent
P(Locus1=value1, Locus2=value2, ...)
= P(Locus1=value1) * P(Locus2=value2) * ...
How to estimate P(Locus1=value1)?
DNA profile probability
How to estimate?
Assuming loci are independent
P(Locus1=value1, Locus2=value2, ...)
= P(Locus1=value1) * P(Locus2=value2) * ...
How to estimate P(Locus1=value1)?
a random sample of size N from the
population and
find out how many people out of N have
value1 at Locus1
Database of DNA profiles
Id
A5212
A6921
…
Locus1
Locus2
Locus3
…
Locus13
Problem Formulation
Given
A sample profile (e.g. collected from the crime
scene)
A database of known profiles
Find
The probability of the sample profile if it
matches a known profile in the database
Breaking Down the Problem
Find
The probability of the sample profile if it matches a
known profile in the database
What are the subproblems?
Breaking Down the Problem
Find
The probability of the sample profile if it matches a
known profile in the database
What are the subproblems?
Subproblem 1
Find whether the sample profile matches
1a: ?
1b: ?
Subproblem 2
Calculate the probability of the profile
Breaking Down the Problem
Find
The probability of the sample profile if it matches a
known profile in the database
What are the subproblems?
Subproblem 1
Find whether the sample profile matches
1a: check entries in the database
1b: check loci in each entry
Subproblem 2
Calculate the probability of the profile
Simpler Problem for 1a (very
common)
Given
an array of integers (e.g. student IDs)
an integer (e.g. an ID)
Find
whether the integer is in the array
int[]
int
(e.g. whether you can enter your dorm)
directory; // student id’s
id;
// to be found
Linear Search
Linear/Sequential Search
Check one by one
Stop if you find it
Stop if you run out of items to check
Not found
Number of Checks (speed of
algorithm)
Consider N items in the array
Best-case scenario
When does it occur? How many checks?
Number of Checks (speed of
algorithm)
Consider N items in the array
Best-case scenario
When does it occur? How many checks?
First item;1 check
Worst-case scenario
When does it occur? How many checks?
Number of Checks (speed of
algorithm)
Consider N items in the array
Best-case scenario
Worst-case scenario
When does it occur? How many checks?
First item;1 check
When does it occur? How many checks?
Last item or not there; N checks
Average-case scenario
Average of all cases
(1 + 2 + … + N) / N = [N(N+1)/2] / N = (N+1)/2
Matching DNA profiles
Each profile has 13 loci
Do we always need to check all 13 loci to
decide if a match occurs or not?
Can we do better? Faster
algorithm?
What if the array is sorted, items are in an
order
E.g. a phone book
Binary Search
Binary Search
1. Check the item at midpoint
2. If found, done
3. Otherwise, eliminate half and repeat 1 and 2
Breaking down the problem
While more items and not found in the mid
point
What are the two subproblems?
Breaking down the problem
While more items and not found in the mid
point
Eliminate half of the items
Determine the mid point
Number of checks (Speed of
algorithm)
Best-case scenario
When does it occur? How many checks?
Number of checks (Speed of
algorithm)
Best-case scenario
When does it occur? How many checks?
In the middle; 1 check
Number of checks (Speed of
algorithm)
Best-case scenario
When does it occur? How many checks?
In the middle; 1 check
Worst-case scenario
When does it occur? How many checks?
Number of checks (Speed of
algorithm)
Best-case scenario
When does it occur? How many checks?
In the middle; 1 check
Worst-case scenario
When does it occur? How many checks?
Dividing into two halves, half has only one
item
? checks
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
= [ [ T(N/8) + 1] + 1] + 1
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
= [ [ T(N/8) + 1] + 1] + 1
= … any pattern?
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
= [ [ T(N/8) + 1] + 1] + 1
= …
= T(N/2k) + k
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
= [ [ T(N/8) + 1] + 1] + 1
= …
= T(N/2k) + k
N/2k gets smaller and eventually becomes 1
Number of checks (Speed of
algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
= [ [ T(N/8) + 1] + 1] + 1
= …
= T(N/2k) + k
N/2k gets smaller and eventually becomes 1
solve for k
Number of Checks (Speed of
Algorithm)
N/2k = 1
N = 2k
k=?
Number of Checks (Speed of
Algorithm)
N/2k = 1
N = 2k
k = log2N
Number of Checks (Speed of
Algorithm)
N/2k = 1
N = 2k
k = log2N
T(N) = T(N/2k) + k
= T(1) + log2N
= ? + log2N
Number of Checks (Speed of
Algorithm)
N/2k = 1
N = 2k
k = log2N
T(N) = T(N/2k) + k
= T(1) + log2N
= 1 + log2N
N (Linear search) vs
log N + 1 (Binary search)
N
𝒍𝒐𝒈𝟐 𝑵 + 𝟏
100
7.6
1,000
11.0
10,000
14.3
100,000
17.6
1,000,000
20.9
10,000,000
24.3
100,000,000
27.6
Before using Binary Search
The array needs to be sorted (in order)
Sorting
Sorting (arranging the items in a
desired order)
How is the phone book arranged?
Why?
Why not arranged by numbers?
Sorting (arranging the items in a
desired order)
How is the phone book arranged?
Why?
Why not arranged by numbers?
Order
Alphabetical
Low to high numbers
DNA profile with 13 loci?
Sorting
Imagine you have a thousand numbers in an
array
How would you systemically sort them?
Selection Sort (ascending)
Find/select the smallest item
Swap the smallest item with the first item
Selection Sort (ascending)
Find/select the smallest item
Swap the smallest item with the first item
Find/select the second smallest item
Swap the second smallest item with the
second item
…
Example
6
7
2
5
1
Example
6
7
2
5
1
Example
6
7
2
5
1
1
7
2
5
6
Example
6
7
2
5
1
1
7
2
5
6
Example
6
7
2
5
1
1
7
2
5
6
1
2
7
5
6
Example
6
7
2
5
1
1
7
2
5
6
1
2
7
5
6
Example
6
7
2
5
1
1
7
2
5
6
1
2
7
5
6
1
2
5
7
6
Example
6
7
2
5
1
1
7
2
5
6
1
2
7
5
6
1
2
5
7
6
Example
6
7
2
5
1
1
7
2
5
6
1
2
7
5
6
1
2
5
7
6
1
2
5
6
7
Breaking down the problem
Get all the items in ascending order
Get one item at the wanted position/index
What are the two subproblems?
Breaking down the problem
Get all the items in ascending order
Get one item at the wanted position/index
Find the smallest item
Breaking down the problem
Get all the items in ascending order
Get one item at the wanted position/index
1.
2.
Find the smallest item
Swap the smallest item with the item at the
wanted position
Algorithm Summary (Selection
Sort)
For each “desired” position
Between the “desired” position and the end
Find the smallest item
Swap the smallest item with the item at the
“desired” position
Number of comparisons (Speed of
Algorithm)
Consider counting
Number of comparisons between array items
Number of comparisons (Speed of
Algorithm)
Consider counting
Number of comparisons between array items
Best-case scenario (least # of comparisons)
When does it occur? How many
comparisons?
Number of comparisons (Speed of
Algorithm)
Consider counting
Number of comparisons between array items
Best-case scenario (least # of comparisons)
When does it occur? How many
comparisons?
Worst-case scenario (most # of comparisons)
When does it occur? How many
comparisons?
Number of comparisons (Speed of
Algorithm)
Consider counting
Number of comparisons between array items
Best-case scenario (least # of comparisons)
When does it occur? How many
comparisons?
Worst-case scenario (most # of comparisons)
When does it occur? How many
comparisons?
Same number of comparisons
For all cases (ie best case = worst case)
Number of comparisons (Speed of
Algorithm)
To find the smallest item
How many comparisons?
Number of comparisons (Speed of
Algorithm)
To find the smallest item
How many comparisons?
N-1
To find the second smallest item
How many comparisons?
Number of comparisons (Speed of
Algorithm)
To find the smallest item
How many comparisons?
N-1
To find the second smallest item
How many comparisons?
N-2
…
Total # of comparisons?
Number of comparisons (Speed of
Algorithm)
To find the smallest item
How many comparisons?
N-1
To find the second smallest item
How many comparisons?
N-2
…
Total # of comparisons
(N-1) + (N-2) + … + 1
Number of comparisons (Speed of
Algorithm)
To find the smallest item
How many comparisons?
N-1
To find the second smallest item
How many comparisons?
N-2
…
Total # of comparisons
(N-1) + (N-2) + … + 1
N(N-1)/2 = (N2 – N)/2
Selection Sort
Not the fastest sorting algorithm
Learn faster algorithms in more advanced
courses.
Revisiting Binary Search
Binary Search
While more items and not found in the mid
point
1.
2.
Eliminate half of the items
Determine the mid point
Eliminate half of the array
How to specify the focus region?
Hint: index/position
Eliminate half of the array
How to specify the focus region?
Hint: index/position
Start and end
How to determine if the region has
items (is not empty)?
with start and end
How to determine if the region has
items (is not empty)?
with start and end
Start <= end
How do we adjust start and end?
How do we adjust start and end?
What are the two different cases?
How do we adjust start and end?
What are the two different cases?
Item is before the middle item
Item is after the middle item
How do we adjust start and end?
What are the two different cases?
Item is before the middle item
Start:
End:
Item is after the middle item
How do we adjust start and end?
What are the two different cases?
Item is before the middle item
Start: no change
End: position before the mid point
Item is after the middle item
How do we adjust start and end?
What are the two different cases?
Item is before the middle item
Start: no change
End: position before the mid point
Item is after the middle item
Start:
End:
How do we adjust start and end?
What are the two different cases?
Item is before the middle item
Start: no change
End: position before the mid point
Item is after the middle item
Start: position after the mid point
End: no change
How to determine the mid point?
with start and end?
How to determine the mid point?
with start and end
(start + end) / 2
Integer division will eliminate the fractional part
Algorithm Summary
1. Initialize start, end, and mid point (I)
2. While region has items and item is not at the
mid point ( C )
a)
b)
Eliminate half of the items by adjusting start
or end (U)
Update the mid point (U)
3. If region has items
Position is mid point
else
Position is -1
Overall Summary
Overall Summary
DNA samples from crime scene
Identify people using known DNA profiles
If there is a match
estimate probability of DNA profile
Matching a sample to known DNA profiles
Linear/sequential search [N checks]
Binary search [log2N + 1 checks]
Faster but needs sorted data/profiles
Selection Sort [(N2 – N)/2 comparisons]