Identification

Download Report

Transcript Identification

Identification Numbers
and Error Detection
Meredith Wachs
Where do you find identification
numbers?








Checks
Credit cards
Driver’s licenses
VIN Numbers
Zip Codes
SSN
ISBN Numbers
Bar Codes/UPC’s (Universal Product Codes)
Check Digits
Used to reduce error
 “Grocery items, credit cards, overnight
mail, magazines, personal checks,
traveler’s checks, soft drink cans, and
automobiles” have check digits (COMAP)

Congruence mod n
“a is congruent to b, mod n, if n divides
a-b” (Stillwell, Elements of Number Theory)
 For example, 27 is congruent to 6 mod 3
because 27-6=21, which is divisible by three.
 Similarly, a number is congruent (mod n) to its
remainder after being divided by n; for example,
20/3=6 R2, so 20=2 (mod 3)

Money Order




COMAP example- The identification number on
a money order is 63024383845.
The last digit is the check digit, so the first 10
digits should be 5 mod 9.
How do we divide such a large number by 9?
How do we know this?
Calculating Divisibility Rules
Take a number z=abcdefg
 This could also be written as z = g*10^0 +
f*10^1 + e*10^2 + d*10^3 + c*10^4 +
b*10^5 + a*10^6
 Let’s replace each 10^n by its congruence
mod 9

Divisibility by 9 continued
10^0=1=1 (mod 9)
 10^1=10=1 (mod 9)
 10^2=1*1=1 (mod 9)
 By analogy, we see that every place will
have a weight of 1, so
a(1)+b(1)+c(1)+d(1)+e(1)+f(1)+g(1) leaves
the same remainder (mod 9) as abcdefg,
so this is our divisibility rule.

Divisibility by 11







We’ll proceed in the same way:
10^0=1 mod 11
10^1=-1 mod 11
10^2= -1*-1=1 mod 11
10^3= 10^2*10^1=1*-1=-1 mod 11
In this way, we see that abcdefg=(g+e+c+a)(f+d+b) (mod 11)
Ex. 3458679=(9+6+5+3)-(7+8+4)=4 (mod 11),
so is not divisible by 11.
Money Order Revisited

If the check digit only checks the divisibility
of the sum, what can be overlooked?
Potential Problems with the Check
Digit
Any number can be replaced with its
congruence mod n (so 0 can be replaced
with 9 in this example)
 Also, any digits can be easily transposed.

Other Uses of the Check Digit


Traveler’s checks and Euro banknotes use the
check digit, but the entire number, including the
check digit, should be divisible by 9.
If you know the divisor rule, you can easily figure
out what the check digit should be. For
example, if a traveler’s check ID number is
3487956321 and it has a divisor rule of 9, what
should the check digit be?
Answer

The sum of the digits is 48, and 48+6=54,
which is divisible by 9, so the check digit
should be 6.
More Sophisticated- UPC’s
A 12-digit number that is found along the
bottom of a barcode
 A BBBBB CCCCC D
 “A” is the type of good, “B” is the
manufacturer’s code, “C” is the product
code, “D” is the check digit (COMAP)

UPC Error Prevention
Take the code A BCDEF GHIJK L. Going
from left to right, calculate
3A+B+3C+D+3E+F+3G+H+3I+J+3K+L. If
it isn’t divisible by 10, it is an incorrect
UPC.
 This “detects all single-position errors and
about 89% of other errors” (COMAP 326).
 Try this on your own!

The U.S. Banking System

To identify a certain bank, it has an
identification number (the first string on the
bottom of the check). The string is
ABCDEFGH, and the check digit is I. “I”
must be the last digit of the resulting sum
of 7A+3B+9C+7D+3E+9F+7G+3H
http://blog.wellsfargo.com/GuidedByHistory/images/Earth_Check_large.jpg
Here, we can see that
7(1)+3(2)+9(1)+0+0+0+7(2)+3(4)= 48. Since 8 is
the check digit, this is a valid number.
Credit Card Numbers



A more effective algorithm, which is used for credit
cards, is called Codabar, and is also used by “libraries,
blood banks, photofinishing companies, German banks,
and the South Dakota driver’s license department”
(COMAP 327).
Codabar “allows computers to detect 100% of singleposition errors and about 98% of other common errors”
(COMAP 328).
The Codabar algorithm was developed by Hans Peter
Luhn (1896-1964) and was patented in 1960
(http://www.merriampark.com/anatomycc.htm).
The Algorithm





Assume a 16-digit card number, with the final number
being the check digit. Add every digit in the oddnumbered spaces (going from left to right) and multiply
by two.
Then add the number of digits in odd-numbered spaces
that >4.
Finally, add all the digits in the even-numbered spaces
(except for the check digit).
The check digit will need to be whatever will make the
total divisible by 10.
Ex. What is the check digit for card number
124785943967210?
Solution
 Ex.
What is the check digit for card
number 124785943967210?
 1+4+8+9+3+6+2+0=33*2=66
 66+3=69
 69+2+7+5+4+9+7+1=
104
 104+6=110, so the check digit is
6.
ISBN Numbers
The 10-digit International Standard Book
Number detects “100% of single errors
and 100% of transposition errors”
(COMAP 328)
 An ISBN of A-BCDE-FGHI-J, with J as the
check digit, is valid if
10A+9B+8C+7D+6E+5F+4G+3H+2I+J is
divisible by 11.

ISBN’s continued
Example: 0-387-95587-9
 10(0)+9(3)+8(8)+7(7)+6(9)+5(5)+4(5)+3(8)
+2(7)+9=286
 286 is divisible by 11 because (6+2)-8=0,
which is divisible by 11.

Why does this work every time?
COMAP proof:
 Let there be an error in the B slot called B’.
 Then both calculations (with and without
errors) must be divisible by 11 in order for
B’ to not be detected. Then the difference
between the calculations must be divisible
by 11.

Proof continued…
So
(10A+9B+8C+7D+6E+5F+4G+3H+2I+J) –
(10A+9B’+8C+7D+6E+5F+4G+3H+2I+J)=
9(B-B’).
 Since B,B’ <=9, B-B’ cannot be 11, and so
9(B-B’) cannot be divisible by 11 unless
B=B’.

Another Example

What is the check digit for the ISBN 07167-1910?
Solution
10(0)+9(7)+8(1)+7(6)+6(7)+5(1)+4(9)+3(1)
+2(0)= 199
 The next number divisible by 11 is
199+10=209.
 To represent 10, ISBN numbers have an
X.

Code 39




This uses the digits 0-9 and letters A-Z (which
correspond to 10-35)
Code 39 is used by the DoD, automotive
companies, and the health industry (COMAP
329).
A 15-character string is validated by whether
15a+14b+13c+….1*o is divisible by 36 (with o as
the check digit).
The VIN system is a more complicated
alphanumeric system.
Bar Codes

“To decode the information in a bar code, a
beam of light is passed over the bars and
spaces via a scanning device, such as a
handheld wand of a fixed-beam device. The
dark bars reflect very little light back to the
scanner, whereas the light spaces reflect much
light. The differences in reflection intensities are
detected by the scanner and converted to strings
of 0’s and 1’s that represent specific numbers
and letters.” (COMAP 334)
Postnet Codes
A bar code for a ZIP+4 code (+1 check
digit)
 There are 52 long or short bars, with one
“guard bar” on either side and the
remaining 50 bars grouped into 10 groups
of 5, with 2 long bars and 3 short bars
each.
 The check digit makes the sum of all ten
numbers divisible by 10.

http://www-math.cudenver.edu/~wcherowi/jcorner/barcodes.html
UPC Bar Codes



Each digit is made up of seven modules
There are guard bars, a center division, and a
difference between manufacturer and product
numbers to make reading the codes as accurate
as possible.
Bar codes for UPC’s have been in use since a
pack of Wrigley Juicy Fruit gum was scanned on
June 26, 1974 in Marsh’s Supermarket in Troy,
Ohio. The first barcodes were made by National
Cash Register, but smearing ink problems soon
made IBM the top contender in the market.
(http://en.wikipedia.org/wiki/Barcode)
Digit
Manufacturer
Product
0
0001101
1110010
1
0011001
1100110
2
0010011
1101100
3
0111101
1000010
4
0100011
1011100
5
0110001
1001110
6
0101111
1010000
7
0111011
1000100
8
0110111
1001000
9
0001011
1110100
http://en.wikipedia.org/wiki/Universal_Product_Code
Is the UPC above valid? (3*0+1*3+3*6+…3*5+1*2 =
60 = 0 mod 10, so yes, it is valid.)
Illinois Driver’s License Numbers


In contrast to Social Security numbers, an Illinois
driver’s license number can help reconstruct a
person’s surname (by sound, not by spelling),
first and middle initials, date of birth, and gender.
These forms of ID numbers are also used in the
National Archives, the Library of Congress, and
in genealogy research. (COMAP 341-2).
Soundex Coding System for
Surnames
1.
2.
3.
Delete all occurrences of h and w. (so
“Wachs” becomes “acs”)
Assign number as follows: a,e,i,o,u,y = 0;
b,f,p,v = 1; c,g,j,k,q,s,x,z = 2; d,t = 3; l =
4; m,n = 5; r = 6 (so “acs” = 022)
If two or more letters with the same
numeric value are adjacent, omit all but
the first (so we’re left with “ac”)
Soundex continued…
4. Delete the first character of the original name if still
present.
5. Delete all occurrences of a,e,i,o,u,y. (so we’re left with
“c”)
6. Retain only the first three digits corresponding to the
remaining letters; append trailing 0’s if fewer than three
letters remain; precede the three digits with the first letter
of the surname (so we have W200)
 Because of the way this is coded, many errors in spelling
are taken into account.
*taken directly from COMAP 342
The Middle Digits-First Initial

InitialCodeInitialCodeInitialCodeInitialCode
A
B
C
D
E
F
G

0
60
100
160
200
240
280
H
I
J
K
L
M
N
320
400
420
500
520
540
620
O
P
Q
R
S
T
U
640
660
700
720
780
800
840
V
W
X
Y
Z
860
880
940
960
980
The Middle Digits-Middle Initial

InitialCodeInitialCodeInitialCodeInitialCode
A
B
C
D
E
F
G

1
2
3
4
5
6
7
H
I
J
K
L
M
N
8
9
10
11
12
13
14
O
P
Q
R
S
T
U
14 V
15 W
15 X
16 Y
17 Z
18
18
18
19
19
19
19
Calculating the Middle Digits
Add the code for the first initial to the code
for the middle initial. For my initials, MJ,
you have 540+10=550.
 So far, my number is W200-550.
 All middle digits information taken from
http://www.highprogrammer.com/alan/num
bers/dl_us_shared.html

The Last Five Digits
In Illinois, the last five digits retain the birth date and sex
of the person.
 Each month is assumed to have 31 days (starting with
January 1 as 001). My birthday, March 2, is therefore
2*31=62+2=064. If male, you are done. If female, add
600 to this number (so I am 664).
 Put the last two digits of your birth year before this
number. Thus, the last five digits of my Illinois driver’s
license is 8-8664. My complete number is W200-55088664. As the website author points out, IL not having
overflow numbers makes it likely to have multiple people
with the same number printed on their license.
*COMAP 343

Conclusions

We have seen several ways of identifying objects or
people with numbers and the likelihood of error and the
ways error is caught with each one. As COMAP points
out on p. 329, “Like many practices in the ‘real world,’
historical accident and lack of knowledge about existing
methods seem to be the explanation [for having so many
means of identification].” We see this especially in the
difference between driver’s license numbers and SSN,
which were assigned prior to computers and the
development of many of the systems used (like
Soundex).