48x48 Poster - NDSU Computer Science

Download Report

Transcript 48x48 Poster - NDSU Computer Science

Predicate-Tree based Pretty Good Protection of Data
William Perrizo, Arjun G. Roy
Department of Computer Science, North Dakota State University, Fargo, ND 58102, USA.
Introduction
Vertical Slices
PGP-D Operation
To retrieve P-Tree information, we require
Growth of Internet has caused vast amount of data generation
* ordering – mapping of bit position to table row
Security of data is of prime concern
* predicate – table column id and bit slice or bit map
Vertical Data processing has been in focus recently because of better
performance than Horizontal Data processing in Data Mining Applications
* location
Key of the data is an array of two tuples storing location and the pad
Predicate Trees patented by NDSU is an effective Data Mining
Technology with use in Spatial Association Rule Mining, Text Clustering,
etc.
Typical key could be [{5, 54}, {7, 539}, {87, 3}, {209, 126}, {25, 896}, {888, 23}, …]
We make all P-Trees of same length and pad in the front to hide the start position
We propose a Predicate Tree based solution for Data security
R[A1]
We also scramble location of the P-Trees
For basic P-Trees, key K would reveal the offset and the pre-pad
In the above example, first P-Tree is at offset 5, i.e. it has been shuffled 5 P-Tree
slots and real information starts after 54 bits
Predicate Trees
Predicate Trees or pTrees are:-
Discussion
* Compressed
* Data-mining-ready
Since P-Trees are data-mining-ready data structures, we are not in favor of
expensive encryption and decryption operation
* Vertical data structure
More the number of P-Trees, better the protection
Predicate Tree - Operation
Predicate Trees – Demonstration
Most frequently used operations in P-Trees are AND, OR and NOT
Operations are carried on a level by level basis starting from root node
We are able to compute following important operations vertically
2
7
6
1
3
7
6
0
* Sum / Mean
* Median / Rank-k
2
7
5
1
* Square Sum / Variance
* Maximum / Minimum
2
7
5
7
5
2
1
4
* Top-k values
* Mode
2
2
1
5
7
0
1
4
7
0
1
4
* Addition, Subtraction, Multiplication of P-TreeSet by constant value
* Comparison of P-TreeSets
TEMPLATE DESIGN © 2008
www.PosterPresentations.com
111
110
001
011
111
110
000
010
111
101
001
010
111
101
111
101
010
001
100
010
010
001
101
111
000
001
100
111
000
001
100
Pretty Good Privacy of Data
Mechanism to scramble P-Tree information without compromising on speed
Scrambled data would be unrevealing of the raw data except for the person
issuing data mining request
In distributed database scenario, it would make sense to fully replicate the P-Trees to
allow local retrievals
A case could arise where hacker extracts first bit of every P-Tree and shuffle those
bits until something means appears or starts to appear
To get around this possibility, we store entire database as a “Big Bit String” and have
it as a part of the key
References and Acknowledgement
Ding Q., Ding Q., Perrizo W.: PARM - An Efficient Algorithm to Mine Association
Rules from Spatial Data. IEEE Transactions on Systems, Man, and Cybernetics,
Part B 38(6) 1513-1524 (2008)
PGP-D
010
For a database with 5000 tables with 50 columns each and each column represented
by 32 bits, we would have 8 million P-Trees
Rahal I., Perrizo W.: An optimized approach for KNN text categorization using
P-trees. ACM Symposium on Applied Computing 613-617 (2004)
Perrizo W: Predicate Count Tree Technology. Technical Report NDSU-CSOR-TR01-1 (2001)
Wang Y., Lu T., Perrizo W.: A Novel Combinatorial Score for Feature Selection
with P-Tree in DNA Microarray Data Analysis. 19th International Conference on
Software Engineering and Data Engineering 295-300 (2010)