Transcript slides
Data Warehousing
Data Mining
Privacy
Reading
Farkas
Bhavani Thuraisingham, Murat Kantarcioglu, and
Srinivasan Iyer. 2007. Extended RBAC-design and
implementation for a secure data warehouse. Int. J.
Bus. Intell. Data Min. 2, 4 (December 2007), 367-382.
https://www.utdallas.edu/~bxt043000/Publications/Technical-Reports/UTDCS-35-07.pdf
Sweeney L, Abu A, and Winn J. Identifying
Participants in the Personal Genome Project by Name.
Harvard University. Data Privacy Lab. White Paper
1021-1. April 24, 2013.
http://dataprivacylab.org/projects/pgp/1021-1.pdf
CSCE 824 - Spring 2015
Data Warehousing
Repository of data providing
organized and cleaned enterprise-wide data (obtained from a
variety of sources) in a
standardized format
– Data mart (single subject area)
– Enterprise data warehouse (integrated
data marts)
– Metadata
OLAP Analysis
Aggregation functions
Factual data access
Complex criteria
Visualization
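As a minimal sketch of an aggregation function over factual data, the following Python rolls a small fact table up along one dimension at a time. The table, its column names (`region`, `quarter`, `sales`), and the helper `roll_up` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (region, quarter) with a sales measure.
FACTS = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q2", "sales": 150},
    {"region": "West", "quarter": "Q1", "sales": 200},
    {"region": "West", "quarter": "Q2", "sales": 50},
]


def roll_up(facts, dimension, measure):
    """Sum `measure` over all rows that share the same value of `dimension`."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[dimension]] += row[measure]
    return dict(totals)


by_region = roll_up(FACTS, "region", "sales")
by_quarter = roll_up(FACTS, "quarter", "sales")
```

The same fact rows answer two different questions depending on which dimension is kept, which is the basic idea behind OLAP roll-up.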
Warehouse Evaluation
Enterprise-wide support
Consistency and integration
across diverse domains
Security support
Support for operational users
Flexible access for decision
makers
Data Integration
Data access
Data federation
Change capture
Need ETL (extraction,
transformation, load)
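The three ETL stages can be sketched as three small functions. This is only an illustration under stated assumptions: the source records, their field names, and the in-memory `warehouse` dictionary are hypothetical stand-ins for real operational sources and a real warehouse.

```python
# Hypothetical operational sources with inconsistent formatting.
SOURCE_A = [{"id": "1", "name": " alice ", "amount": "10.50"}]
SOURCE_B = [{"id": "2", "name": "BOB", "amount": "7"}]


def extract(*sources):
    """Extraction: pull raw records from every source in turn."""
    for source in sources:
        yield from source


def transform(record):
    """Transformation: clean values and convert them to a standard format."""
    return {
        "id": int(record["id"]),
        "name": record["name"].strip().title(),
        "amount": float(record["amount"]),
    }


def load(records, target):
    """Load: write the cleaned records into the target store, keyed by id."""
    for record in records:
        target[record["id"]] = record
    return target


warehouse = load((transform(r) for r in extract(SOURCE_A, SOURCE_B)), {})
```

After the run, `warehouse` holds both records in one standardized format regardless of how each source spelled its values.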
Data Warehouse Users
Internal users
– Employees
– Managerial
External users
– Reporting and auditing
– Research
Data Mining
Databases to be mined
Knowledge to be mined
Techniques Used
Applications supported
Data Mining Task
DM: mostly automated
Prediction Tasks
– Use some variables to predict
unknown or future values of other
variables
Description Tasks
– Find human-interpretable patterns
that describe the data
Common Tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Mining [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
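To make one of the descriptive tasks above concrete, here is a small sketch of the two standard measures used in association rule mining: support (the fraction of transactions containing an itemset) and confidence (the support of the whole rule divided by the support of its antecedent). The transaction data and function names are hypothetical.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

Here the rule "bread implies milk" holds in two of the four transactions and in two of the three transactions that contain bread.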
Security for Data
Warehousing
Establish the organization's security
policies and procedures
Implement logical access control
Restrict physical access
Establish internal control and
auditing
Data Warehousing
Issues: Integrity
Poor quality data: inaccurate,
incomplete, missing meta-data
Loss of traditional consistency,
e.g., keys
Source data quality vs. derived
data quality
– Trust in the result of analysis?
Big Data Security and
Privacy
Amount of data being considered
Privacy-preserving analytics
Granular Access Control
– Flat, two-dimensional tables
Transaction logs and auditing
Real time monitoring
Big Data Integrity
Data Accuracy
Source provenance
End-point filtering and validation
Access Control
Layered defense:
– Access to processes that extract
operational data
– Access to the data and processes that
transform operational data
– Access to data and meta-data in the
warehouse
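One way to picture the layered defense is a separate permission check at each layer. The sketch below is an assumption-laden illustration, not the mechanism of any particular product: the layer names, role names, and the `is_allowed` helper are all hypothetical.

```python
# Hypothetical mapping from each layer to the roles permitted to use it.
LAYER_PERMISSIONS = {
    "extract": {"etl_operator"},
    "transform": {"etl_operator", "data_steward"},
    "warehouse": {"analyst", "data_steward"},
}


def is_allowed(role, layer):
    """Return True only if `role` is explicitly granted access to `layer`."""
    # Unknown layers map to an empty set, so access is denied by default.
    return role in LAYER_PERMISSIONS.get(layer, set())


analyst_reads_warehouse = is_allowed("analyst", "warehouse")
analyst_runs_extract = is_allowed("analyst", "extract")
```

Because every layer is checked independently, a role that may query the warehouse gains nothing at the extraction or transformation layers.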
Access Control Issues
Mapping from local to warehouse
policies
How to handle “new” data
Scalability
Identity Management
Inference Problem
Data mining discovers “new knowledge”: how do we
evaluate the security risks?
Example security risks:
– Prediction of sensitive information
– Misuse of information
Assurance of “discovery”
Privacy and Sensitivity
Large volume of private (personal) data
Need:
– Proper acquisition, maintenance,
usage, and retention policy
– Integrity verification
– Control of analysis methods
(aggregation may reveal sensitive
data)
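The point that aggregation may reveal sensitive data can be illustrated with a simple differencing example: two aggregate queries, neither of which returns an individual row, still disclose one person's value when subtracted. The salary table and the `total_salary` query are hypothetical.

```python
# Hypothetical sensitive table; individual rows are never returned directly.
SALARIES = {"ann": 50_000, "bob": 60_000, "cat": 90_000}


def total_salary(names):
    """An aggregate query: returns only the sum over the selected people."""
    return sum(SALARIES[name] for name in names)


everyone = total_salary(["ann", "bob", "cat"])
all_but_cat = total_salary(["ann", "bob"])

# Subtracting the two aggregates isolates a single individual's salary.
inferred_cat = everyone - all_but_cat
```

This is why controlling which analysis methods and query combinations are allowed matters as much as controlling access to the raw rows.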
Privacy
What is the difference between
confidentiality and privacy?
Identity, location, activity, etc.
Anonymity vs. accountability
Legislation
Privacy Act of 1974, U.S. Department of Justice
(http://www.usdoj.gov/oip/04_7_1.html )
Family Educational Rights and Privacy Act (FERPA),
U.S. Department of Education,
(http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html )
Health Insurance Portability and Accountability Act
of 1996 (HIPAA),
(http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act )
Telecommunications Consumer Privacy Act
(http://www.answers.com/topic/electronic-communications-privacy-act )
Online Social Network
Social Relationship
Communication context changes
social relationships
Social relationships maintained
through different media grow at
different rates and to different
depths
No clear consensus on which medium is
best
Internet and Social
Relationships
Internet
Bridges distance at a low cost
New participants tend to “like” each
other more
Less stressful than face-to-face
meeting
People focus on communicating
their “selves” (except a few
malicious users)
Social Network
Description of the social structure
between actors
Connections: various levels of social
familiarity, e.g., from casual
acquaintance to close familial bonds
Support online interaction and
content sharing
Social Network Analysis
The mapping and measuring of
relationships and flows between
people, groups, organizations,
computers or other information
processing entities
Behavioral Profiling
Note: Social Network Signatures
– User names may change; family and
friends are more difficult to change
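The note on social network signatures can be sketched as a comparison of friend sets: even if a user name changes, an account whose connections overlap heavily with a known account is a likely match. The Jaccard similarity used here is one common set-overlap measure chosen for illustration, not necessarily the method the slide has in mind; the account names and friend lists are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical friend list of a known account that has since disappeared.
old_account = {"dana", "eli", "fay", "gus"}

# Hypothetical new accounts with different user names.
candidates = {
    "new_user_1": {"dana", "eli", "fay", "hal"},
    "new_user_2": {"ivy", "jon"},
}

scores = {name: jaccard(old_account, friends) for name, friends in candidates.items()}
best_match = max(scores, key=scores.get)
```

The account sharing three of five distinct connections scores far higher than the unrelated one, which is what makes the friend set work as a signature.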
Interesting Read:
M. Chew, D. Balfanz, B. Laurie,
(Under)mining Privacy in Social
Networks,
http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4468
Next
Web application insecurity: risk
to databases