Transcript slides

Data Warehousing
Data Mining
Privacy
Reading


Farkas
Bhavani Thuraisingham, Murat Kantarcioglu, and
Srinivasan Iyer. 2007. Extended RBAC-design and
implementation for a secure data warehouse. Int. J.
Bus. Intell. Data Min. 2, 4 (December 2007), 367-382.
https://www.utdallas.edu/~bxt043000/Publications/
Technical-Reports/UTDCS-35-07.pdf
Sweeney L, Abu A, and Winn J. Identifying
Participants in the Personal Genome Project by Name.
Harvard University. Data Privacy Lab. White Paper
1021-1. April 24, 2013.
http://dataprivacylab.org/projects/pgp/1021-1.pdf
CSCE 824 - Spring 2015
Data Warehousing

Repository of data providing
organized and cleaned enterprise-wide data (obtained from a
variety of sources) in a
standardized format
– Data mart (single subject area)
– Enterprise data warehouse (integrated
data marts)
– Metadata
OLAP Analysis




Aggregation functions
Factual data access
Complex criteria
Visualization
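As a minimal sketch of the aggregation bullet above, the following Python snippet rolls a small fact table up along one dimension at a time. The table contents and the roll_up helper are made up for illustration and are not part of the lecture material.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, sales amount).
FACTS = [
    ("East", "laptop", 1200),
    ("East", "phone", 800),
    ("West", "laptop", 1500),
    ("West", "phone", 700),
]

def roll_up(facts, dim_index):
    """Sum the measure (last column) grouped by one dimension column."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[dim_index]] += row[-1]
    return dict(totals)

by_region = roll_up(FACTS, 0)   # aggregate away the product dimension
by_product = roll_up(FACTS, 1)  # aggregate away the region dimension
print(by_region)
print(by_product)
```

The same helper serves both roll-ups because only the grouping column changes; a real OLAP engine would precompute or index such aggregates.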
Warehouse Evaluation





Enterprise-wide support
Consistency and integration
across diverse domains
Security support
Support for operational users
Flexible access for decision
makers
Data Integration




Data access
Data federation
Change capture
Need ETL (extraction,
transformation, load)
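The ETL step above can be sketched as follows, assuming two hypothetical operational sources (CRM_ROWS and BILLING_ROWS) that store names and dates in different formats. The source names, field names, and the in-memory list standing in for the warehouse are all assumptions for illustration.

```python
from datetime import datetime

# Hypothetical operational sources with inconsistent formats.
CRM_ROWS = [{"name": " alice smith ", "joined": "2015-01-31"}]
BILLING_ROWS = [{"customer": "BOB JONES", "start": "31/12/2014"}]

def transform(name, date_text, date_format):
    """Normalize a name and a date into one standardized record."""
    clean_name = " ".join(name.split()).title()
    iso_date = datetime.strptime(date_text, date_format).date().isoformat()
    return {"name": clean_name, "joined": iso_date}

def run_etl():
    warehouse = []  # stand-in for the warehouse load target
    # Extract from each source, transform, then load.
    for row in CRM_ROWS:
        warehouse.append(transform(row["name"], row["joined"], "%Y-%m-%d"))
    for row in BILLING_ROWS:
        warehouse.append(transform(row["customer"], row["start"], "%d/%m/%Y"))
    return warehouse

warehouse = run_etl()
print(warehouse)
```

Each source gets its own extraction loop and date format, while the transform step is shared so that every loaded record ends up in the same standardized shape.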
Data Warehouse Users

Internal users
– Employees
– Managerial

External users
– Reporting and auditing
– Research
Data Mining




Databases to be mined
Knowledge to be mined
Techniques Used
Applications supported
Data Mining Task



DM: mostly automated
Prediction Tasks
– Use some variables to predict
unknown or future values of other
variables
Description Tasks
– Find human-interpretable patterns
that describe the data
Common Tasks





Classification [Predictive]
Clustering [Descriptive]
Association Rule Mining [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
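To make one of the tasks above concrete, here is a minimal sketch of deviation detection using a z-score rule. The threshold of 2.0, the sample readings, and the find_deviations helper are assumptions chosen for illustration; the slides do not prescribe a particular method.

```python
from statistics import mean, pstdev

def find_deviations(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical measurements with one obvious anomaly.
READINGS = [10, 11, 9, 10, 12, 50]
outliers = find_deviations(READINGS)
print(outliers)
```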
Security for Data
Warehousing




Establish the organization's security
policies and procedures
Implement logical access control
Restrict physical access
Establish internal control and
auditing
Data Warehousing
Issues: Integrity



Poor quality data: inaccurate,
incomplete, missing meta-data
Loss of traditional consistency,
e.g., keys
Source data quality vs. derived
data quality
– Trust in the result of analysis?
Big Data Security and
Privacy



Amount of data being considered
Privacy-preserving analytics
Granular Access Control
– Flat, two-dimensional tables


Transaction logs and auditing
Real time monitoring
Big Data Integrity



Data Accuracy
Source provenance
End-point filtering and validation
Access Control

Layered defense:
– Access to processes that extract
operational data
– Access to data and processes that
transform operational data
– Access to data and meta-data in the
warehouse
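The layered defense above can be sketched as a per-layer permission check. The role names, layer names, and the is_allowed helper are hypothetical; a real warehouse would derive them from its own access control policy.

```python
# Hypothetical mapping from roles to the layers they may access.
PERMISSIONS = {
    "etl_operator": {"extract", "transform"},
    "analyst": {"warehouse"},
    "dba": {"extract", "transform", "warehouse"},
}

def is_allowed(role, layer):
    """Grant access only if the role is explicitly permitted on the layer."""
    return layer in PERMISSIONS.get(role, set())

print(is_allowed("analyst", "warehouse"))  # analyst may query the warehouse
print(is_allowed("analyst", "extract"))    # but not the extraction processes
```

Unknown roles fall through to an empty permission set, so the default is to deny access.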
Access Control Issues




Mapping from local to warehouse
policies
How to handle “new” data
Scalability
Identity Management
Inference Problem



Data Mining discovers “new knowledge”: how to
evaluate security risks?
Example security risks:
– Prediction of sensitive information
– Misuse of information
Assurance of “discovery”
Privacy and Sensitivity


Large volume of private (personal) data
Need:
– Proper acquisition, maintenance,
usage, and retention policy
– Integrity verification
– Control of analysis methods
(aggregation may reveal sensitive
data)
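As a small illustration of how aggregation may reveal sensitive data, the sketch below shows a differencing attack: two permitted SUM queries that differ by one person expose that person's value. The salary figures and the total_salary helper are made up for illustration.

```python
# Hypothetical sensitive data: individual salaries are not directly queryable.
SALARIES = {"Ann": 50000, "Bob": 62000, "Cal": 58000}

def total_salary(names):
    """An aggregate query that the system is assumed to allow."""
    return sum(SALARIES[n] for n in names)

everyone = total_salary(["Ann", "Bob", "Cal"])
all_but_bob = total_salary(["Ann", "Cal"])

# Subtracting the two aggregates isolates one individual's salary.
inferred_bob = everyone - all_but_bob
print(inferred_bob)
```

This is why controlling the analysis methods matters: each query is harmless on its own, but the combination is not.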
Privacy



What is the difference between
confidentiality and privacy?
Identity, location, activity, etc.
Anonymity vs. accountability
Legislation




Privacy Act of 1974, U.S. Department of Justice
(http://www.usdoj.gov/oip/04_7_1.html)
Family Educational Rights and Privacy Act (FERPA),
U.S. Department of Education
(http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html)
Health Insurance Portability and Accountability Act
of 1996 (HIPAA)
(http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act)
Telecommunications Consumer Privacy Act
(http://www.answers.com/topic/electroniccommunications-privacy-act)
Online Social Network

Social Relationship
– Communication context changes
social relationships
– Social relationships maintained
through different media grow at
different rates and to different
depths
– No clear consensus on which medium is
best
Internet and Social
Relationships
Internet
– Bridges distance at a low cost
– New participants tend to “like” each
other more
– Less stressful than face-to-face
meeting
– People focus on communicating
their “selves” (except a few
malicious users)
Social Network

Description of the social structure
between actors

Connections: various levels of social
familiarity, e.g., from casual
acquaintance to close familial bonds

Support online interaction and
content sharing
Social Network Analysis



The mapping and measuring of
relationships and flows between
people, groups, organizations,
computers or other information
processing entities
Behavioral Profiling
Note: Social Network Signatures
– User names may change, family and
friends are more difficult to change
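The note on social network signatures can be sketched by comparing the friend sets of two accounts. Jaccard similarity is one possible measure, chosen here as an assumption; the account contents are made up for illustration.

```python
def jaccard(a, b):
    """Overlap of two friend sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty sets carry no signature
    return len(a & b) / len(a | b)

# Hypothetical accounts: the user name changed, most contacts did not.
old_account = {"mom", "dad", "sis", "joe"}
new_account = {"mom", "dad", "sis", "kim"}
unrelated = {"xavier", "yolanda"}

same_user_score = jaccard(old_account, new_account)
other_user_score = jaccard(old_account, unrelated)
print(same_user_score, other_user_score)
```

A high overlap suggests the two accounts may belong to the same person even though the user name differs, which is the privacy risk the note points at.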
Interesting Read:

M. Chew, D. Balfanz, B. Laurie,
(Under)mining Privacy in Social
Networks,
http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4468
Next

Web application insecurity: risk
to databases