Lecture 2 - The University of Texas at Dallas


Building Trustworthy
Semantic Webs
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Lecture #3
Supporting Technologies: Databases, Information
Management and Information Security
August 2006
Database System
 Consists of database, hardware, Database Management System
(DBMS), and users
 Database is the repository for persistent data
 Hardware consists of secondary storage volumes, processors, and
main memory
 DBMS handles all users’ access to the database
 Users include application programmers, end users, and the
Database Administrator (DBA)
 Needs addressed: reduce redundancy, avoid inconsistency, share
data, enforce standards, apply security restrictions, maintain
integrity, and balance conflicting requirements
 We have used the definition of a database management system
given in C. J. Date’s Book (Addison Wesley, 1990)
An Example Database System
[Figure: Users and Application Programs access the Database through the
Database Management System. Adapted from C. J. Date, Addison Wesley, 1990]
Metadata
 Metadata describes the data in the database
- Example:
Database D consists of a relation EMP with
attributes SS#, Name, and Salary
 Metadatabase stores the metadata
- Could be physically stored with the database
 Metadatabase may also store constraints and administrative
information
 Metadata is also referred to as the schema or data dictionary
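The EMP example above can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as the DBMS (the lecture does not name one) and assumed column types; it creates the relation EMP and then reads its description back from the system's own catalog, which plays the role of the metadatabase.

```python
import sqlite3

# In-memory database standing in for "Database D" (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMP ("SS#" INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)')

# sqlite_master is SQLite's catalog: the schema is itself stored as data.
schema_rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info yields one row per attribute; index 1 is the attribute name.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(EMP)")]
print(attributes)  # ['SS#', 'Name', 'Salary']
```

Here the metadata is physically stored with the database, which is one of the options the slide mentions.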
Functional Architecture
[Figure: Data Management layer: User Interface Manager, Schema (Data
Dictionary) Manager (metadata), Query Manager, Security/Integrity Manager,
Transaction Manager. Storage Management layer: File Manager, Disk Manager]
DBMS Design Issues
 Query Processing
- Optimization techniques
 Transaction Management
- Techniques for concurrency control and recovery
 Metadata Management
- Techniques for querying and updating the metadatabase
 Security/Integrity Maintenance
- Techniques for processing integrity constraints and enforcing
access control rules
 Storage management
- Access methods and index strategies for efficient access to the
database
Relational Database: Example
Relation S:
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens
Relation P:
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London
P5   Cam     Blue    12       Paris
P6   Cog     Red     19       London
Relation SP:
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
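To show how the three relations are used together, here is a minimal sketch that loads S, P, and SP into an in-memory SQLite database and runs two queries. The choice of sqlite3, the lowercase column names (sno, pno, and so on), and the two queries are assumptions made for illustration; the data is the example data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
    CREATE TABLE P  (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
    CREATE TABLE SP (sno TEXT, pno TEXT, qty INTEGER, PRIMARY KEY (sno, pno));
""")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
    ("P5", "Cam", "Blue", 12, "Paris"), ("P6", "Cog", "Red", 19, "London"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# Join across all three relations: names of suppliers that ship at least one red part.
red_suppliers = [name for (name,) in conn.execute("""
    SELECT DISTINCT S.sname
    FROM S JOIN SP ON S.sno = SP.sno JOIN P ON P.pno = SP.pno
    WHERE P.color = 'Red'
    ORDER BY S.sname
""")]
print(red_suppliers)  # ['Clark', 'Jones', 'Smith']

# Aggregate over SP alone: total quantity shipped by each supplier.
total_qty = dict(conn.execute("SELECT sno, SUM(qty) FROM SP GROUP BY sno"))
print(total_qty["S1"])  # 1300
```

Supplier S5 (Adams) appears in neither result because SP has no shipment rows for S5.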
Concepts in Object Database Systems
 Objects: every entity is an object
- Example: Book, Film, Employee, Car
 Class
- Objects with common attributes are grouped into a class
 Attributes or Instance Variables
- Properties of an object class inherited by the object instances
 Class Hierarchy
- Parent-Child class hierarchy
 Composite objects
- Book object with paragraphs, sections etc.
 Methods
- Functions associated with a class
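The concepts above map directly onto an object-oriented language. The following sketch uses Python dataclasses as a stand-in for an object database schema; the class names Document, Book, and Section and their attributes are illustrative assumptions, chosen to match the Book example on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """A component object: a section holding its paragraphs."""
    title: str
    paragraphs: List[str] = field(default_factory=list)


@dataclass
class Document:
    """Parent class; 'title' is an attribute (instance variable)."""
    title: str

    def describe(self) -> str:
        """A method: a function associated with the class."""
        return f"Document: {self.title}"


@dataclass
class Book(Document):
    """Child class in the hierarchy; inherits 'title' and 'describe'."""
    # A Book is a composite object: it is built from Section objects.
    sections: List[Section] = field(default_factory=list)

    def paragraph_count(self) -> int:
        return sum(len(s.paragraphs) for s in self.sections)


book = Book("Data Management", [Section("Intro", ["p1", "p2"]), Section("Models", ["p3"])])
print(book.describe())         # Document: Data Management
print(book.paragraph_count())  # 3
```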
A Definition of a Distributed Database System
 A collection of database systems connected via a network
 The software that is responsible for interconnection is a Distributed
Database Management System (DDBMS)
 Each DBMS executes local applications and should be involved in at
least one global application (Ceri and Pelagatti)
 Homogeneous environment
Architecture
[Figure: three sites connected by a Communication Network; each Site i
(i = 1, 2, 3) has Database i, DBMS i, and Distributed Processor i]
Data Distribution
SITE 1
EMP1
SS#   Name     Salary   D#
1     John     20       10
2     Paul     30       20
3     James    40       20
4     Jill     50       20
5     Mary     60       10
6     Jane     70       20
DEPT1
D#   Dname     MGR
10   C. Sci.   Jane
30   English   David
40   French    Peter
SITE 2
EMP2
SS#   Name     Salary   D#
9     Mathew   70       50
7     David    80       30
8     Peter    90       40
DEPT2
D#   Dname     MGR
50   Math      John
20   Physics   Paul
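A small sketch can make the distribution concrete. It assumes the fragments are simple horizontal partitions, so that a global relation is the union of its per-site fragments; the dictionary layout and the function names are hypothetical, and the data is taken from the example above.

```python
# Fragments stored at each site: EMP rows are (SS#, Name, Salary, D#),
# DEPT rows are (D#, Dname, MGR).
sites = {
    "site1": {
        "EMP": [(1, "John", 20, 10), (2, "Paul", 30, 20), (3, "James", 40, 20),
                (4, "Jill", 50, 20), (5, "Mary", 60, 10), (6, "Jane", 70, 20)],
        "DEPT": [(10, "C. Sci.", "Jane"), (30, "English", "David"), (40, "French", "Peter")],
    },
    "site2": {
        "EMP": [(9, "Mathew", 70, 50), (7, "David", 80, 30), (8, "Peter", 90, 40)],
        "DEPT": [(50, "Math", "John"), (20, "Physics", "Paul")],
    },
}


def global_relation(name):
    """Rebuild a global relation as the union of its fragments at all sites."""
    return [row for site in sites.values() for row in site[name]]


def employees_with_department():
    """A global query: join EMP with DEPT regardless of where the rows live."""
    dept_name = {dno: dname for dno, dname, _mgr in global_relation("DEPT")}
    return sorted((name, dept_name[dno]) for _ss, name, _sal, dno in global_relation("EMP"))


for name, dept in employees_with_department():
    print(name, dept)
```

Paul is stored at Site 1 but works in department 20 (Physics), whose DEPT row is stored at Site 2, so answering even this simple query needs data from both sites. Hiding that from the user is the job of the distributed processor.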
Interoperability of Heterogeneous Database Systems
[Figure: Database System A (Relational), Database System B
(Object-Oriented), and Database System C (Legacy) connected by a Network]
Transparent access to heterogeneous databases for both users and
application programs; query and transaction processing
Federated Database Management
[Figure: Database Systems A, B, and C grouped into Federations F1 and F2]
Cooperating database systems yet maintaining some degree of autonomy
Federated Data and Policy Management
[Figure: the Data/Policy for the Federation sits above the Export
Data/Policy of each agency, which in turn sits above the Component
Data/Policy for Agency A, Agency B, and Agency C]
Current Status and Directions
 Developments
- Several prototypes and some commercial products
- Tools for schema integration and transformation
- Standards for interoperable database systems
 Challenges being addressed
- Semantic heterogeneity
- Autonomy and federation
- Global transaction management
- Integrity and Security
 New challenges
- Scale
- Web data management
What is Information Management?
 Information management essentially analyzes the data and makes
sense out of the data
 Several technologies have to work together for effective information
management
- Data Warehousing: Extracting relevant data and putting this data
into a repository for analysis
- Data Mining: Extracting previously unknown information from the
data
- Multimedia: managing different media including text, images,
video and audio
- Web: managing the databases and libraries on the web
Data Warehouse
[Figure: Users query the warehouse. The Data Warehouse holds data
correlating employees with medical benefits and projects, drawn from an
Oracle DBMS for Employees, a Sybase DBMS for Projects, and an Informix
DBMS for Medical. The warehouse could be any DBMS; it is usually based on
the relational data model]
Multidimensional Data Model
[Figure: project data with dimensions Project Name, Project Leader, and
Project Sponsor; Project Cost in Dollars, Pounds, or Yen; Project Duration
in Years, Months, or Weeks]
Data Mining
[Figure: terms related to Data Mining: Information Harvesting, Knowledge
Mining, Knowledge Discovery in Databases, Data Dredging, Data Archaeology,
Data Pattern Processing, Database Mining, Knowledge Extraction, Siftware]
The process of discovering meaningful new correlations, patterns, and trends,
often previously unknown, by sifting through large amounts of data using pattern
recognition technologies and statistical and mathematical techniques
(Thuraisingham 1998)
Multimedia Information Management
[Figure: Broadcast News Editor (BNE) and Broadcast News Navigator (BNN).
The video source is segregated into video streams: imagery, audio, and
closed caption text. Processing components: Scene Change Detection, Frame
Classifier, Key Frame Selection, Silence Detection, Speaker Change
Detection, Closed Caption Preprocess, Token Detection, Named Entity
Tagging, Correlation, Broadcast Detection, Commercial Detection, Story
Segmentation, Story GIST Theme. Video and metadata are analyzed and stored
in a Multimedia Database Management System, which supports web-based
search/browse by Program, Person, Location, ...]
Extracting Relations from Text for Mining: An Example
[Figure: Text Corpus, Concept Extraction, Repository, Association Rule
Product]
Goal: Find Cooperating/Combating Leaders in a territory
Person1             Person2
Natalie Allen       Linden Soles      117
Leon Harris         Joie Chen         53
Ron Goldman         Nicole Simpson    19
...
Mobutu Sese Seko    Laurent Kabila    10
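One simple way such person pairs could be produced is by counting how often two names occur in the same document. The sketch below assumes concept extraction has already reduced each document to the set of people it mentions; the three toy documents are invented for illustration, and reading the numbers in the example as co-occurrence counts is itself an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of concept extraction: the people named in each document.
docs = [
    {"Natalie Allen", "Linden Soles"},
    {"Natalie Allen", "Linden Soles", "Leon Harris"},
    {"Leon Harris", "Joie Chen"},
]


def pair_counts(documents):
    """Count, for every unordered pair of people, the documents naming both."""
    counts = Counter()
    for people in documents:
        # Sorting makes (a, b) and (b, a) the same key.
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(docs)
for (person1, person2), n in counts.most_common(3):
    print(person1, "|", person2, "|", n)
```

Pairs with high counts are candidates for an association; deciding whether the two people are cooperating or combating needs further analysis of the text.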
Image Processing:
Example: Change Detection:
 Trained Neural Network to predict “new” pixel from “old” pixel
- Neural Networks good for multidimensional continuous data
- Multiple nets give a range of “expected values”
 Identified pixels where the actual value is substantially outside the
range of expected values
- Anomaly if three or more bands (of seven) out of range
 Identified groups of anomalous pixels
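The anomaly rule above can be sketched without the neural networks themselves. The code assumes each trained network has already produced a predicted 7-band value for a pixel; the function names and the `margin` parameter (a stand-in for "substantially outside") are assumptions for illustration, not part of the original method.

```python
def expected_ranges(predictions, margin=0.0):
    """Per-band (low, high) range spanned by the networks' predictions.

    predictions: one list of band values per network, all of equal length.
    """
    return [(min(band) - margin, max(band) + margin) for band in zip(*predictions)]


def out_of_range_bands(pixel, ranges):
    """Number of bands whose actual value falls outside the expected range."""
    return sum(1 for value, (low, high) in zip(pixel, ranges) if not low <= value <= high)


def is_anomalous(pixel, ranges, min_bands=3):
    """Flag the pixel if three or more bands are out of range."""
    return out_of_range_bands(pixel, ranges) >= min_bands


# Two hypothetical networks predicting the "new" pixel in seven bands.
ranges = expected_ranges([[10] * 7, [12] * 7], margin=1)  # each band: (9, 13)
print(is_anomalous([10, 11, 20, 20, 20, 12, 12], ranges))  # True: 3 bands out
print(is_anomalous([10, 11, 20, 20, 12, 12, 12], ranges))  # False: 2 bands out
```

Grouping adjacent anomalous pixels, the final step on the slide, would follow as a separate pass over the image.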
Semantic Web
[Figure: layered architecture, top to bottom: Logic, Proof and Trust;
Rules/Query; RDF, Ontologies; XML, XML Schemas; URI, UNICODE. Trust,
Privacy, and Other Services run alongside the layers]
 Adapted from Tim Berners Lee’s description of the Semantic Web
 Some Challenges: Interoperability between Layers; Security and
Privacy cut across all layers; Integration of Services; Composability
Semantic Web Technologies
 Web Database/Information Management
- Information retrieval and Digital Libraries
 XML, RDF and Ontologies
- Representing information
 Information Interoperability
- Integrating heterogeneous data and information sources
 Intelligent agents
- Agents for locating resources, managing resources, querying
resources and understanding web pages
 Semantic Grids
- Integrating semantic web with grid computing technologies
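As a rough illustration of how RDF represents information, the sketch below models statements as (subject, predicate, object) triples and answers a pattern query over them. It deliberately avoids any RDF library; every `ex:` identifier is a made-up example rather than a real URI, and `match` is a hypothetical helper.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers here are invented for illustration.
triples = {
    ("ex:Lecture3", "ex:partOf", "ex:BuildingTrustworthySemanticWebs"),
    ("ex:Lecture3", "ex:covers", "ex:Databases"),
    ("ex:Lecture3", "ex:covers", "ex:InformationSecurity"),
    ("ex:Databases", "rdf:type", "ex:Topic"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


covered = [o for _s, _p, o in match(triples, s="ex:Lecture3", p="ex:covers")]
print(covered)  # ['ex:Databases', 'ex:InformationSecurity']
```

An ontology would add shared definitions of terms such as `ex:Topic`, so that different sources can agree on what the predicates and classes mean.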
Information Management for Collaboration
[Figure: Teams A and B collaborating on a geographical problem]
Some Emerging Information Management Technologies
 Visualization
- Visualization tools enable the user to better understand the
information
 Peer-to-Peer Information Management
- Peers communicate with each other, share resources and carry
out tasks
 Sensor and Wireless Information Management
- Autonomous sensors cooperating with one another, gathering
data, fusing data and analyzing the data
- Integrating wireless technologies with semantic web
technologies
What is Knowledge Management?
 Knowledge management, or KM, is the process through which
organizations generate value from their intellectual property and
knowledge-based assets
 KM involves the creation, dissemination, and utilization of
knowledge
 Reference:
http://www.commerce-database.com/knowledge-management.htm?source=google
Knowledge Management Components
[Figure: Components of Knowledge Management: Components, Cycle and
Technologies]
Components: Strategies, Processes, Metrics
Cycle: Knowledge Creation, Sharing, Measurement and Improvement
Technologies: Expert systems, Collaboration, Training, Web
Organizational Learning Process
[Figure: Identification, Creation, Diffusion (Tacit, Explicit),
Integration, Modification, Action, and Metrics. Source: Reinhardt and
Pawlowsky]
also see: Tools in Organizational Learning
http://duplox.wz-berlin.de/oldb/forslin.html
Operating System Security
 Access Control
- Subjects are Processes and Objects are Files
- Subjects have Read/Write Access to Objects
- E.g., Process P1 has read access to File F1 and write access to
File F2
 Capabilities
- Processes must possess certain Capabilities / Certificates to
access certain files or to execute certain programs
- E.g., Process P1 must have capability C to read file F
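Both mechanisms can be sketched as simple lookups. The first uses an access control matrix keyed by (subject, object); the second checks whether a process holds the capability that a file requires. The data mirrors the two examples above, while the dictionary layout and function names are assumptions for illustration.

```python
# Access control matrix: (subject, object) -> set of permitted access modes.
access_matrix = {
    ("P1", "F1"): {"read"},
    ("P1", "F2"): {"write"},
}


def allowed(subject, obj, mode):
    """Access control check: is this mode granted in the matrix?"""
    return mode in access_matrix.get((subject, obj), set())


# Capabilities held by each process, and the capability each access requires.
capabilities = {"P1": {"C"}, "P2": set()}
required_capability = {("F", "read"): "C"}


def allowed_by_capability(subject, obj, mode):
    """Capability check: the subject must hold the capability the access requires."""
    needed = required_capability.get((obj, mode))
    return needed is not None and needed in capabilities.get(subject, set())


print(allowed("P1", "F1", "read"))               # True
print(allowed("P1", "F1", "write"))              # False
print(allowed_by_capability("P1", "F", "read"))  # True
print(allowed_by_capability("P2", "F", "read"))  # False
```

In this sketch an access with no registered capability requirement is denied, a deny-by-default choice that is an assumption rather than something the slide specifies.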
Mandatory Security
 Bell and LaPadula Security Policy
- Subjects have clearance levels, Objects have sensitivity levels;
clearance and sensitivity levels are also called security levels
- Unclassified < Confidential < Secret < TopSecret
- Compartments are also possible
- Compartments and Security levels form a partially ordered
lattice
 Security Properties
- Simple Security Property: Subject has READ access to an object
if the subject’s security level dominates that of the object
- Star (*) Property: Subject has WRITE access to an object if the
subject’s security level is dominated by that of the object
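The two properties reduce to a single "dominates" test on security labels. The sketch below assumes a label is a level plus a set of compartments, and that label a dominates label b when a's level is at least b's and a's compartments include all of b's; the class and function names and the "Nuclear" and "Crypto" compartments are illustrative.

```python
from dataclasses import dataclass

# Total order on levels: Unclassified < Confidential < Secret < TopSecret.
LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2, "TopSecret": 3}


@dataclass(frozen=True)
class Label:
    level: str
    compartments: frozenset = frozenset()


def dominates(a: Label, b: Label) -> bool:
    """a dominates b: level at least as high, compartments a superset."""
    return LEVELS[a.level] >= LEVELS[b.level] and a.compartments >= b.compartments


def can_read(subject: Label, obj: Label) -> bool:
    """Simple Security Property: no read up."""
    return dominates(subject, obj)


def can_write(subject: Label, obj: Label) -> bool:
    """Star (*) Property: no write down."""
    return dominates(obj, subject)


secret = Label("Secret")
unclassified = Label("Unclassified")
print(can_read(secret, unclassified))   # True
print(can_write(secret, unclassified))  # False

# Compartments make the order partial: these two labels are incomparable.
a = Label("Secret", frozenset({"Nuclear"}))
b = Label("Secret", frozenset({"Crypto"}))
print(dominates(a, b), dominates(b, a))  # False False
```

The incomparable pair at the end is why the levels and compartments together form a partially ordered lattice rather than a simple chain.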
Covert Channel Example
 Trojan horse at a higher level covertly passes data to a Trojan
horse at a lower level
 Example:
- File Lock/Unlock problem
- Processes at Secret and Unclassified levels collude with
one another
- When the Secret process locks a file and the Unclassified
process finds the file locked, a 1 bit is passed covertly
- When the Secret process unlocks the file and the
Unclassified process finds it unlocked, a 0 bit is passed
covertly
- Over time the bits could contain sensitive data
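The file lock channel can be simulated in a few lines. This sketch assumes the two processes take turns in lock step and that a locked file encodes 1 while an unlocked file encodes 0; `SharedFile` and the function names are hypothetical, and a real channel would also have to deal with timing and noise.

```python
class SharedFile:
    """A file whose only observable state is whether it is locked."""

    def __init__(self):
        self.locked = False


def send_bit(shared: SharedFile, bit: int) -> None:
    """Secret process: lock the file to signal 1, unlock it to signal 0."""
    shared.locked = bool(bit)


def receive_bit(shared: SharedFile) -> int:
    """Unclassified process: probe the lock state and read it as a bit."""
    return 1 if shared.locked else 0


def transmit(bits):
    """Run sender and receiver alternately, one bit per round."""
    shared = SharedFile()
    received = []
    for bit in bits:
        send_bit(shared, bit)
        received.append(receive_bit(shared))
    return received


secret_byte = [0, 1, 0, 0, 0, 0, 0, 1]  # the bits of ASCII "A"
leaked = transmit(secret_byte)
print(leaked == secret_byte)  # True
```

No file contents ever cross levels, so the read and write rules are never violated; the leak comes purely from the shared lock state.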
Network Security
 Security across all network layers
- E.g., Data Link, Transport, Session, Presentation,
Application
 Network protocol security
- Verification and validation of network protocols
 Intrusion detection and prevention
- Applying data mining techniques
 Encryption and Cryptography
 Access control and trust policies
 Other Measures
- Prevention of denial of service, secure routing, etc.
Steps to Designing a Secure System
 Requirements, Informal Policy and model
 Formal security policy and model
 Security architecture
- Identify security critical components; these components must be
trusted
 Design of the system
 Verification and Validation
Product Evaluation
 Orange Book
- Trusted Computer System Evaluation Criteria (TCSEC)
 Classes C1, C2, B1, B2, B3, A1 and beyond
- C1 is the lowest level and A1 the highest level of assurance
- Formal methods are needed for A1 systems
 Interpretations of the Orange book for Networks (Trusted Network
Interpretation) and Databases (Trusted Database Interpretation)
 Several companion documents
- Auditing, Inference and Aggregation, etc.
 Many products are now evaluated using the Federal Criteria
Security Threats to Web/E-commerce
[Figure: Security Threats and Violations: Access Control Violations;
Integrity Violations; Confidentiality, Authentication, and Nonrepudiation
Violations; Denial of Service/Infrastructure Attacks; Fraud; Sabotage]
Approaches and Solutions
 End-to-end security
- Need to secure the clients, servers, networks, operating
systems, transactions, data, and programming languages
- The various systems when put together have to be secure
 Composable properties for security
 Access control rules, enforce security policies, auditing,
intrusion detection
 Verification and validation
 Security solutions proposed by W3C and OMG
 Java Security
 Firewalls
 Digital signatures and Message Digests, Cryptography
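To illustrate the message digest idea, the sketch below uses Python's standard hashlib and hmac modules. Note the substitution: an HMAC is a keyed digest shared between two parties, not a digital signature, which would require public-key cryptography that the standard library does not provide. The key and messages are placeholders.

```python
import hashlib
import hmac


def digest(message: bytes) -> str:
    """Message digest: a fixed-size fingerprint that changes if the message does."""
    return hashlib.sha256(message).hexdigest()


def make_tag(key: bytes, message: bytes) -> str:
    """Keyed digest (HMAC): only holders of the key can produce a valid tag."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"shared-secret-key"  # placeholder key for the example
order = b"pay 100 to Alice"
tag = make_tag(key, order)

print(digest(order) == digest(b"pay 900 to Alice"))  # False: tampering changes the digest
print(verify(key, order, tag))                       # True
print(verify(key, b"pay 900 to Alice", tag))         # False
```

A plain digest detects accidental or careless modification, but an attacker who can change the message can also recompute the digest; the key in the HMAC, or the private key in a real signature scheme, is what prevents that.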
Other Security Technologies
 Data and Applications Security
 Middleware Security
 Insider Threat Analysis
 Risk Management
 Trust and Economics
 Biometrics