CS457 Introduction

CS457/557 Introduction – Chapters 1-2
Relevance of DB
• DBs are a part of most decisions in an
enterprise
– Traditional DBs – Operational
– Data Warehouses – Decision Support
– NoSQL DBs – Information
Databases
• Databases play a critical role in?
– Business, medicine, industry, etc.,
– everything?
• Databases can be?
– Traditional, XML, Object-relational, multimedia,
real-time, Web
• What databases have you used recently?
Data vs. Databases
• Data
– Recorded known facts, implicit meaning
• Database (DB)
– Collection of related data
– Logically coherent
– Represents mini-world
– Designed, built for specific purpose
– Intended user group
– Preconceived applications
DBMS
• Database Management System (DBMS)
– Software
– Create and maintain a DB
– Define types of data
– Store on disk controlled by DBMS
– Manipulate data
DBMS cont’d
• Why a DBMS?
– Program independence
– Data abstraction
– Conceptual representation
– Meta data
– Share data
– Multiple views
– Transaction processing
– Higher overhead and increased complexity (Fig. 2.3)
So why use a DBMS?
– OPTIMIZATION
Definitions
• Database System DBS
– Data + DBMS
• DBS
– Schema (meta-data) - DB description, schema diagram
– Instance (actual data, Fig. 1.2) - initially empty
• 3-schema architecture (Fig. 2.2)
– External view
– Conceptual – structure of DB, hides physical
– Internal – physical storage, access paths
Fig 2.1
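A minimal sketch of schema vs. instance and of an external view, using Python's built-in sqlite3 module. The student table, its columns, the sample rows, and the student_directory view are made up for illustration; SQLite is used only because it ships with Python, not because the course uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Schema (meta-data): the description of the database.
conn.execute(
    "CREATE TABLE student (cwid INTEGER PRIMARY KEY, name TEXT, gpa REAL)"
)

# Instance (actual data): initially empty.
empty_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# The instance changes as data is inserted; the schema does not.
conn.execute("INSERT INTO student VALUES (1, 'Ada', 3.9)")
conn.execute("INSERT INTO student VALUES (2, 'Bob', 3.1)")
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# External view: one user group sees only part of the conceptual schema.
conn.execute("CREATE VIEW student_directory AS SELECT cwid, name FROM student")
directory = conn.execute(
    "SELECT * FROM student_directory ORDER BY cwid"
).fetchall()

# The DBMS stores the schema itself, and it can be queried like data.
schema_objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

print(empty_count, row_count)
print(directory)
print(schema_objects)
```

The view hides gpa from its users without changing how the table is stored, which is the separation between levels that the 3-schema architecture describes.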
Data Model
• Describes the structure: records, types, relationships, constraints,
basic operations
• DBMS based on data model
• Types:
– High-level (conceptual) - ER, UML, OO
– Low level (physical) - XML
– Implementation (representational) combines conceptual
and physical – Relational
– NoSQL data models – Column, key-value, document stores
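To make the model types concrete, here is a rough sketch of the same fact (one student) shaped the way a relational, key-value, document, and column store would hold it. It uses plain Python structures, not real database engines, and all names and values are made up for illustration.

```python
import json

# Relational: a tuple that must conform to a fixed, separately declared schema.
student_columns = ("cwid", "name", "major")
student_row = (1, "Ada", "CS")

# Key-value: the store sees only a key and an opaque value.
kv_store = {"student:1": json.dumps({"name": "Ada", "major": "CS"})}

# Document: a self-describing, possibly nested structure.
document = {
    "cwid": 1,
    "name": "Ada",
    "major": "CS",
    "courses": [{"id": "CS457", "term": "Fall"}],
}

# Column store: values are grouped by column rather than by row.
column_store = {"name": {1: "Ada"}, "major": {1: "CS"}}

# Reading the same attribute back from each representation.
name_relational = student_row[student_columns.index("name")]
name_key_value = json.loads(kv_store["student:1"])["name"]
name_document = document["name"]
name_column = column_store["name"][1]

print(name_relational, name_key_value, name_document, name_column)
```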
DBMS Languages
• DDL - data definition language
• DML - data manipulation language
– High-level, nonprocedural
– Set at a time
– Interactive or embedded (host language)
• SQL most common/popular DB Language
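The DDL/DML split can be sketched with SQL embedded in a host language, here Python with its built-in sqlite3 module. The table, columns, and rows are hypothetical. The UPDATE is nonprocedural and set-at-a-time: it states which rows to change and the DBMS applies it to all of them, with no loop in the host program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute(
    "CREATE TABLE student (cwid INTEGER PRIMARY KEY, name TEXT, gpa REAL)"
)

# DML: insert rows, passing host-language values as parameters.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ada", 3.9), (2, "Bob", 3.1), (3, "Cy", 2.4)],
)

# DML, set at a time: one statement changes every qualifying row.
cursor = conn.execute("UPDATE student SET gpa = gpa + 0.1 WHERE gpa < 3.5")
updated = cursor.rowcount

# DML query: describes the result wanted, not how to compute it.
honor_roll = conn.execute(
    "SELECT name FROM student WHERE gpa >= 3.5 ORDER BY name"
).fetchall()

print(updated, honor_roll)
```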
DBMS
• Software to create, query, manipulate data in the
database
• Based on a particular data model
• Allows for program independence
• Provides language to define, manipulate data
• Contains meta data
Meta Data
• Data about the data
• “Metadata is structured information that describes,
explains, locates, or otherwise makes it easier to
retrieve, use, or manage an information resource.”
NISO
Meta Data
• Three categories of meta data (books as example):
– Structural metadata: A way to define how objects are put
together, for example, how pages are ordered to form
chapters.
– Administrative metadata: Information to help manage a
resource, such as when and how it was created, types, and
who has access
– Descriptive metadata: A resource for discovery and
identification, including elements such as title, abstract,
author, and keywords.
Meta Data
• Structural
– Student (Name, CWID, address, GPA, major)
• Administrative
– Owner of data?
• Account#, when created, modified
• Descriptive
– Everything but the content
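As an illustration of structural metadata, the sketch below declares the Student relation from the slide and reads its description back from the DBMS catalog. The column types are assumptions, since the slide gives only the attribute names. PRAGMA table_info is specific to SQLite; other systems expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Student (Name, CWID, address, GPA, major); the types are assumed.
conn.execute(
    "CREATE TABLE student ("
    "name TEXT, cwid INTEGER PRIMARY KEY, address TEXT, gpa REAL, major TEXT)"
)

# Each catalog row is (cid, name, type, notnull, dflt_value, pk).
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")
]

# The metadata exists even though the table holds no content yet.
content_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(columns)
print(content_rows)
```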
Meta Data – According to the Guardian
• Metadata associated with emails:
– Sender's name, email, and IP address
– Recipient's name and email address
– Date, time, and time zone
– Unique identifier of email and related emails
– Mail client login records with IP address
– Mail client header formats
– Subject of email
Meta Data
• Metadata associated with mobile phones:
– Phone number of every caller
– Serial numbers of phones involved
– Time of call
– Duration of call
– Location of each participant
– Telephone calling card numbers
Meta Data
• Metadata associated with Facebook:
– Username and profile bio information including
birthday, hometown, work history, and interests
– Username and unique identifier
– User subscriptions
– User location
– User device
– Activity date, time, and time zone
Meta Data
• Metadata associated with web browsers:
– Activity including pages the user visits and when
visited
– User data and possibly user login details with
auto-fill features
– User IP address, internet service provider, device
hardware details, operating system, and browser
version
– Cookies and cached data from websites
Meta Data
• What about medical records?
Additional Characteristics
• Interfaces
• Actors
– DBA
– Designers
– Users
• Naïve or parametric (same info each time)
• Casual (different info each time)
• Sophisticated (implement own applications using
databases)
• Standalone (personal DB)
DB classifications
• Single-user vs. multi-user
• Centralized vs. distributed
• Homogeneous vs. heterogeneous
• Federated DBMS, multidatabase system
Extending traditional databases
• Need for more complex databases
• Object-oriented databases
• Images, videos, scientific
• Data mining (decision support systems),
spatial
• Data on the web for e-commerce
– XML
• Unstructured or semi-structured data
• Databases for cloud computing
Application packages
• Software packages work with database backends
(>1 database)
• Web enabled
• Examples
– Enterprise Resource Planning (ERP)
• Integrate data and processes of organization
• Production, sales, distribution, marketing, finance, human
resources, etc.
– Customer Relationship Management (CRM)
• Integrate customer information
• Marketing and customer support
Information Retrieval IR
• Databases traditionally used for
– Banking, insurance, retail, finance, manufacturing,
payroll
• Information retrieval used for
– Books, manuscripts, library
• Searching based on key-words
• Document processing
– Keywords, categorization, ranking
documents
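Keyword search with ranking can be sketched as a small inverted index. The three documents and the scoring rule (sum of keyword occurrences) are illustrative assumptions; real IR systems use more refined weighting.

```python
from collections import Counter, defaultdict

# Hypothetical document collection.
documents = {
    "d1": "database systems store related data",
    "d2": "library catalog of books and manuscripts",
    "d3": "data mining finds patterns in data",
}

# Inverted index: keyword -> how often it occurs in each document.
index = defaultdict(Counter)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word][doc_id] += 1


def search(query):
    """Return matching document ids, highest keyword count first."""
    scores = Counter()
    for word in query.lower().split():
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Break ties by document id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(search("data"))
print(search("books"))
```

Matching is on whole words, so "database" in d1 does not count as an occurrence of "data".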
Information Retrieval IR
• With the advent of the web, IR is exciting again!
– Web pages have active objects, change
dynamically
– New strategies needed
• Big Data
• NoSQL
DB Management Issues
• This course 457/557
– Design/Model DBs
• Weird course – theory + applications
– Relational: Query DBs, Algebra, Normalization
We will use Oracle, MySQL
– Intro to: Security, performance, transactions, NoSQL
• Grad course 609
– Redundancy
– Integrity constraints and concurrency control (transactions)
– Backup and recovery
– In depth: performance, NoSQL