MIS 335 - W1-Std

Download Report

Transcript MIS 335 - W1-Std

MIS 335 - Database Systems
http://www.mis.boun.edu.tr/durahim/
Ahmet Onur Durahim
Learning Objectives
• Database systems concepts
• Designing and implementing a database
application
• Life of a Query in a Database System
Database Systems
• Data?
– Streams of raw facts representing events
occurring in organizations
• Information?
– Data shaped into a meaningful form that is useful
to human beings
• Database
– Organized collection of data
Information vs. Data
• Raw data is processes and organized to produce meaningful
and useful information
– Total unit sales, total sales revenue
Database Management System (DBMS)
• DBMS contains information about a particular
enterprise
– Collection of interrelated data
– Set of programs to access the data
– An environment that is both convenient and efficient
to use
• Database Applications:
–
–
–
–
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Transaction Processing Systems
• Serves operational managers and staff
– Keep track of the elementary activities and transactions of
the organization (sales, cash deposits, flow of materials)
– Monitor the status of internal operations and the firm’s
relations with the external environment
• Perform and record daily routine transactions
necessary to conduct business
– Sales order entry, payroll, and shipping
• Serve predefined, structured goals, tasks and resources
at the operational level
• Major producers of information for the other systems
and business functions
Transaction Processing Systems
Data Management Systems
• Responsible for organizing and managing firm’s data so
that they can be efficiently accessed and used
• Allows the definition, creation, querying, update, and
administration of databases (Transactions, Persistence of
data, Recovery, Concurrency control)
Database Software Providers
– Commercial
• Microsoft SQL Server, Oracle,
IBM DB2, Sybase
• These 4 companies supply more
than 90% of US DB market
– Open Source
• MySQL, PostgreSQL
Data Management and Storage
• NoSQL and Big Data Systems
– Amazon Dynamo, Cassandra, MongoDB, Neo4j
– Apache Hadoop
Why do we need DBMSs?
• Computers were originally developed for number
crunching
• By time, data storage and processing became as
important as scientific computing
• Amount and types of data increased
– Image/audio/video data
– Genome data
– Customer transactions
• DBMSs were developed to manage this data
Why do we need DBMSs?
•
•
•
•
•
•
•
Data independence
Efficient access
Data integrity and security
Uniform data administration
Concurrent access
Recovery from crashes
User-friendly declarative query language
File Systems vs Databases
Terminology and Basic Ideas
• Data Model: Describes conceptual structuring of data
stored in database
– data model is set of records. (records might each have studentID and name)
• The relational model of data is the most widely used
model today
– Main concept: relation, basically a table with rows and
columns
– Every relation has a schema, which describes the columns, or
fields
• Schema vs. Data
– Schema describes how data is to be structured
• defined at set-up time, rarely changes
• part of the "metadata"
– Data is actual "instance" of database, may change rapidly
Relational Model
Relational Model Concepts
Terminology and Basic Ideas
• Data Definition Language (DDL)
– Commands for setting up schema of database
– Process of designing schema can be complex
• may use design methodology and/or tool
• Data Manipulation Language (DML)
– Commands to manipulate data in database:
RETRIEVE, INSERT, DELETE, MODIFY
– Also called "query language"
Database Design
• The process of designing the general structure of
the database
– Logical Design: Decide on the “good” collection of the
relation schemas
– Logical modeling deals with gathering business
requirements and converting those requirements into
a model
• Business decision: What attributes should we record in the
database?
• IS decision: What relation schemas should we have and how
should the attributes be distributed among the various
relation schemas?
Database Design
• The process of designing the general structure of
the database
– Physical Design: Decide on the physical layout of the
database
– Physical modeling deals with the conversion of the
logical, or business model, into a relational database
model
– database software specific
• The objects defined during physical modeling can vary
depending on the relational database software being used.
• Most relational database systems have variations with the
way data types are represented and the way data is stored,
although basic data types are conceptually the same among
different implementations.
Logical vs Physical Modeling
Structured Query Language (SQL)
• Widely used non-procedural database query language
• Question: What is the name, email address and yearly
income of the customer with customerKey ‘11009’?
Customer Table
DimCustomer
Components of a DBMS
• Database system has five main pieces;
– Client Communications Manager that manages
communication between the users and the database
– Process manager that encapsulates and schedules the
various tasks in the system
– A statement-at-a time query processing engine
– A shared transactional storage subsystem that knits
together storage, buffer management, concurrency
control and recovery
– A set of shared utilities including memory
management, disk space management, replication,
and various batch utilities used for administration
Architectural Components of a DBMS
Life of a Query
• A database interaction at an airport, in which
a gate agent clicks on a form to request the
passenger list for a flight
• This button click results in a single-query
transaction
Life of a Query
• The personal computer at the airport gate (the “client”)
calls an API that in turn communicates over a network to
establish a connection with the Client Communications
Manager of a DBMS
– the responsibility of the DBMS’ client communications manager
• to establish and remember the connection state for the caller
• to respond to SQL commands from the caller
• to return both data and control messages (result codes, errors, etc.) as
appropriate
• In gate agent’s query example, the communications
manager would
– establish the security credentials of the client
– set up state to remember the details of the new connection and
the current SQL command across calls
– forward the client’s first request deeper into the DBMS to be
processed
Life of a Query
• Upon receiving the client’s first SQL command, the DBMS
(Process Manager) must assign a “thread of computation”
to the command
– Make decision regarding admission control: whether the system
should begin processing the query immediately, or defer
execution until a time when enough system resources are
available to devote to this query
• Begin executing the gate agent’s query by Relational Query
Processor.
– checks that the user is authorized to run the query, and
compiles the user’s SQL query text into an internal query plan
– The plan executor consists of a suite of “operators” (relational
algorithm implementations) for executing any query
• Operators implement relational query processing tasks including joins,
selection, projection, aggregation, sorting, etc.
Life of a Query
• One or more operators exist to request data from the
database. These operators make calls to fetch data
from the DBMS’ Transactional Storage Manager
– Manages all data access (read) and manipulation (create,
update, delete) calls
– Invoke the transaction management code to ensure the
well-known “ACID” properties of transactions
– Before accessing data;
• locks are acquired from a lock manager to ensure correct
execution in the face of other concurrent queries.
• If the gate agent’s query involved updates to the database, it
would interact with the log manager to ensure that the transaction
was durable if committed, and fully undone if aborted
Life of a Query
• Agent’s query has begun to access data records, and is
ready to use them to compute results for the client.
– done by “unwinding the stack” of activities
• The access methods return control to the query executor’s operators
– which orchestrate the computation of result tuples from database data
• as result tuples are generated, they are placed in a buffer for the
client communications manager
– which ships the results back to the caller.
• At the end of the query: the transaction is completed and
the connection closed
– the transaction manager cleaning up state for the transaction
– the process manager freeing any control structures for the
query
– the communications manager cleaning up communication
state for the connection
Life of a Query
• Shared components and utilities that are vital to
the operation of a full-function DBMS.
• The catalog and memory managers are invoked
as utilities during any transaction.
– The catalog is used by the query processor during
authentication, parsing, and query optimization.
– The memory manager is used throughout the DBMS
whenever memory needs to be dynamically allocated
or deallocated.
• Remaining modules that run independently of
any particular query, keeping the database as a
whole well tuned and reliable