SQL Server Database Files

Download Report

Transcript SQL Server Database Files

Instructor: Craig Duckett
Lecture 17: Thursday, May 25th, 2017
SQL Server Keys, Files,
Functional Concepts and Architecture, Database Files
1
Today will be the last lecture in BIT275. Starting Tuesday, May 30th, class time will be dedicated
to Assignment 3 Workshops. I will be supplying templates for building tables, inserting data,
creating views, and creating stored procedures that you can use to construct your individual
database projects. I will also be offering one-on-one help for those who feel they need
additional help in order to complete their project. If you are comfortable with your work on
Assignment 3, then the next two BIT275 class days are optional (Tuesday, May 30th and
Thursday, June 1st).
Both ASSIGNMENT 3 and the DATABASE PRESENTATION are due on Tuesday, June 6th
• Assignment 3 is due LECTURE 20, Tuesday, June 6th
• Database Presentation is due LECTURE 20, Tuesday, June 6th
• Final Exam is LECTURE 12, Thursday, June 8th
2
3 x 150 Points (450 points Total)
• Assignment 1 GRADED!
• Assignment 2 GRADED!
• Assignment 3 (Stage 3): DUE LECTURE 20 Tuesday, June 6th
• Database Presentation: DUE LECTURE 20 Tuesday, June 6th
3
Tuesday/Thursday (LECTURE 16)
• Database Design for Mere Mortals: Chapters 12, 13
4
•
•
•
•
SQL Server Keys
SQL Server Database Files
Transactions and Locks (Separate PowerPoint)
Triggers (Separate PowerPoint)
FOR REFERENCE ONLY
• Functional Concepts and Architecture
• Database Files
5
SQL Server Primary and Foreign Keys
An Overview
Setting Up Your Primary Key (PK)
1.
2.
3.
4.
From the main window, select your database
Double-click on Tables to expand
Select table of your choice, right-click on it, and select Design
Find the Column Name that you want to make the Primary Key, right-click on it a select Set
Primary Key
5. This creates the Primary Key and places a yellow key icon beside it in the Design table
ERROR: Setting Up Your Primary Key (PK)
From Allowed NULL to NOT Allowed NULL
From the main window, select
Tools > Options > Designers >
Table and Database Designers
• Uncheck the option Prevent
saving changes that require
table re-creation
• OK
Add AUTO INCREMENT to the Primary Key
1. In the Designer section,
highlight the Primary
Key row (by clicking on
the yellow Primary Key
icon)
2. In the Column
Properties tab, scroll
down to Identity
Specification, and in
the dropdown menu
beside (Is Identity)
change this from No to
Yes
3. You can go with the
default values of '1' for
Increment and Seed or
you can change these
to your specifications
Inserting Data into a Table (See the AI at Work)
1. Right-click on table of your choice, and select Edit Top 200 Rows
Adding Foreign Keys (FK)
NOTE: Before adding FOREIGN KEYS you must already have all your PRIMARY KEYS
set and configured in your pertinent tables
1.
2.
3.
4.
5.
Open your database, and expand your Tables directory tree
Right-click on the table that is going to contain the Foreign Key(s), and select Design
Right-click on the blank white area and select Relationships
In the Foreign Key Relationships window, click Add
Click Tables And Columns Specification, then the 'three ellipsis' button […]
Adding Foreign Keys (FK)
6. Select the Primary Key Table and field that has been designated the primary Key, and then
select the Foreign Key Table and the key that will be the associated Foreign Key, then OK
7. Save, and click by the warning about saving
Adding Foreign Keys (FK)
8. I repeat the process for my second Foreign Key field, this time pointing to the pertinent
Primary Key tables, fields, Foreign Key tables, fields, etc, then saving
Adding Foreign Keys (FK)
9. Back in the main window, I select my database, then right-click on Database Diagrams > New
Database Diagram
10. I highlight the tables I want to add, then Add
Adding Foreign Keys (FK)
11. The Database Diagram comes up showing the selected tables with their associated
relationships using the Primary and Foreign Keys
12. At this point you have the option of saving the diagram (I usually save mine, because I can
always delete them later as I build complexity into the database).
Adding Foreign Keys (FK)
13. Back in the main window, you can expand
all your tables and Key folders to see the
Primary Keys. To see the Foreign Keys on
the linking table, you will have to expand
the Columns folder.
SQL Server Database Files
SQL Server Database Files
SQL Server System Databases
•
•
•
•
master
model
msdb
tempdb
SQL Server System Databases are very simple and pretty small databases (msdb might be huge in
some cases if we are not properly maintaining it) at a very high level, but they are also most
important databases within SQL Server. Why am I stressing 'the most important'? Because SQL
Server stores the majority of its configuration information in these databases, besides storing
some in the Registry and few in configuration(ini/xml) files, and SQL Server needs them to
coexist in a healthy and happy shape in order to remain in a running state. So what are the
System Databases in SQL Server?
http://www.youtube.com/watch?v=Wjv1c1lgitY
SQL Server Database Files
MASTER
MSDN Information about Master
The master database stores all the systemlevel information for SQL Server. The data
stored by the master database includes
information for: configuration settings,
logon accounts, linked servers and
endpoints, user database file locations, and
properties.
The master database is the most important
database on SQL Server, because due to the
nature of the data stored, SQL Server
cannot operate without the master
database. So it is a very good idea to
backup this database after changing the
SQL Server configuration, modifying,
adding, or removing any databases.
Examples:
Tables > System Tables
Views > select * from sys.databases
Programmability > sp_attach_db
SQL Server Database Files
MSDB
MSDN Information About MSDB
The msdb database is used by SQL Server to store
information on operations performed by SQL Server.
This includes information for: the SQL Server Agent,
Database Mail, the Service Broker, SSIS packages, log
shipping, backup, and maintenance plan job
parameters.
Examples:
Tables > System Tables > dbo.sysjobs
- SQL Server Agent
- SSIS Packages (SS Integration Services for crunching/manipulating data)
Views > dbo. sysjobs_view
Programmability > Stored Procedures
SQL Server Database Files
TEMPDB
MSDN Information About TEMPDB
As the name implies, SQL Server uses the tempdb database
for storing temporary data and data objects. The tempdb
database is used when an operation requires a temporary
table, stored procedure, or other database object to be
performed. Intermediary data for large sort operations is also
stored in the tempdb database as well as temporary data for
internal SQL Server operations. It is where instances of
database queries are cached.
Every time SQL Server is restarted, the tempdb system
database is recreated thus clearing any temporary data
stored during the last SQL Server session. In cases where a
high volume of users and operations are performed with SQL
Server the tempdb database can grow to use a significantly
large amount of disk space. It is important to plan accordingly
in these scenarios since running out of disk space where the
tempdb database is stored will have catastrophic effects on
the operation of SQL Server. DBAs will often store the
tempdb on a completely different drive
SQL Server Database Files
MODEL
MSDN Information About MODEL
SQL Server uses the model database for creating new
databases. Simply put, this is a template.
When the “create database” statement is used, SQL Server
copies the contents of the model database to the newly
created database. If there are any common database objects
that would prove useful in all databases created by SQL
Server, it is possible to add those objects to the model
database.
Then when a database is created by the SQL Server instance,
the user defined objects will be copied to it along with the
default objects. Since SQL Server recreates the tempdb
database every time it is started, the model database is
required in order for SQL Server to start.
Demo:
Create 'Awesome' Table in Model
Create instance of a new Database
SQL Server Database Files
RESOURCE
MSDN Information About RESOURCE
The resource db is a "read-only" database used for storing all the system views and stored
procedures. Logically, each SQL Server database will contain all these system objects, however, they
are physically stored within the resource database. The resource database is read-only and does not
include any user data.
In previous versions of SQL Server, the system objects were stored in the master database. The
motivation behind moving the objects to a separate database is to make updating the SQL Server
more efficient. Improvements and fix-ups to the SQL Server system generally manifest mostly on the
system objects. A separate database to store the system objects reduces the number of files that
need to be replaced with an update.
LOCATION:
C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn
FYI:
It is recommends that you don't move
the resource databases to a separate
directory or drive but to leave 'as is'
SQL Server Books Online
MSDN Information About SQL SERVER BOOKS ONLINE
Accessing SQL Server Books Online
You can access SQL Server Books Online in the following ways:
From the Start menu: Click Start > All Programs > Microsoft SQL Server 2008 R2 >
Documentation and Tutorials > SQL Server Books Online
From SQL Server Management Studio: On the Help menu, click How Do I, Search, Contents,
Index, or Help Favorites.
From F1 or Help buttons in the user interface - For context-sensitive information, press F1 on
your keyboard, or click Help in the user interface dialog box.
From the Dynamic Help window - The Dynamic Help window automatically displays links to Books
Online topics related to the task you are doing. To launch Dynamic Help, click Dynamic Help on
the Help menu in SQL Server Management Studio or Business Intelligence Development Studio.
SQL Server Tutorials
Accessing SQL Server Tutorials
MSDN Information About SQL Server Tutorials
From the Start menu: Click Start > All Programs > Microsoft SQL Server 2008 R2 >
Documentation and Tutorials > SQL Server Tutorials
Tutorial: SQL Server Management Studio
Tutorial: Writing Transact-SQL Statements
Computed | Calculated Columns
Computed (Calculated) Columns
A computed column is a column that contains values that are calculated, rather than inserted.
When the column is defined, a computed column specification is provided. This definition
includes the expression that is used to determine the value of the column in each row of the
table. It is not possible to specify the values for these columns in INSERT or SELECT statements.
Adding a Computed Column
1. Select a database table that might benefit from having a calculated column
http://www.blackwasp.co.uk/SQLComputedColumns.aspx
Computed (Calculated) Columns
2. Right-click on the table, select Design, and a create a column and data type that will be used
for the computed or calculated column
3. While having the focus, go down into Column Properties and select the Computed Column
Specification, click down in the (Formula) section, and type in your calculation
Computed (Calculated) Columns
4. Now when I examine the table after right-clicking and invoking Select Top 1000 Rows, I see
the calculated column Total
Computed (Calculated) Columns
MSDN Information: Specify Computed Columns in a Table
Functional Concepts and Architecture
The Following Slides Have Been
Included For Reference Only
Functional Concepts and Architecture
SQL – Structure Query Language
Most vendors, back in the day, began to adopt proprietary versions of that although they had a
lot of overlap of certain types of language elements, specifically the language elements intended
for
DML – Data Manipulation Language  ANSI SQL
DML tended to be pretty much the same across different vendors with just really minor
variations. To help that cleaner across the industry ANSI SQL was developed and there have been
2 or 3 revisions of this over the years.
Now, for the majority of its Data Manipulation Language, SQL SERVER 2008 fully complies with
the ANSI SQL standard.
Now Data Manipulation basically means:
• Selecting data to a database (SELECT)
• Adding data to database (INSERT)
• Removing data from database (DELETE)
• Changing data in database (UPDATE)
This doesn't give you any management capability, though. It doesn't let you create new
databases, change or check performance, optimize data or databases, etc.
ANSI = American National Standards Institute
ISO = International Standards Organization
Functional Concepts and Architecture
DDL – Data Definition Language
Because every DBMS works a little bit differently (MySQL, SQL Server, Oracle, DB2, etc) the DDL
really can't be standardized, so each vendor comes up with its own variant (or "dialect") of the
SQL language. CREATE | DROP | ALTER
Microsoft's superset of the SQL language, or variant, is called Transact-SQL (or T-SQL, for short),
and part of that does comply with the ANSI SQL standard for data manipulation language,
Example:
SELECT ID, Name, Address
FROM Customers
WHERE state = 'Nevada';
DELETE from Customers
WHERE ID = 5;
It's when we move over to the DDL side of things that the variants or "dialects" between the
different DMBS really come into play. The way you use CREATE TABLE in MS SQL Server 2008 is a
bit different than the way you'd CREATE TABLE in MySQL or PostgreSQL or DB2, etc.
Functional Concepts and Architecture
What is a Table?
Tables can be thought of in either logical or physical terms. For example, when most DBAs think of a table, they
see a two-dimensional set of values arranged into rows and columns, similar in appearance to a
spreadsheet. Each column represents a different data element, and each row represents a collection of related
data elements.
For example, here is a representation of a logical table that has three columns and four rows. This should look
very familiar to DBAs as this is how most of us visualize what a table in a database looks like. It also looks
similar to the results we see when we perform a SELECT statement on a table within Management Studio.
Functional Concepts and Architecture
What is a Table?
While a logical visualization of table is very handy for DBAs to use in their work, this is not how SQL Server sees a table.
When a table is created, it has to be implemented physically inside a database.
For example, the most basic structure is the database file (.mdf and .ndf files). The database file can include many objects,
but for now, let's focus on tables. I'll talk about a third type of file, the transaction log file (.ldf) later on.
Every database has one primary data file. Also, all the data in the database objects
(tables, stored procedures, views, triggers.. etc.) are stored in the primary data files.
The recommended and the most common file name extension for primary data files is
.mdf
You can only have one primary data file for a database, as well as secondary data files.
Some databases may not have any secondary data file, but its also possible to have
multiple secondary data files for a single database. Secondary data files are usually
denoted with the .ndf extension.
Log files in SQL Server databases hold all the log information. This information can be
later used to recover the database. The size of the log file is determined by the logging
level you have set up on the database. There must be at least one log file for each
database. But it is also possible to have more than one log file for a single database.
The recommended file name extension for log files is .ldf.
Locations of all the files in a database (primary, secondary and log files) are stored in
the primary file of the database and in the master database. When the SQL Server
Database Engine want to use the file location information of those files, it retrieves
the data from the master database.
Functional Concepts and Architecture
Each table in a database can be broken into multiple components. Let’s take a brief look at each of them so we
better understand what makes up a physical table within SQL Server.
Functional Concepts and Architecture
The first component is a partition. A partition is simply a way DBAs can sub-divide rows
in a table (and the table’s indexes) into separate sections, generally for the purpose of
horizontally spreading rows across more than one filegroup. This often makes it
easier for DBAs to manage and scale very large tables.
The topic of partitioning is outside the scope of this class, so we'll assume there is a
single partition (which is the default option when creating a table). Only the Enterprise
and Developer editions of SQL Server support partitioning.
Partitions contain data rows stored in either the physical form of a heap (a table without a clustered index) or a B-Tree structure (a
table with a clustered index). We'll talk about the heap, b-tree structure, clustered and non-clustered indexes in a moment, but first we
need to discuss how the data types are sub-divided into three different categories before they are stored in allocation units, comprised
of pages.
•
In_Row_Data: This is the most common type of allocation unit, and is used to store data and index pages, except for Large Object
(LOB) data. In other words, most of your row data (and related indexes) are stored in this type of allocation unit.
•
Lob_Data: Used to store Large Object (LOB) data, such as text, ntext, xml, image, varchar(max), nvarchar(max), varbinary(max),
and Common Language Runtime (CLR) user-defined data types.
•
Row_Overflow_Data: Used to store data pages when the variable data types—varchar, nvarchar, varbinary, and sql_variant data
columns—that exceed the 8,060 bytes that can fit onto a single data page.
Functional Concepts and Architecture
Allocation units themselves are made up of extents. In our example, we see that each allocation unit has one
extent. In the real world, each allocation unit would include many, many extents.
An extent is a collection of eight contiguous 8K pages (for a total of 64K).
A page is the most elemental unit of data storage used by SQL Server, and will be the focus of our next discussion.
Functional Concepts and Architecture
Primary Storage Structure in MS SQL Server
How does SQL Server store data? Logically, Data is stored in records, and records are stored in pages.
A database page is exactly 8KB (8192-byte) in size, and acts as an individual piece of a database data file. If you
have 1 MB worth of data, that data will be stored on 128 separate pages.
These 8K pages aren't dedicated just for the data only; the first 96 bytes are used to store information about
the page type, free space, object id, and location of the table that this page belongs to. This is the header. After
this header information, the data itself comes in (in the form of data rows which are inserted one after
another in a serial manner). This is the body.
Following the page header, starting at byte 96 on the
page, are the actual data rows. Each data row has a
unique row number written within the page. Data rows
in SQL Server cannot cross page boundaries. The
maximum available space in a SQL Server page is 8096
bytes (8192 bytes minus the 96 byte header), but
including 36 bytes for holding log information actually
lowers the amount of data that can be held on the page
down to 8060 bytes.
http://sqlblog.com/blogs/linchi_shea/archive/2008/03/03/is-the-8kb-page-obsolete-or-aging.aspx
Functional Concepts and Architecture
As stated earlier, the limit on data row size does not take into account columns of text, ntext, or image data types
because these data values are stored separately from the data row, as we'll look at a bit later on.
Data rows are put on the page serially, starting immediately after the
header.
A row offset table starts at the end of the page, and each row offset
table contains one entry for each row on the page. Each entry records
how far the first byte of the row is from the start of the page.
The entries in the row offset table are in reverse sequence from the
sequence of the rows on the page.
Every SQL Server page (8,192 bytes) begins with a 96-byte header used to store metadata about the page. This includes the page’s number,
page type, amount of free space on the page, the allocation unit ID of the object that owns the page.
Immediately following the page header is the data to be stored, one row after another. Each page can store a maximum of 8,060 bytes. The
number of rows that can be stored on a page depends on the size of the rows. For example, the smaller the rows, the more rows that can fit on
a single data page. When a page becomes full, any new data will be inserted in another data page.
In the real world, each row may be a different length, which means that the beginning and ending of each row can be anywhere within the
page.
If rows are packed in, one after another, and if each row can vary in size, how does SQL Server know when a row starts and when it ends? That
is what row offsets are used for. Every row in a page has a row offset, which indicates how far the first byte of the row is from the start of the
page. Row offset data is stored at the bottom of a page in an area reserved for this information. For example, when the first row is added to a
page, the location of the first byte of that row is stored in offset row one. When row two is added, the location of the first byte of the second
row is stored in offset row two, and so on. This way, SQL Server knows where each row begins and ends.
Functional Concepts and Architecture
Large Row Support
Rows cannot span pages, however portions of the row may be moved off the row's page so that the row can
actually be very large.
The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB).
However, this does not include the data stored in the Text/Image page type. This restriction is relaxed for tables
that contain varchar, nvarchar, varbinary, or sql_variant columns.
When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL
Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA
allocation unit, starting with the column with the largest width. This is done whenever an insert or update
operation increases the total size of the row beyond the 8060 byte limit.
When a column is moved to a page in the
ROW_OVERFLOW_DATA allocation unit, a 24-byte
pointer on the original page in the IN_ROW_DATA
allocation unit is maintained. If a subsequent operation
reduces the row size, SQL Server dynamically moves the
columns back to the original data page.
ROW_OVERFLOW_DATA
We'll look how this relates to BLOBS
and FILESTREAMS in a minute
Functional Concepts and Architecture
Extents
Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB.
This means SQL Server databases have 16 extents per megabyte.
To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts
of data. SQL Server has two types of extents:
•
Uniform extents are owned by a single object; all eight pages in the extent can only be used by
•
the owning object.
Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be
owned by a different object.
A new table or index is generally allocated pages from mixed extents. When the table or index grows to the
point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create
an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the
index are in uniform extents.
Functional Concepts and Architecture
Databases are optimized for structured data that is relatively small – about 8KB per row. At that rate, you can
put 130,000 packed-out rows into a database and still have a database under 1 gigabyte.
Generally speaking, smaller databases are easier to backup, restore, and maintain, so keeping them small is
good.
Documents, however, tend to be orders of magnitude larger than the average database row size and can
quickly bloat the database; a content database upwards of 100 gigabytes is by no means uncommon and that
size is comprised mostly by the physical files that reside inside the database.
When your content database size starts to expand to the point of unmanageability, one of the first questions
that may come to mind is why the heck are all those files in there in the first place?
It makes perfect sense to store the structured document metadata like the author, location, keywords,
document type, and file size directly in the database, for querying purposes, but the physical file itself is just
taking up room until someone goes to download it.
So the option to move files outside of the database and back into the file system, where they have traditionally
resided, seems very sensible.
Functional Concepts and Architecture
What is a Blob?
As much as you want your documents outside of the database, the folks who make databases really don't like
them being in there either. Documents tend to represent unstructured data that makes it difficult to use them
in querying, defining relationships, indexing, or any other database operation that normally makes putting data
into a database useful. As far as the database is concerned, a document is just a bunch of bits that it has to
hold onto until someone asks for them back.
Logically, storing a document in a database row looks something like this
Databases performance is optimized for rows that are under 8KB in size. If the actual implementation looked
like this, that size would easily be breached and performance would lag.
Functional Concepts and Architecture
Fortunately, database developers came up with the concept of a Binary Large Object (BLOB). In essence,
the BLOB is a pointer that stores the location of binary data. The pointer, which is relatively small, resides with
the normal database record. The binary data, which is comparatively huge, resides in some dark, magical
location in the database called BLOB storage.
With the binary data offloaded to another location in the database, the database record becomes much
smaller and more optimal for index and query performance. If we actually need the binary data associated
with the row, the database uses the pointer in the row to acquire the binary data from BLOB storage.
The process of going to BLOB storage is completely transparent, so it appears as though the data is coming
directly from the row itself. There is a slight performance cost for going to BLOB storage for raw data retrieval,
but the query performance benefits tend to outweigh the retrieval cost.
Functional Concepts and Architecture
What is Remote Blob Storage?
BLOB storage is effectively abstracted from database users. You ask for it, the database goes out and gets it.
This brings up a new question: why does the BLOB storage have to be in the database? In SQL Server 2005 it
had to be in the database simply because that's how they built it. But in SQL 2008 and new newer versions,
you've got a few more options because Microsoft implemented a FileStream provider model for the BLOB
storage mechanism that opens up the possibility of storing it elsewhere.
Remote BLOB Storage is the term used to denote that a BLOB provider stores BLOB data outside of the
database (i.e. remotely).
Functional Concepts and Architecture
Where Can I Store Remote Blob Data?
A marketing guy would eagerly tell you that you can store BLOBs anywhere you want, but the
reality is that you can store it anywhere you want as long there is a provider available capable of
storing it there. If you're really ambitious you can build your own (not recommended), but the
majority of us are just going to use what comes out of the box, or whatever is available from third
party providers.
Out of the box you get two providers:
• Native SQL BLOB Provider (Default) Stores BLOB data directly in the database, just like
•
SQL Server 2005. This is the default BLOB provider whenever you setup a new instance of SQL
server.
FileStream BLOB Provider This is the only Remote BLOB provider that ships with SQL
Server and it allows you to store BLOB data on any local hard drive on the SQL server.
http://www.ceservices.com/adding-filestream-existing-table-database-sql-2008-r2
Free PDF Book: http://download.red-gate.com/ebooks/SQL/Art_of_SS_Filestream_Sebastian_and_Aelterman.pdf
Functional Concepts and Architecture
Now that you have a basic understanding of how SQL Server stored data, it’s time to talk about
the heaps and B-Trees structures that we talked about briefly a little earlier.
What is a Heap?
A heap is simply a table without a clustered index. (We will talk about clustered indexes a bit later
too).
When rows are added to a heap, they are not stored in any particular order on a data page, and
data pages are not stored in any particular sequence within a database. In other words, rows are
stored wherever there is room available. This means that the data pages that contain the rows of
the heap may be scattered throughout a database, in no particular order.
Since a table can’t exist as a bunch of scattered pages, SQL Server provides a way to link them all
together so that they act as a single table. This is done using what are called Index Allocation
Map (IAM) pages.
IAM pages manage the space allocated to heaps (among other tasks), and is what is used to
connect the scattered pages (and their rows) into a table.
Functional Concepts and Architecture
Functional Concepts and Architecture
Let’s say that a query is executed against a heap, and a table scan has to be performed to find the
data to be returned.
In order to find the rows in the table that need to be scanned, the SQL Server Database Engine
first goes to an IAM page, which includes pointers to the various extents and pages that contain
the rows of the table that belong to the heap.
If a heap is large, numerous IAM pages and data pages have to be examined before all the rows
are located and scanned. If the data pages happen (by coincidence) to be physically contiguous,
then such scans can be I/O efficient because sequential reads can be used to read the data.
But in many cases, the various heap pages are scattered about, which requires less efficient
random reads to scan through all the rows of the table, which can hurt performance.
Functional Concepts and Architecture
What is an Index?
In the previous section we talked about heaps, which are tables without a clustered index. In this
section, we will talk about what an index is. Later, we will talk about specific kinds of indexes
that can be added to SQL Server tables.
As I have already mentioned, an index is simply a way to help queries return data faster from
tables. In SQL Server, all indexes physically take the form of what is called a B-Tree.
The “B” in B-Tree refers to “balanced,” so B-Tree is short for Balanced Tree. This is because
a B-Tree index self balances, which means that every time a tree branches (splits into two
pages because the original page is full), about half of the data is left on the old page, and
the other half is put on the new page. One of the key benefits of having a Balanced Tree is
that finding any particular row in a table requires about the same amount of SQL Server
resources because the index has the same depth (number of levels) throughout its
structure.
Functional Concepts and Architecture
Notice that B-Trees have several levels. They include:
Root Level: A B-Tree starts with a single index page called the root level. This is where a query begins to look for
specific data within the B-Tree. In our example, our root level page only contains two values. Most root level pages
have many values, each referring to a page in the intermediate level.
Intermediate Level: Most B-Trees have one or more intermediate levels. The number of intermediate levels
depends on the amount of data stored in the table being indexed. Our example has only one intermediate level,
which includes two index pages.
Leaf Level: A B-Tree has a single leaf level which may include many, many data pages, depending on the amount of
the data stored as part of the leaf level. The leaf level is where the data you are looking for is stored. For example, in
the leaf level page that starts with “1”, all the rows with a customer number that ranges from “1” through “2499”
are located. On the leaf level page that begins with “2500”, all the rows that range from “2500” to “4999” are
located, and so on.
Functional Concepts and Architecture
Here’s how a B-Tree index works.
In a B-Tree index search, we always begin at the root level. For example, to find the rows that have a customer
number of “4”, the first step is to scan the rows at the root level. In our example, there are two rows than can
be scanned: a “1” and a “5000”. Each of these rows points to different pages in the intermediate level. The rows
we are seeking have a customer number of “4”, but as you can see, there is no “4” at the root level. But since
this index is in numerical order, we know that “4” follows “1”, but is less than “5000”. So, to find the rows that
have a value of “4”, we must hop down (transverse) to the intermediate level to further narrow our search. As
you can see in the example, the number “1” in the root level points to an index page in the intermediate level
that also starts with the number “1”.
Functional Concepts and Architecture
When we go to the intermediate level and look at the index page that begins with the number “1”, notice that it
only has two records: a “1” and a “2500”. Notice that the number “4” is still nowhere to be found. Again, we
know that the number “4” is greater than 1, but less than “2500”, so we need to transverse to the leaf level and
see if we can find it there.
Functional Concepts and Architecture
So the next step is to transverse the index to the leaf level page that begins with “1” and ends with “2499”, as
we know that number “4” falls between these numbers. Once we are at the correct leaf level page, the rows in
this page are examined until we find one or more records that have a customer number of “4”.
While this seems like a long way around to find the rows that have a customer number of “4”, it is actually very
short. Instead of having to potentially scanning through thousands and thousands of data pages looking for the
rows to be returned, the query only had to scan through several different index pages until the rows that
matched the WHERE clause of the query are found and returned. This is usually a much faster way to find the
rows you want to return. Of course, this example was kept simple. In the real world, there may be more
intermediate levels that need to be transversed, but using a B-Tree index to retrieve data is usually much faster
than a scan of every row in a table.
Now that you have a basic understanding of how B-Tree indexes work, we are ready to talk about the specific
kinds of indexes available to SQL Server. Keep in mind that although each of the following indexes work slightly
differently, that are all still B-Tree indexes, and work on the principles described above.
Functional Concepts and Architecture
Types of Indexes
SQL Server includes many different types of B-Tree indexes, including:
•
•
Clustered
Non-clustered
•
•
•
•
•
Indexed view
XML
Filtered (New with SQL Server 2008)
Spatial (New with SQL Server 2008)
Compressed Indexes (New to SQL Server 2008)
Today, I'll only be looking at Clustered and Non-Clustered B-Tree Indexes.
Functional Concepts and Architecture
Clustered
Technically speaking, a clustered index is a B-Tree data structure where all the rows in a table are stored at the leaf
level of the index. In other words, a clustered index not only includes the root level and intermediate level index
pages found in B-Tree indexes, it also includes all the data pages that contain every row in the entire table. In
addition, the rows in a clustered table are logically ordered by the key selected for the clustered index (unlike a heap
whose rows and pages are unordered). Because of this, a table can only have one clustered index.
Functional Concepts and Architecture
A clustered index orders the data stored in it logically. Often, DBAs confuse logical ordering with physical ordering. For
example, what if a new row is added between two other rows, and there is no room on the existing page for the new row?
When SQL Server faces this problem, here’s what it does. Once SQL Server has determined more space is needed on a page,
the page will be split. This means that about half of the rows stay on the existing page and about half of the rows are added
to a new page. The physical order of the rows stays intact on the two pages, with the new row being place in the proper
order on either the original or the new page.
Think for a moment, if SQL Server required that all rows in a clustered index were to be physically contiguous, then as part of
the page split, the new page would have to be physically inserted directly after the original page, and all of the following data
pages would have to be shifted down one page. If there were 100,000 pages below the page that was split, that would mean
that 100,000 pages would have to be physically moved, which would be hugely resource intensive and inefficient. Instead,
what SQL Server does when a page split occurs, is to place the new page where there is room for it, which in many cases
won’t be very close to the original page. In other words, the physical order of the pages aren’t maintained. But what is
maintained is the logical ordering of the pages. For example, when a page is split between the original and the new page,
they are logically connected using the page chain, which is used to enforce the logical ordering of the data.
Functional Concepts and Architecture
Non-Clustered Index
Like a clustered index, a non-clustered index is a B-Tree data structure (although it is a separate data structure from
the clustered index). The main difference between the two is that while a clustered index includes all of the data of a
table at the leaf level, a non-clustered index does not. Instead, at the leaf level, a non-clustered index includes the
key value and a bookmark (pointer) that tells SQL Server where to find the actual row in the clustered index.
Because a non-clustered index doesn’t normally store the actual data at its leaf level, it is possible to have many
non-clustered indexes on a table. For example, a table can have one clustered index and up to 249 non-clustered
index.
Let’s take a brief look at how a non-clustered index works. In this example, we are going expand upon our previous
example of a Customer table that has three columns: Customer Number, First Name, and Last Name. As in our
previous example, we are going to have a clustered index that uses the Customer Number as its key. In addition, we
are going to add a non-clustered index on the Last Name column. Given this, let’s say we want to return all the rows
(and all the row’s columns) in the Customer Table that have a last name of “Victoria”, and that we want to use the
non-clustered index on the Last Name column to quickly find and return the data.
Functional Concepts and Architecture
xxx
Functional Concepts and Architecture
The first step in retrieving all of the rows with a Last Name of “Victoria” is to use the
non-clustered index on the Last Name column to identify any rows that match. To find
the rows, SQL Server begins at the root level of the non-clustered index, looking for any
matching rows. The root page is scanned and two rows are found: an “A” and an “M”.
Each of these rows points to different pages in the intermediate level. The first letter of
“Victoria” is “V”, but there is no row in the root level page that begins with the letter
“V”. But since this index is in alphabetical order, we know that “V” follows “M”. So, to
find the letter “V”, we must transverse to the intermediate level to further narrow our
search. As you can see, the letter “M” points to an index page in the intermediate level
that starts with the letter “M”.
When we go to the intermediate level and look at the index page that begins with the letter “M”, notice that it only has
two records: an “M” and an “S”. Notice that the letter “V” is still nowhere to be found. Since we can’t find the letter “V”
on this index page, we need to transverse to the leaf level and see if we can find it there.
Once we get to the leaf level page that begins with “S”, we now identify all the rows on that page that begin with “V”.
Each row that begins with “V” is then compared to the last name “Victoria”. In our case, there is a single match. The last
name “Victoria” is found in the Last Name non-clustered index key. Now we know that a row does exist that meets our
selection criteria, the problem is that all of the data for that row (Customer Number, First Name, and Last Name) aren’t
stored at the leaf level. Instead, the only data stored at the leaf level is the key “Victoria” and a bookmark that points to
the corresponding record in the table—which as we know—is a clustered index. In this example, the bookmark
correspondents with the Customer Number “4”.
Now that SQL Server knows that a row does exist, and that it exists in the clustered index as Customer Number 4, the
actual contents of the row (the three columns) must be retrieved. So next, SQL Server must take the bookmark value,
the number “4”, and then start at the root level of the clustered index, and then transverse the intermediate and leaf
level until it locates the matching row. At this point, the data is returned back the original query.
SQL Server Database Files
SQL Server Database Files
.MDF File
•
"Main Data File"
• Database Tables
• Indexes
• Triggers
• Store Procedures
• Etc.
.LDF File
•
•
"Log Data File"
Transaction Log file
.NDF File(s)
•
•
•
•
"aNother Data File"
Secondary Database file
Can make Primary
Can change path, size
By default, a database will only have an
.MDF and .LDF file. It's up to the database
designer / developer / administrator
whether or not it also has an .NDF file (or
multiple .NDF files).
File placement strategies are all about
performance and reliability.
Typically, you are going to want to place
your .LDF files on a separate disk/drive
from everything else. Ideally you don't
want them on the same disk as the
operating system or the SQL Server
program files, and you definitely don't
want them on the same disks as your
databases.
SQL Server Database Files
By default, a database will only have an .MDF and .LDF file. It's up to the database designer / developer /
administrator whether or not it also has an .NDF file (or multiple .NDF files)
SQL Server Database Files
DATABASE 1
DATABASE 2
FILESTREAM
SQL Server Database Files
DATABASE 1
DATABASE 2
FILEGROUPS
FILESTREAM
SQL Server Database Files
Where are the SQL Server database files actually located?
C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA
SQL Server Database Files
How do you create or alter the .MDF, .NDF, or . LDF files?
1. Open SQL Server Management Studio (SSMS), and login
2. In the Object Explorer column, expand the Databases directory
3. Right-click on your database (e.g., Books) and select Properties from the popup
SQL Server Database Files
4. In the Database Properties window, select Files (from Select a page column)
SQL Server Database Files
5. In the Database Files window, under the Logical Name column, can be found the
file names for your .mdf and .ldf files
6. Here you can change the Initial Size (MB) of the files as well as set up the
Autogrowth properties of both files
- Click in the Initial Size (MB) window to change the size
- Click on the "three ellipsis" button under Autogrowth to change properties
7. Here too, you can scroll  to the right to see the Path and File Names of your
database files
SQL Server Database Files
8. To create a .NDF file for your database, click the Add button in the Database
Properties windows; this will create a new row in the Database files window
SQL Server Database Files
9. Give the new Database file a name (since I’m going to use it as a “secondary”
database file I appended a ‘_secondary’ after my name), and then I selected
<new filegroup > from the dropdown menu. Give your new Filegroup a name (e.g.,
zTest_Secondary).
SQL Server Database Files
8. To create a .NDF file for your database, click the Add button in the Database
Properties windows; this will create a new row in the Database files window
SQL Server Database Files
9. I create a new Database file called zTest_secondary and a new Firegroup called
SECONDARY. If I go look in the MSSQL\DATA directory on the server, I can see my files
C:\Program Files\Microsoft SQL Server\MSSQL11_50.MSSQLSERVER\MSSQL\DATA
SQL Server Database Files
How to Move Database Files to a Different Physical Location
1. Open SQL Server Management Studio (SSMS), and login
2. In the Object Explorer column, expand the Databases directory
3. Right-click on your database (e.g., zTest2013) and select Properties from the
popup
4. From the left side Select a page column menu, click on Files to see the Path of
your current Data files. Write it down or use Control + C to copy the path and
save it for later use (in a text or Word file, whatever you prefer).
5. Back in the main window, detach the database by right-clicking on the
database, then selecting Tasks, then Detach…
SQL Server Database Files
How to Move Database Files to a Different Physical Location
7. A new window will appear. Click on Drop Connections, then the OK button.
You'll notice back in the main window that your database is no longer listed.
SQL Server Database Files
How to Move Database Files to a Different Physical Location
8. Navigate to the location of your MSSQL\DATA files that you wrote or copied
down earlier.
C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA
9. Copy/Move the datafiles to your new physical location (I created a directory on
my E: drive called zDatafiles). Note: I always recommend copying instead of moving in
case the file(s) get corrupted during the move, you will still have the original file(s) to go
back to. Once the copy is successful, you can now safely delete the original files.
SQL Server Database Files
How to Move Database Files to a Different Physical Location
10. Once the files are copied, return to the main window, click on the databases
folder, and choose Attach.
SQL Server Database Files
How to Move Database Files to a Different Physical Location
11. From the new window, click Add.
SQL Server Database Files
How to Move Database Files to a Different Physical Location
12. I kept my original files in the
MSSQL\DATA directory on the C:
drive AND copied them to a new
zDatafiles directory on the E: drive
(just to show how this is done).
Now, I could point to ALL the files
on the E: drive and attach them
there, OR I could reattach my
original MDF file on my C: drive,
then attach my LDF file from my E:
drive, thus separating my MDF/NDF
files from my LDF transaction log
file on two physically different
drives.
Navigate to the location of your
original MDF file and select it.
SQL Server Database Files
How to Move Database Files to a Different Physical Location
13. Return to the main window, and
your database should now be listed
(if not, right-click on the Databases
folder and select Refresh)
Adding a Table to a New Database
1. Right-click on your database, and select New Table…
Adding a Table to a New Database
2. In the Properties table, give your table a pertinent name, then start adding your column
names, data types, etc.
SQL Server FILESTREAM
SQL Server FILESTREAM
How do you create or alter the FILESTREAM files?
To enable and change FILESTREAM settings
1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2008
R2, point to Configuration Tools, and then click SQL Server Configuration
Manager.
2. In the list of services, right-click SQL Server Services, and then click Open.
3. In the SQL Server Configuration Manager snap-in, locate the instance of SQL
Server on which you want to enable FILESTREAM.
4. Right-click the instance, and then click Properties.
5. In the SQL Server Properties dialog box, click the FILESTREAM tab.
6. Select the Enable FILESTREAM for Transact-SQL access check box.
SQL Server FILESTREAM
7. If you want to read and write FILESTREAM data from Windows, click Enable
FILESTREAM for file I/O streaming access. Enter the name of the Windows
share in the Windows Share Name box.
8. If remote clients must access the FILESTREAM data that is stored on this share,
select Allow remote clients to have streaming access to FILESTREAM data.
9. Click Apply.
10. In SQL Server Management Studio, click New Query to display the Query
Editor.
11. In Query Editor, enter the following Transact-SQL code:
EXEC sp_configure filestream_access_level, 2
RECONFIGURE
12. Click Execute.
13. Restart the SQL Server service.
An Introduction to SQL Server FILESTREAM (Simple Talk)
SQL Server FILESTREAM
1. At the top of the Object Explorer, right-click on your SQL Server, select
Properties
SQL BOOKS ONLINE http://technet.microsoft.com/en-us/library/bb933993%28SQL.100%29.aspx
SQL Server FILESTREAM
2. On the left panel click on the Advanced tab, then click on the drop down list
next to Filestream Access Level and select Full access enabled option, then OK
SQL Server FILESTREAM
1.
2.
3.
4.
5.
Create a new database, type a database name
In the left-hand column, click Filegroups
Under the Filestream grid (the lower window), click the lowermost Add button
Add a name, check Default checkbox
Select General from left-hand column, add new database file, select Filestream Data from
File Type, the Filegroup name with auto-fill using filegroup you just created
DO NOT PRESS OK YET, but see instructions on next slide
SQL Server FILESTREAM
6.
Scroll over and set the Path to the drive/directory where the filestream files will be saved,
then OK
Scroll over and set the Path to the drive/directory where the filestream files will be saved, then
OK
http://sqlbeyond.blogspot.com/2011/08/how-to-use-filestream.html
SQL Server FILESTREAM
7.
In SQL Server 2008 and 2008 R2, SSMS does not support FILESTREAM attribute in CREATE
TABLE designer. That means you cannot create a table that has FILESTREAM column by
using SSMS. So TSQL is used to create FILESTREAM enabled table.
CREATE TABLE FSTable
(
-- ROWGUID column is required
[RowId] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE,
[Id] int NOT NULL,
-- This is FILESTREAM column
[Data] VARBINARY(MAX) FILESTREAM NULL
)
SQL Server FILESTREAM
8. FileStream table can be used or handled just like other table. This means we
can treat FileStream column as normal varbinary column. One can select,
insert, update, delete the same way as other table.
For Information on how to save and retrieve files to and from a filestream
directory, see:
http://weblogs.asp.net/aghausman/archive/2009/03/16/saving-and-retrieving-file-usingfilestream-sql-server-2008.aspx
http://www.mssqltips.com/sqlservertip/1489/using-filestream-to-store-blobs-in-the-ntfsfile-system-in-sql-server-2008/