Juan Soto - Adolfo J. Socorro | Databases, Data Analysis

Download Report

Transcript Juan Soto - Adolfo J. Socorro | Databases, Data Analysis

Introduction to Full-Text
Searching in SQL Server 2012
Adolfo J. Socorro, Ph.D.
IT Impact, Inc.
[email protected]
Outline
 What
can we do with FTS?
 How to install FTS
 FTS components
 Creating FTS indexes
 How to query with FTS
 FILESTREAM and FileTable
FTS Basics
FTS allows searching against character-based data
 char
 ntext
 varchar
 image
 nchar
 xml
 nvarchar
 varbinary
 text
 varbinary(max)
Search Functionality
Specific words or phrases
• “hotel” => “hotel”
Prefixes
• “fan”
=> “fantastic”, “fantasy”
• “local store” => “locally stored”
Inflectional forms
• “minimized” => “minimizing”, “minimise”
Search Functionality
Proximity
• “search,query” => “query to perform search”
Synonyms
• “folder” => “directory”
Weighted Values
A First Look
 Let’s
run some simple examples to get a feel
for FTS!
LIKE vs FTS
 LIKE
works on character patterns only
 Cannot
use the LIKE predicate to query
formatted binary data
 FTS
is much faster against large amounts of
unstructured text data
Supported SQL Server Editions
 Enterprise
 Business
Intelligence
 Standard
 Web
 Express
with Advanced Services
Available since at least SQL Server 2000
FTS Components
Word
Breaker
Thesaurus
Stemmer
Stoplists
Filters
Property
Lists
Language Support
 50+
languages
 Language-specific
components
 Word breakers and stemmers
 Stoplists
 Thesaurus files
How to Install
Default FTS Language
FTS Indexes
 One
index per table or indexed view
 Must
have a unique, single-column, nonnullable index on the table
 Grouped
within the same database into one or
more full-text catalogs (“containers”)
Full-Text Catalogs
A
logical construct
A
way to manage FT indexes together
Index Population
 Population:
indexes
the addition of data to full-text
Automatic
Manual
• On Request
• Scheduled
Steps to Setup an Index on a Table
Create Full-Text Catalog
For Each Column to Index
• Indicate language
• Indicate document type *
Choose Change-Tracking
Mechanism
Full-Text Index Wizard
Example: Create Catalog and Index
CONTAINS
 Precise
or prefix matches to single words and
phrases
 Proximity matches
 Logical operations between conditions: AND,
OR, AND NOT
 Optional use of inflectional forms and
thesaurus
FREETEXT
 Matching
the meaning, but not the exact
wording, of specified words or phrases
 Always
uses inflectional forms and thesaurus
CONTAINSTABLE AND FREETEXTTABLE
 Return
a relevance ranking value (RANK) and
full-text key (KEY) for each row
 The
actual RANK values are unimportant and
typically differ each time the query is run
 ISABOUT/WEIGHT
influence the ranking
in CONTAINSTABLE
Example: Queries
Stoplists
A
mechanism to discard commonly occurring
strings that do not help the search
a
is
the
by
and
…
Thesaurus
 Nicknames:
Robert/Bob
 Common misspellings: calendar/calender
 Homophones: Geoff/Jeff
 Technical terms: proc/procedure
Very powerful if you log searches and learn
what users are commonly searching for
Thesaurus
 One
file per language
Expansions
Replacements
“bike”
in addition to
“bicycle”
“calendar”
instead of
“calender”
Filters
 Extract
textual information from the
document (removing the formatting)
 Send
the text to the word-breaker
component for the language
associated with the column
 Need
to manually install Office 2010
and PDF filters
Example: FTS Components
Where to Store Large Objects?
Database
File System
Why Store in the Database?
 Integrating
unstructured data into the
relational database provides significant
benefits:
 Integrated storage and data management
capabilities (e.g., backup)
 Ease of administration and policy
management
 Full-text search
FILESTREAM
A
database/file system hybrid
 FILESTREAM
is an attribute that can be
assigned to a varbinary(max) column
 Allows
 Not
storing BLOB data in the file system
restricted to the 2 GB limit
SQL Server imposes on BLOBs
FILESTREAM
 SQL
Server buffer pool is not used
 Isolation
semantics are governed by
Database Engine transaction isolation
levels
Steps to FILESTREAM
Enable at OS level
Configure at instance level
Create a filegroup
Add a file to the filegroup
• Indicate root folder
OS-level Configuration of
FILESTREAM
Instance-level Configuration of
FILESTREAM
Example: FILESTREAM
FILESTREAM
 All
data access must be transactional
 Must
 Do
use specific APIs for file I/O
not edit the files directly!
When to Use FILESTREAM
 Objects
that are being stored are, on
average, larger than 1 MB
 Store smaller objects in the database
 Fast
 You
read access is important
are using a middle tier for
application logic
FileTables
A
special, fixed-schema kind of table
 Builds
on top of existing FILESTREAM
capabilities
 Store
files and documents in in the
database, but access them from
Windows applications as if they were
stored in the file system (WIN32 API)
FileTables
 Hierarchical
 Includes
namespace
file system properties as columns
 Preserves
full file names
 Non-transactional
access through the FS
FileTables
 Calls
to create or change a file or directory
through the Windows share are intercepted by
a SQL Server component and reflected in the
corresponding relational data in the FileTable
Example: FTS over FileTables
FileTables vs FILESTREAM
 File
and directory hierarchy maintained in the
database
 Windows
application compatibility
 Relational
 Both
access to file attributes
are available in all editions
Wrap Up
 Advanced
searching on character-based data,
including documents
 FTS
setup, components, and queries
 FILESTREAM
 FileTables
Other Topics
 Document-property
 Semantic
search
search
 Optimizations
 Query plans and execution traces
References
 Posts

http://www.sqlskills.com/blogs/bobb/
 Blog:

SQL Server FTS Team Blog
http://blogs.msdn.com/b/sqlfts
 SQL

and presentations by Bob Beauchemin
Server 2012 Books Online
http://msdn.microsoft.com/enus/library/cc645577(SQL.110).aspx
Filter Packs
 Adobe

http://www.adobe.com/support/downloads/thankyo
u.jsp?ftpID=4025&fileID=3941
 Office

PDF Filter
2010 Filters
http://www.microsoft.com/enus/download/details.aspx?id=17062