Juan Soto - Adolfo J. Socorro | Databases, Data Analysis
Download
Report
Transcript Juan Soto - Adolfo J. Socorro | Databases, Data Analysis
Introduction to Full-Text
Searching in SQL Server 2012
Adolfo J. Socorro, Ph.D.
IT Impact, Inc.
[email protected]
Outline
What
can we do with FTS?
How to install FTS
FTS components
Creating FTS indexes
How to query with FTS
FILESTREAM and FileTable
FTS Basics
FTS allows searching against character-based data
char
ntext
varchar
image
nchar
xml
nvarchar
varbinary
text
varbinary(max)
Search Functionality
Specific words or phrases
• “hotel” => “hotel”
Prefixes
• “fan”
=> “fantastic”, “fantasy”
• “local store” => “locally stored”
Inflectional forms
• “minimized” => “minimizing”, “minimise”
Search Functionality
Proximity
• “search,query” => “query to perform search”
Synonyms
• “folder” => “directory”
Weighted Values
A First Look
Let’s
run some simple examples to get a feel
for FTS!
LIKE vs FTS
LIKE
works on character patterns only
Cannot
use the LIKE predicate to query
formatted binary data
FTS
is much faster against large amounts of
unstructured text data
Supported SQL Server Editions
Enterprise
Business
Intelligence
Standard
Web
Express
with Advanced Services
Available since at least SQL Server 2000
FTS Components
Word
Breaker
Thesaurus
Stemmer
Stoplists
Filters
Property
Lists
Language Support
50+
languages
Language-specific
components
Word breakers and stemmers
Stoplists
Thesaurus files
How to Install
Default FTS Language
FTS Indexes
One
index per table or indexed view
Must
have a unique, single-column, nonnullable index on the table
Grouped
within the same database into one or
more full-text catalogs (“containers”)
Full-Text Catalogs
A
logical construct
A
way to manage FT indexes together
Index Population
Population:
indexes
the addition of data to full-text
Automatic
Manual
• On Request
• Scheduled
Steps to Setup an Index on a Table
Create Full-Text Catalog
For Each Column to Index
• Indicate language
• Indicate document type *
Choose Change-Tracking
Mechanism
Full-Text Index Wizard
Example: Create Catalog and Index
CONTAINS
Precise
or prefix matches to single words and
phrases
Proximity matches
Logical operations between conditions: AND,
OR, AND NOT
Optional use of inflectional forms and
thesaurus
FREETEXT
Matching
the meaning, but not the exact
wording, of specified words or phrases
Always
uses inflectional forms and thesaurus
CONTAINSTABLE AND FREETEXTTABLE
Return
a relevance ranking value (RANK) and
full-text key (KEY) for each row
The
actual RANK values are unimportant and
typically differ each time the query is run
ISABOUT/WEIGHT
influence the ranking
in CONTAINSTABLE
Example: Queries
Stoplists
A
mechanism to discard commonly occurring
strings that do not help the search
a
is
the
by
and
…
Thesaurus
Nicknames:
Robert/Bob
Common misspellings: calendar/calender
Homophones: Geoff/Jeff
Technical terms: proc/procedure
Very powerful if you log searches and learn
what users are commonly searching for
Thesaurus
One
file per language
Expansions
Replacements
“bike”
in addition to
“bicycle”
“calendar”
instead of
“calender”
Filters
Extract
textual information from the
document (removing the formatting)
Send
the text to the word-breaker
component for the language
associated with the column
Need
to manually install Office 2010
and PDF filters
Example: FTS Components
Where to Store Large Objects?
Database
File System
Why Store in the Database?
Integrating
unstructured data into the
relational database provides significant
benefits:
Integrated storage and data management
capabilities (e.g., backup)
Ease of administration and policy
management
Full-text search
FILESTREAM
A
database/file system hybrid
FILESTREAM
is an attribute that can be
assigned to a varbinary(max) column
Allows
Not
storing BLOB data in the file system
restricted to the 2 GB limit
SQL Server imposes on BLOBs
FILESTREAM
SQL
Server buffer pool is not used
Isolation
semantics are governed by
Database Engine transaction isolation
levels
Steps to FILESTREAM
Enable at OS level
Configure at instance level
Create a filegroup
Add a file to the filegroup
• Indicate root folder
OS-level Configuration of
FILESTREAM
Instance-level Configuration of
FILESTREAM
Example: FILESTREAM
FILESTREAM
All
data access must be transactional
Must
Do
use specific APIs for file I/O
not edit the files directly!
When to Use FILESTREAM
Objects
that are being stored are, on
average, larger than 1 MB
Store smaller objects in the database
Fast
You
read access is important
are using a middle tier for
application logic
FileTables
A
special, fixed-schema kind of table
Builds
on top of existing FILESTREAM
capabilities
Store
files and documents in in the
database, but access them from
Windows applications as if they were
stored in the file system (WIN32 API)
FileTables
Hierarchical
Includes
namespace
file system properties as columns
Preserves
full file names
Non-transactional
access through the FS
FileTables
Calls
to create or change a file or directory
through the Windows share are intercepted by
a SQL Server component and reflected in the
corresponding relational data in the FileTable
Example: FTS over FileTables
FileTables vs FILESTREAM
File
and directory hierarchy maintained in the
database
Windows
application compatibility
Relational
Both
access to file attributes
are available in all editions
Wrap Up
Advanced
searching on character-based data,
including documents
FTS
setup, components, and queries
FILESTREAM
FileTables
Other Topics
Document-property
Semantic
search
search
Optimizations
Query plans and execution traces
References
Posts
http://www.sqlskills.com/blogs/bobb/
Blog:
SQL Server FTS Team Blog
http://blogs.msdn.com/b/sqlfts
SQL
and presentations by Bob Beauchemin
Server 2012 Books Online
http://msdn.microsoft.com/enus/library/cc645577(SQL.110).aspx
Filter Packs
Adobe
http://www.adobe.com/support/downloads/thankyo
u.jsp?ftpID=4025&fileID=3941
Office
PDF Filter
2010 Filters
http://www.microsoft.com/enus/download/details.aspx?id=17062