Preparing for Sematic Search

SQL Server 2012 introduced a number of new features. One of the more interesting is an extension of the full-text search capability, called semantic search or more formally statistical semantic search.

The premise behind semantic search is that statistical analysis of unstructured data or documents, built on top of existing full-text indexes, is used to provide meaning or context to search phrases. With this features we are able to do a number of interesting tasks like extracting relevant terms, search documents for relevant terms and even how similar or related multiple documents are to one another.

With that, there are several installation steps that you must take to prepare for semantic search.

Installing

Step 1: Installing Full-Text & Semantic Search Extractions Feature

This features is installed as part of the SQL Server 2012 set-up. If you selected this option when you set-up your server you can skip ahead to Step 2.

Start by selecting the SQL Server Installation Center under the Configuration folder from Microsoft SQL Server 2012 program group on the Start Menu.

Once the Installation Center starts, choose Installation and then select the New SQL Server stand-alone installation or add features to an existing installation option.

You will be prompted for the installation media and the installer will launch. The steps to get through the installer are the same as you encountered installing your instance of SQL Server 2012. The difference is at the Installation Type step, select the Add features to an existing instance of SQL Server 2012, and then ensure your instance is selected.

On the Feature Selection page, ensure Full-Text and Semantic Extractions for Search is selected and then complete the installation process.

image

Step 2 – Install Microsoft Office 2012 Filter Packs and SP 1

By the default, SQL Server 2012 has the latest word breakers and stemmers, but does not include the latest filters which allow it to understand Microsoft Office documents, PDFs and other Microsoft Office document types. The Office 2012 Filter Packs adds the required filters to support these document types.

Download the Office 2012 Filter Packs and Service Pack 1 from Microsoft and then install the appropriate version based on your system architecture. The links to download the resources are below.

Office 2012 Filter Pack
(32 and 64-bit)
http://www.microsoft.com/download/en/details.aspx?id=17062
Service Pack 1 (32-bit) http://www.microsoft.com/download/en/details.aspx?id=26606
Service Pack 1 (64-bit) http://www.microsoft.com/download/en/details.aspx?id=26604

 

Once the Filter Packs and Service Pack 1 is installed, you will need to force an full-text service update so that it picks up the changes. To  do this execute the following T-SQL:

USE master;

EXEC sp_fulltext_service @action='load_os_resources', @value=1;
EXEC sp_fulltext_service 'update_languages';
EXEC sp_fulltext_service 'restart_all_fdhosts';

You can confirm the install was successful by running the following T-SQL to view document type mapping (componentname):

EXEC sp_help_fulltext_system_components 'filter'

 

Step 3 – Install, Attach & Register the semantic language statistics database

The basis of semantic search functionality is statistics. This base data set used for the statistical analysis perform is not installed by default so we must manually install, attach and then register the required database.

To start, find the  SemanticLanguageDatabase Windows Installer in the x64/Set-Up (or x86, if appropriate) folder on the SQL Server 2012 installation media. Run the installer to extract the semanticsDB file. By default, the installer puts the database in the C:\Program Files\Microsoft Semantic Language Database folder, although you can select a more appropriate location as the situation may require. Once the database has been extracted, we are ready to attach and then register it. To do perform these steps you can execute the following T-SQL, making sure the path to both the MDF and LDF file is correct:

CREATE DATABASE SemanticsDB
            ON ( FILENAME = 'C:\Semantic Language Database\semanticsdb.mdf' )
            LOG ON ( FILENAME = 'C:\Semantic Language Database\semanticsdb_log.ldf' )
            FOR ATTACH
GO

EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'SemanticsDB'
GO

With that you have prepared your server instance for Semantic Search. From here you can create a default full-text catalog and full-text index enabled with the Statistical_Semantics option to begin using this feature. We will explore using the Semantic Search and the new FileTable type in the next blog.

Till next time!

Chris

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s