Data Quality Services is a new and powerful feature that is available in SQL Server 2012. Called a knowledge-driven data quality product, DQS allows you to build knowledge bases that handle the traditional data quality tasks such as profiling, correction, enrichment, standardization and de-duplication. In this blog series we will dive into DQS and explore how its numerous features and capabilities can improve and enrich your critical and valuable business data.
DQS Blog Series Index
Part 1: Getting Started with Data Quality Services (DQS) 2012
Part 2: Building Out a Knowledge Base
Part 3: Knowledge Discovery in DQS
Part 4: Data Cleansing in DQS
Part 5: Building a Matching Policy in DQS
Part 6: Matching Projects in DQS
Part 7: Activity Monitoring, Configuration & Security in DQS
Installing the DQS Server and Data Quality Client
Data Quality Services consists of two components: the DQS Server and the Data Quality Client. Both of these components are installed by the Data Quality Server Installer.
To install the DQS server and client, select the ‘Data Quality Server Installer’ shortcut from the Microsoft SQL Server 2012 RC0/Data Quality Services folder on the All Programs menu of the Start button.
Once the installer starts, you will be prompted to enter a password for the database master key. This key will be used to encrypt the contents of the DQS databases.
Installing the DQS components may take several minutes. During this process, three databases and an out-of-the-box knowledge base are created. The three databases are:
- DQS_MAIN – This database contains all the DQS stored procedures, the DQS engine and published knowledge bases
- DQS_PROJECTS – Contains the data associated with data quality projects created in the Data Quality Client
- DQS_STAGING_DATA – As the name implies, this is a staging area where you can copy source data to perform DQS operations on it, and from which you can export the processed data.
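To confirm that the installer created all three databases, you could list them from `sys.databases` and compare the result against the names above. The following is only a sketch: the helper name `missing_dqs_databases` is made up for illustration, and the query string is meant to be pasted into Management Studio (or run through whatever client you already use) rather than executed by this snippet.

```python
# Database names the installer is described as creating (taken from the list above).
EXPECTED_DQS_DATABASES = ("DQS_MAIN", "DQS_PROJECTS", "DQS_STAGING_DATA")

# T-SQL that lists whichever of the expected databases exist on the instance.
# sys.databases is a standard SQL Server catalog view.
CHECK_QUERY = "SELECT name FROM sys.databases WHERE name IN ({});".format(
    ", ".join("'{}'".format(name) for name in EXPECTED_DQS_DATABASES)
)


def missing_dqs_databases(found_names):
    """Return the expected DQS databases that are absent from found_names.

    found_names is any iterable of database names, for example the rows
    returned by CHECK_QUERY. The comparison ignores case.
    """
    found = {name.upper() for name in found_names}
    return [name for name in EXPECTED_DQS_DATABASES if name not in found]


if __name__ == "__main__":
    print(CHECK_QUERY)
    # Hypothetical result set in which the staging database is missing.
    print(missing_dqs_databases(["master", "DQS_MAIN", "DQS_PROJECTS"]))
```

An empty list from `missing_dqs_databases` means all three databases were found.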
The installation process also creates two DQS server logins (##MS_dqs_db_owner_login## and ##MS_dqs_service_login##) and three DQS database roles (dqs_administrator, dqs_kb_editor and dqs_kb_operator). To handle DQS initialization, a stored procedure is created in the master database. Note also that if the installer finds a Master Data Services database on the same instance, it creates a user, maps it to the MDS login and grants it administrator access to the DQS_MAIN database.
Once the installer finishes, you will be prompted to press any key to exit and the installation is complete.
- The Microsoft .NET Framework 4.0 is required to run the Data Quality Client.
- To log in to the Data Quality Client, a user must be in one of the DQS roles. Users in the sysadmin server role do not need to be added to the DQS roles.
- If you are running the client on a separate computer, TCP/IP must be enabled on the instance hosting the DQS server.
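Adding a user to one of the DQS roles comes down to a few lines of T-SQL. The sketch below builds that script for a given login and role. It assumes the roles live in the DQS_MAIN database and that the login already exists on the instance; the helper names and the example login `CONTOSO\jane` are hypothetical. `ALTER ROLE ... ADD MEMBER` is the SQL Server 2012 syntax for role membership.

```python
# The three DQS roles named in this post.
DQS_ROLES = ("dqs_administrator", "dqs_kb_editor", "dqs_kb_operator")


def quote_name(identifier):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def grant_dqs_role_sql(login, role):
    """Build T-SQL that maps a login to a DQS_MAIN user and adds it to a DQS role.

    Assumes the login already exists on the instance and has no user in
    DQS_MAIN yet (CREATE USER fails if the user is already there).
    """
    if role not in DQS_ROLES:
        raise ValueError("Unknown DQS role: {!r}".format(role))
    user = quote_name(login)
    return "\n".join([
        "USE [DQS_MAIN];",
        "CREATE USER {0} FOR LOGIN {0};".format(user),
        "ALTER ROLE {0} ADD MEMBER {1};".format(quote_name(role), user),
    ])


if __name__ == "__main__":
    # Hypothetical Windows login, used only for illustration.
    print(grant_dqs_role_sql("CONTOSO\\jane", "dqs_kb_editor"))
```

Review the generated script before running it. As noted above, members of the sysadmin server role do not need this step.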
First Look at the Data Quality Client
To begin working with DQS, open the Data Quality Client from the All Programs menu on the Start button. When the application launches, you will be prompted to enter the name of the instance that hosts your DQS server. Before you click ‘Connect’, note the ‘Encrypt connection’ option, which uses an SSL connection for communication between the client and server.
Once you are connected, you will notice three distinct areas: Knowledge Base Management, Data Quality Projects and Administration. We will dive deeper into these areas in the next blog post, but first I want to cover some of the high-level concepts that are important in the DQS world.
- Knowledge Base – collection of data domains
- Domain – contains domain values and their status, domain rules, term-based relations, and reference data. Domains can be either single or composite
- Knowledge Discovery – analyzes organizational data to build knowledge that can be used in cleansing, matching and profiling
- Cleansing – Process of using a knowledge base to propose data corrections
- Matching Policy – Rules used to perform data de-duplication. These rules can be fine-tuned based on the matching results and profiling information, which may lead you to create additional matching policies.
- Reference Data – Data that can be used to validate and enrich your data. Reference data providers are available in the Azure Marketplace Data Market, or you can connect directly to your provider.
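To make the relationship between these concepts a little more concrete, here is a toy model in Python. It is not how DQS is implemented and not a DQS API: the class names, the `propose` method and the status strings are all invented for illustration. It only shows the idea of a knowledge base as a collection of domains, each holding known values, which cleansing then uses to propose corrections.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Toy stand-in for a domain: known values and what each should become."""
    name: str
    # Maps a known value to its correction; a value mapped to itself is correct.
    values: dict = field(default_factory=dict)


@dataclass
class KnowledgeBase:
    """Toy stand-in for a knowledge base: a named collection of domains."""
    name: str
    domains: dict = field(default_factory=dict)

    def propose(self, domain_name, value):
        """Return (proposed_value, status) for a value in the given domain."""
        known = self.domains[domain_name].values
        if value not in known:
            # The knowledge base has never seen this value.
            return value, "new"
        corrected = known[value]
        status = "correct" if corrected == value else "corrected"
        return corrected, status


if __name__ == "__main__":
    # Hypothetical knowledge base with a single State domain.
    state = Domain("State", {"FL": "FL", "Fla": "FL"})
    kb = KnowledgeBase("Customers", {"State": state})
    for raw in ("FL", "Fla", "TX"):
        print(raw, "->", kb.propose("State", raw))
```

In this model, "Fla" is proposed as "FL", while "TX" is flagged as a new value the knowledge base does not yet know about.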
In the next blog post we will put DQS to use by building out a knowledge base with domains and business rules, running through the data discovery process and then building out a matching policy.
Till next time!!