# NoSQL Primer

In a technology landscape littered with buzzwords, one that should stand out above the rest for SQL Server data professionals is NoSQL. By now you have no doubt heard the term, but you may not know what it means or how it is altering the landscape we live and breathe in. And if you have heard about it, you may hold one of the many misconceptions about what it's all about.

In this blog post we will introduce the fundamental concepts behind the NoSQL movement, including an explanation of the four most common types of NoSQL databases. Over the next several posts, we will compare and contrast NoSQL databases with their relational counterparts while exploring how they can be meshed together into a single unified solution. Along the way we will attempt to dispel some of the common misconceptions associated with NoSQL as we gain an understanding of where it fits in the overall enterprise data architecture.

Behind the Movement

To understand the driving force behind the NoSQL movement, two concepts must first be discussed. The first, and the one that you are most likely intimately familiar with, is the principle of ACID transactions. Almost all modern relational databases guarantee that completed transactions will be atomic, consistent, isolated and durable. These characteristics are not only highly desirable but are often a critical part of systems such as those found in the financial, healthcare and many other industries. In fact, the ACID principle has been the foundation upon which most application development to date is based. But an evolution is underway.

This is where we introduce our second concept, the CAP theorem. The CAP theorem states that a distributed system can, at any given time, achieve only two of the following three: consistency, availability and partition tolerance. Since relational databases are by their nature consistent, the CAP theorem leaves you to choose which is more important, availability or partition tolerance, and for decades this served us very well.

A fundamental shift, however, is underway. The growth of the internet and the degree to which we as human beings are connected have altered the way modern applications are accessed and used, forcing a different application development paradigm. Highly centralized applications with mammoth databases running on expensive hardware that does not scale easily or cheaply are being replaced by widely distributed applications that often run from the cloud and have databases distributed across the globe. Instead of consistency, these new applications value availability and partition tolerance more highly. This presents a series of new challenges for which relational databases are not well (or at least not easily) prepared, and it is the driving force behind the NoSQL movement.

NoSQL = Non-Relational

NoSQL is a misnomer that trips up a number of people and is often misunderstood. In fact, even including SQL in the name can be confusing, since a number of NoSQL platforms support either SQL or SQL-like queries. To better define the term, think of NoSQL databases as simply non-relational. To clarify this, let’s start by defining a relational database.

A relational database is based on set theory, and its tables are limited to two dimensions. Tables are organized into columns and rows and follow a fixed schema, meaning that if a column is defined with a specific data type (like an integer or date), then for every row in that table the value contained in that cell is guaranteed to follow the defined schema. Transactions in relational databases also follow the ACID principle as previously discussed. This, as you well know, works very well in most instances where data is highly structured and largely uniform throughout.

Non-relational databases, on the other hand, are more loosely structured and often exist without schemas. This means that not only can data vary from row to row, but even the two-dimensional notion that is so familiar in the relational world often does not apply. These differences and the inherent lack of structure make these databases very flexible and ideal for storing extremely large distributed data sets that contain a variety of often inconsistent data.

In terms of transaction support, these non-relational databases often forgo the ACID principle in favor of what’s known as “eventual consistency”, meaning that the most current value will eventually be propagated out to every copy of the data. This introduces a new wrinkle into the application development process, as does the way this data is queried or accessed.
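To make eventual consistency a little more concrete, here is a minimal Python sketch. It is a toy simulation, not how any particular NoSQL product works: the `Replica` and `Cluster` classes, the last-write-wins version counter and the explicit `propagate` step are all assumptions made up for illustration.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def apply(self, key, version, value):
        # Last-write-wins (an assumption): keep only the newest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


class Cluster:
    """Hypothetical cluster: writes land on one replica and spread later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas
        self.clock = 0     # simple version counter

    def write(self, key, value):
        self.clock += 1
        self.replicas[0].apply(key, self.clock, value)
        self.pending.append((key, self.clock, value))

    def propagate(self):
        # Copy every pending write to the remaining replicas.
        for key, version, value in self.pending:
            for replica in self.replicas[1:]:
                replica.apply(key, version, value)
        self.pending.clear()


cluster = Cluster()
cluster.write("price", 100)
cluster.propagate()

cluster.write("price", 120)
stale = cluster.replicas[2].read("price")  # replica 2 has not seen the new write yet
cluster.propagate()
fresh = cluster.replicas[2].read("price")  # after propagation the replicas agree
print(stale, fresh)
```

The wrinkle for application developers is the window between the second write and the second `propagate` call: a read served by a lagging replica can return the older value.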

Types of NoSQL Databases

To this point we have painted the various non-relational databases with a single broad NoSQL brush. This is somewhat inaccurate, since there is a wide variety of databases, each with its own strengths and weaknesses. To more accurately describe the various types of non-relational databases, we can divide them into four categories or types: key-value pair stores, document databases, column-oriented databases and finally graph databases.

Key-Value Pair Stores

Key-value pair stores are the simplest of the non-relational databases. This type of database is essentially a large hashed set or map of values that have uniquely identifiable keys. These databases typically do not know or care what is stored in the value, and the contents are often represented as blobs, giving you no visibility into what they actually contain. This makes the functionality of these databases fairly limited: values are not updated in place, so the database needs to support only basic query, insert and delete operations. Examples of these types of databases include: Azure Blob Storage, Azure Table Storage, DynamoDB, Redis, and Riak.
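The idea described above can be sketched in a few lines of Python. This is only an in-memory illustration under our own assumptions: the `KeyValueStore` class and its `insert`, `get` and `delete` methods are hypothetical names, not the API of any of the products listed, and real stores differ in details such as whether a key can be overwritten.

```python
class KeyValueStore:
    """Toy key-value store: opaque blobs addressed by a unique key."""

    def __init__(self):
        self._items = {}  # a plain hash map from key to blob

    def insert(self, key, blob):
        # Assumption mirroring the text: no in-place updates, so a
        # duplicate key is rejected instead of overwritten.
        if key in self._items:
            raise KeyError(f"key already exists: {key}")
        self._items[key] = bytes(blob)

    def get(self, key):
        # The store hands back the blob untouched; it never looks inside.
        return self._items.get(key)

    def delete(self, key):
        self._items.pop(key, None)


store = KeyValueStore()
store.insert("order:1", b"<SalesOrder>...</SalesOrder>")
print(store.get("order:1"))

# "Changing" a value means deleting it and inserting a new blob.
store.delete("order:1")
store.insert("order:1", b"<SalesOrder status='shipped'>...</SalesOrder>")
```

Notice that nothing in the store can answer a question like "which orders contain a given product"; the application would have to fetch each blob and parse it itself.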

Document Databases

Document databases are in a lot of ways very similar to key-value pair stores. Each document is uniquely identifiable by a key, but instead of a blob-like or free-form text value, the documents are well structured using an encoding such as XML or JSON. The document database management system understands these documents as collections of fields and is capable of both querying and updating them directly, in addition to simply retrieving them by key. To better visualize this, consider the following example:

image

In this example, we have a document table that stores historical XML-formatted sales order data from the Adventure Works database. A document database system could easily query this information either by the key or by a specific value contained within the document (such as finding all sales orders that were placed for a Touring-1000 Yellow, 46 bike). Examples of these types of databases include: MongoDB, CouchDB and RavenDB.
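The difference from a key-value store can be sketched in Python. The snippet below is a stand-in for a document database, not the API of any of the products above: documents are plain dictionaries (JSON-like rather than XML, for brevity), and the order keys, customer names and helper function names are all made up for illustration.

```python
# Hypothetical sales order documents, keyed by a made-up order number.
documents = {
    "SO-1001": {
        "customer": "Customer A",
        "lines": [
            {"product": "Touring-1000 Yellow, 46", "quantity": 1},
            {"product": "Water Bottle", "quantity": 2},
        ],
    },
    "SO-1002": {
        "customer": "Customer B",
        "lines": [{"product": "Road-150 Red, 62", "quantity": 1}],
    },
    "SO-1003": {
        "customer": "Customer C",
        "lines": [{"product": "Touring-1000 Yellow, 46", "quantity": 3}],
    },
}


def find_by_key(key):
    """Retrieve a whole document by its key, as a key-value store would."""
    return documents.get(key)


def find_orders_with_product(product):
    """Query by a value inside the document, which requires the store
    to understand the document's fields."""
    return [
        key
        for key, doc in documents.items()
        if any(line["product"] == product for line in doc["lines"])
    ]


def update_field(key, field, value):
    """Update a single field without replacing the whole document."""
    documents[key][field] = value


matches = find_orders_with_product("Touring-1000 Yellow, 46")
update_field("SO-1002", "status", "shipped")
print(matches, find_by_key("SO-1002")["status"])
```

A real document database would back a query like `find_orders_with_product` with indexes instead of scanning every document, but the shape of the question is the same.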

Column-Oriented Databases

Like traditional relational database systems, column-oriented databases consist of rows and columns. Relational databases store data in a row-oriented fashion, meaning that all data values for a given row are stored together. This flips for column-oriented databases, which instead organize data by column. Consider the following example containing an address table:

When the data is persisted to storage, each record is serialized together as a discrete unit and logically would look as follows:

image

A column-oriented database instead serializes each column together and, as a result, looks much different from the preceding example.

Storing data in this fashion allows column-oriented databases to run common and ad hoc queries that aggregate over very large sets of data very efficiently. Additionally, these databases can get very, very big, since data is organized by column and can be easily partitioned in a process called sharding. In situations such as these, columns are organized into column families or groups to keep related columns together (think of a group of columns for a customer address, or for details like customer name). Examples of these types of databases include: HBase, HBase on Azure and Cassandra.
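The two storage layouts can be sketched in Python as follows. The address rows are invented sample data and the in-memory lists only mimic how values would be ordered on disk; no real storage engine is being modeled here.

```python
from collections import Counter

# Hypothetical address table: (id, city, state, zip).
column_names = ("id", "city", "state", "zip")
rows = [
    (1, "Tampa", "FL", "33602"),
    (2, "Atlanta", "GA", "30303"),
    (3, "Orlando", "FL", "32801"),
]

# Row-oriented: every value of a record sits next to the others.
row_layout = [value for row in rows for value in row]

# Column-oriented: every value of a column sits next to the others.
column_layout = {
    name: [row[index] for row in rows]
    for index, name in enumerate(column_names)
}

# An aggregate over one column only has to touch that column's values.
addresses_per_state = Counter(column_layout["state"])

print(row_layout)
print(column_layout["state"])
print(addresses_per_state)
```

Counting addresses per state reads only the `state` list in the column layout, whereas the row layout would force a scan past every id, city and zip value along the way. That is the intuition behind the aggregation performance described above.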

Graph Databases

Unlike the NoSQL database flavors discussed previously, a graph database is solely focused on capturing and storing relationship data. This feat is accomplished through the notion of nodes and edges, where nodes represent entities of one or more types and edges define the relationships between entities. Properties are also an integral part of these databases and can be captured on either the node or the edge. One example of this type of database is Neo4j.
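Nodes, edges and properties can be sketched with ordinary Python data structures. This is not how Neo4j stores or queries data; the people, the company, the relationship names and the helper functions below are all hypothetical and exist only to show the model.

```python
# Nodes: entities of different types, each with its own properties.
nodes = {
    "alice": {"type": "Person", "name": "Alice"},
    "bob": {"type": "Person", "name": "Bob"},
    "acme": {"type": "Company", "name": "Acme"},
}

# Edges: (source, relationship, target, properties on the edge itself).
edges = [
    ("alice", "KNOWS", "bob", {"since": 2010}),
    ("alice", "WORKS_AT", "acme", {"role": "DBA"}),
    ("bob", "WORKS_AT", "acme", {"role": "Developer"}),
]


def neighbors(node_id, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [
        target
        for source, rel, target, _ in edges
        if source == node_id and rel == relationship
    ]


def coworkers(node_id):
    """Two-hop traversal: person -> employer -> other employees."""
    employers = set(neighbors(node_id, "WORKS_AT"))
    return sorted(
        source
        for source, rel, target, _ in edges
        if rel == "WORKS_AT" and target in employers and source != node_id
    )


print(neighbors("alice", "KNOWS"))
print(coworkers("alice"))
```

Questions such as "who works with Alice" are answered by walking edges rather than joining tables, which is the kind of relationship-centric query these databases are built around.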

Wrapping it Up

I hope this brief post has served as a useful introduction to the NoSQL landscape. Over the next couple of posts we will dig deeper and explore how we can leverage each of these databases to deliver modern, best-in-class solutions. Remember, however, that the NoSQL and relational database worlds are not mutually exclusive. It’s not (and shouldn’t be) one or the other. Rather, the answer to complex business challenges will often be both, allowing you to leverage each technology’s strengths where appropriate.

Till next time!

Chris
