There are many no-SQL databases on the market today and serve many purposes.  I often think about this stuff while having my Starbucks coffee in the morning.  There are many napkins upon which I write down thoughts, and sometime draw out little data models to take back to the office.  There, I said it, I am a bit old school!

organic-1280537_640As a business analyst in the information technology field, I need to stay up-to-date on database trends and advances.  I also need to understand how to work with these unstructured database environments.  One key aspect of this understanding is assessing under what business conditions and circumstances it is valuable to use a no-SQL database.

Today’s post will focus on the details about what a business analyst should know about how to approach projects that require some modelling of a no-SQL database solution. While the specific details of the no-SQL database is less pertinent to a BA and remains the domain of the technical experts who develop the physical database, the concepts about the no-SQL environment, terminology, and modelling approaches is important using a competency and perspective approach.

Ultimately, in the end, the solution has to make data available to business users in order to give them allow them to make better decision with that data, assess how their operations are performing or even gather static information about customers.  Making better decisions is the core to analytics.  If you watch baseball like I do, you will often hear about how analytics plays a role in the offensive and defensive aspects of a game.  Analytics plays a big role in the strategy and tactics employed by a baseball team.

I will assume that we have already met with stakeholders to learn their strategic objectives and what they want out of the data that is collected today.  Our approach next is how to model the no-SQL repository.  Planning and analysis into the design of the no-SQL environment will minimize problems later.  macbook-336704_640

One concept to understand is that no-SQL databases use a term called documents.  Documents are part of collections in the no-SQL environment.  Think of a collection called “People”.  Inside this collection will be either one, or many “documents” which store details about people. Some thought must go into the modelling of how each collection will be structured.

Below is an example of a People collection and a document within the collection, as it would appear to a no-SQL person:

People: {
“_id”: “1234567”
“firstName”: “John”
“lastName”: “Doe”
address”: [
      “street”: “123 Avenue”,
“city”: “Some Town”,
      “state”: “Some State”],
“interests”: “I enjoy watching baseball and understanding analytics used in the game”
}

As you can see from the above, the People collection is designed to have fields hold single values and multiple values (the address array).  The ID field is auto-generated and is shown for completeness.  The requirements gathering process should highlight that unique identifiers be auto-generated and handled by the no-SQL database.

What powers a no-SQL environment is that the fields can store values, including empty strings that are not necessarily used or required in other documents in the People collection.  Consider the following second People document:

People

{
“_id”: “1234567”,
“firstName”: “Jane”,
“lastName”: “Doe”,
“nickname”: “JD”,
address”: [
“street”: “456 Avenue”,
“city”: “Some Other Town”,
“state”: “Some Other State”,
“country”: “Some country”],

“interests”: “I enjoy hockey.”
}

Putting the two together yields the following:

People

{
“_id”: “1234”
“firstName”: “John”
“lastName”: “Doe”
“nickname: “”
address”: [
      “street”: “123 Avenue”,
“city”: “Some Town”,
      “state”: “Some State”,
      “country”:”Some county A”]
“interests”: “I enjoy watching baseball and understanding analytics used in the game”
},
{
“_id”: “4567”,
“firstName”: “Jane”,
“lastName”: “Doe”,
“nickname”: “JD”,
address”: [
“street”: “456 Avenue”,
“city”: “Some Other Town”,
“state”: “Some Other State”,
“country”: “Some country”],

“interests”: “I enjoy hockey.”
}

As you can see, Jane Doe uses a nickname and John Doe does not.  This example would require a discussion about which data fields on the form used to collect the information should be mandatory.  In this case, the nickname field would remain open and not require the user to fill in information.

Another consideration is whether addresses will be stored within the People collection or as a separate Address collection.  Discussions with the technical experts will shed some light on best practices, as the no-SQL database administrator is a key stakeholder in the project too.  In other words, how are collections to be related?

home-office-336373_640Often collections will be used to store related data and being able to link the collections together may come up during requirements sessions.  Unique identifiers can be used to register unique documents inside a collection.

So while we are having coffee we can take out another napkin and update our collection, keeping in mind too that the naming convention of fields in the no-SQL database should be thought out.  Nothing is more frustrating that looking at fields in a database that make no sense or is hard to decipher the intent of the field and what data it is to store.

People

{
“_id”: “1234”
“firstName”: “John”
“lastName”: “Doe”
“nickname: “”,
“interests”: “I enjoy watching baseball and understanding analytics used in the game”
},
{
“_id”: “4567”,
“firstName”: “Jane”,
“lastName”: “Doe”,
“nickname”: “JD”,
“interests”: “I enjoy hockey.”
}

Addresses

{
“person_id”:”1234″,
“address”: [
“street”: “123 Avenue”,
“city”: “Some Town”,
“state”: “Some State”,
“country”:”Some county A”]
},
{
“person_id”:”4567″,
“address”: [
“street”: “456 Avenue”,
“city”: “Some Other Town”,
“state”: “Some Other State”,
“country”: “Some country B”]
}

In this revised model example, the Address collection is separate and links back to the People collection using the “person_id” field.  Updating the address information is easier than updating an entire People collection.  Smaller is often better in this case, putting less stress on the system to find, update and save data in collections.

I hope this overview of no-SQL collections, documents an coffee-shop data modelling gives you a little perspective about how these databases can be logically and physically designed.

The ease with which we make the collections and storage of document data will make extraction and visual presentation of the data easier in formats that are required by the business in order for them to make better decisions with the information at their disposal.

Leave a Reply

%d bloggers like this: