Jacking up Data – the Rig of Dimensional Date and Time Information

Most analytical environments use dates for registering when specific transactions have occurred in an OLTP system, such as when a sale has been made of a specific product from a certain category, when a customer has registered a complaint with a company help desk line, or when a user has sign up for a specific event held in a specific city.

Date and time dimensions enhance the value proposition of providing numeric details associated with a particular transaction. In an OLTP environment, dates and and times are stored as signifiers of a particular event in time. In an OLAP environment, where users query data extracted, transformed and loaded into an analytical system, data and time dimensions augment the facts; that is they add additional informational benefits to a specific event in terms of the range of capabilities associated with, say slicing and dicing facts based on those dates and times.time-2132452_640

Below is a typical date and time used in a Microsoft SQL Server database environment: 2017-03-22:12:15:30. We can all safely assume that means March 22, 2017 at fifteen minutes past noon, or more specifically fifteen and a half minutes past noon. From an online transaction processing perspective, this is a very useful piece of information and some a fair bit of detail. From a standard, static reporting vantage point, this information may suffice for users who need to know just the date and time when transactions occurred. While the date component may also be universally recognized, meaning that someone in Canada and France understand it as March 22, the subtlety lies in the fact that a user in France may want the textual name of the month and day written in their language.

With a little extra work setting up the date and time dimension table in an analytical environment, user experience and preference requirements can be achieved with some high value propositions built into the data presented in a more advanced, dynamic way.

Knowing the user requirements for an advanced, or business intelligence project dependent on dates and times is key to understanding what users need in terms of not only what data is presented but also how the date is presented to them.

Language requirements are critical for companies that have offices spanning the globe or have a reach beyond a single country; critical in the sense that users in any country should be able to work – or in this case view data – in the language of their choice, as much as possible. In this case the realm of language requirements goes beyond just the interface in which the data is presented to users, whether that interface is Excel, Power BI, or SQL Server Reporting Services. The realm broadens to providing data in specific languages too.

city-1940691_640Dimensions for dates and times can be as wide and deep as needed; wide in terms of the number of attributes or columns need to store dimensional details about dates and deep in terms of how far back and ahead the date dimensions should go. While the task of creating date and time dimensional data in the future is easy, the business analysis activity should determine via data reviews how far back to create older dimensional data. For example, if a company is analyzing product sales only and their first sales ever was 1988, then you most likely only need to provide date dimensional details starting as far back as January 1, 1988. Keep in mind though that a single date and time dimensional table in an analytical environment can be the source for a multitude of fact tables within a single BI realm. Start where needed, but plan for updating the dimension as more and more data is targeted for loading into the analytical environment.

Conceptual example

Modelling the requirements that drive what a date and time dimensional table are keys to not only understanding business parameters but properly documenting the data needs. A date model may look like this if hashed out using a whiteboard or doodle pad:

null

Figure 1 – Whiteboard rendition of conceptual date dimension

Modelling the example – date dimension

Next, we might spend a few hours modelling the requirements using Universal Markup Language (UML) data modelling. We assume that we have confirmed and documented as stated requirements:

  • That no other language requirements other than English and French needed
  • The database technician has stated that a unique key is required to support distinct records in the date dimension table
  • We need to create date attributes as far back as 1988.
  • For now, creating date dimensions into 2020 will suffice. Note: the database technician will know how to do this. For now, the requirement is to be able to create date dimensions for the next 3 years or so which can be achieved automatically using functions in SQL Server.

I used Visio for developing the initial draft of the UML date dimension model, which is displayed below.

null

Figure 2 – Translating whiteboard requirements into UML data model

A few things to notice in the UML diagram. Visio automatically made the “datekey” attribute a primary key. The primary key is the unique identifier in the date dimension table. Also, I didn’t spend time on determining the data types and their lengths in the model. In this case, my focus is on the requirement and I will leave the types and length to the database technician to implement. There is no need here to consume valuable business analysis resources on this aspect of the model.

I, as IT business analysis and database administrator would develop the physical date dimension table with the cross section of it shown below from a sample of year 2000 records:

null

The Time Dimension

Up to this point we have spent time on the date dimension. What about the time dimension? Having a time dimension depends on the business. Some people do not care about time dimensions while others are fanatical about them. Ask the key stakeholders about this need. For now we will assume that time dimension is required, as our stakeholders are analytical enthusiasts! Next are the questions we should be asking of the key stakeholders, including the technical team:

  • Why is a time dimension required in the first place? In other words, what value does a time dimension bring to the analytical environment?
  • Do we use the 12 or 24 hour system?
  • So how do we conceptualize and model a time-based dimension?
  • What is the impact on the date dimension in terms of the number of records needed to be created within the date dimension table to support records that have different timestamps but the same date?
  • Is there some kind of link that we can establish to keep the records tidy?

The value of a time dimension

When thinking about using time dimensions, consider the business drivers for it. Ask executive, management and operational users how they see time dimensions being used as part of their analytical environment. Even if the discussion leans towards not using, consider ways to introduce the benefits of using a time dimension. Consider the case study below.

The marketing team has put together what they consider a game changing program in support of a new product being shipped by company ABC. The advertising team spends a large pool of money in support of the marketing program. The CIO wants to know not only when the products gets purchased but within how many seconds, minutes or hours customers buy the product from the moment the CEO makes the go-live decision with the program. The CIO wants this information to assess the length of time between marketing a product and consumers buying it at a deeper level than simply the date on which they make the buy.

Working with time – formats and concepts

Going back to our example, we might have source dates extracted from our transactional system that look like this, assuming a 12 hour format:

  • 2017-03-22:12:15:30PM
  • 2017-03-22:12:18:40PM
  • 2017-03-22:06:16:10PM

In our date dimension table, the date would be represented as 20170322 and is used at the date key. Here we have 3 time stamps, which mean 3 different records in the time dimension. These time records could be represented in our analytical system like this:

  • 121530, 121840 and 181610

If we put on our critical thinking caps, we know the time dimension falls between 00:00:01 and 23:59:59 so the number of records is bounded by this range. So we conceptualize and model our time dimension the same way as we do for the date dimension, only we have a narrower pool of records with which we need to work.

Working with a 12-hour calendar shortens the number of records thus calling for a parameter to identify AM or PM. While the 24-hour table would contain almost 236,000 records, its volume might deter database administrators from using up vital disk space. While a small number, the business requirements elicitation process will yield guidance on what to do, revealing either stated requirements or a policy decision from the CIO or key database administrator.

Our time dimension fact table might look like this:

Unique ID

Time

121530

121530

121480

121480

61610

61610

In this example, it is ok to use the same number for the unique identifier and time attribute.

The extraction process from the transactional system would split out the date from the time part. In terms of the date dimension table there is little impact from our earlier modelling. But we need to be able to associate a date with a time in the analytical environment. Part of the extraction, transformation and loading is assumed to have created this association. In the analytical environment, we link two dimension tables together using a factless fact table. In this case the factless table, as built by the database administrator or BI professional will consist of the date unique key and the unique key for the time. Let’s use an example using the same timestamp but a different date for our case study.

Factless ID

Date Unique Key

Time Unique Key

1

20170322

121530

2

20170322

121840

3

20170322

61610

4

20170323

121530

5

20170323

121920

I added a few extra records to show how the date unique key can be associated with multiple time unique keys. And as you can see, the time key is the same for a purchase made on March 22nd and 23rd. What are the odds of this happening? Rare but the situation could arise if the site has a lot of user traffic and e-commerce purchasing volume. Think of Fedex or one of the other logistics companies. Online transactions happen almost every second. For now, we are illustrating the concept.business-woman-2071342_640

The impact on the date dimension is nil. The same number of records remains the same. We link the time and date dimensions together using a factless fact table.

Date and time dimensions play an important role in the business intelligence project. The data is structured in such a way that it can provide leaders with the insights into what is happening with their programs and operations that allows them to make better decisions with the data they have at hand.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s