The advent of business intelligence opens up unique opportunities for small, medium and large organizations. Poll any leadership team about why their company uses business intelligence and you could receive a wide range of answers. Some of those answers may include a need to discover new markets, identify trends in existing data, support better decision making in line with strategic goals or objectives, achieve better forecasting and budgeting with financial data, or simply to improve existing business processes.
Data presented to end users in a business intelligence context allows them to take better action on the results. Until the advent of business intelligence software and related tools, users and the technology teams supporting the business units depended, and perhaps still depend, on static reporting as the norm for looking at data and making decisions based on it. While these reports feature traces of advanced functions like pivot tables and charting, static reports lead to less than optimal decisions because trend analysis, perspectives and key performance indicators are missing, leaving the analytical demands of management and leadership teams unsatisfied.
Often, reports are generated from existing transactional system data that contains inconsistent formats, bad values and a mixed array of alphanumeric text that either cannot be easily analyzed or leaves the user with little ability to distinguish what should be real text, and thus descriptive in nature, from what should be represented as a number. Today, looking at cleaned data and drilling down into it yields a wider spectrum of decision-making capabilities using a system designed for analytical processing.
We will look at how to support BI projects through the lens of the business: first, what requirements to think about at a strategic level; then how the analysis of needs is documented and augmented by modelling tools; next, a peek at transitional thinking; and finally, a set of key messages.
Research conducted by Thomas Davenport, Jeanne Harris and Robert Morison, knowledge leaders in the field of business analytics, found that forty percent of decisions made by managers are based on experience or gut feeling rather than on a deeper look at what the data is telling them. This could be for a number of reasons: insufficient facts, leading management to avoid using reports as background; corrupt data, leading to a lack of faith in the integrity of the data source(s); erosion of trust in the reporting mechanism; or simply reliance on similar past decisions as indicators of success.
To understand these reasons and to weed out contributing factors, a schedule of meetings with key stakeholders is needed. The goal of these meetings is to develop an understanding of the business requirements by first analyzing the strategic goals and objectives of the organization.
From there, we can map these goals and objectives to higher-level data needs. These needs contribute not only to measuring how the business is doing at various stages of its operational life cycle but also, and more importantly, to improving the way decisions are made with data, as there may be a need to shift tactics along the way.
As such, a first line of questioning can identify what data they need access to, how they use data now, where they get reports, how they interact with data, what they use data for, what they do with reports once reviewed, and to whom they communicate those reports. We must also keep in mind that leaders want access to a fast and effective reporting system that produces results. In today’s dynamic working environments, whether the goal is better insight into business processes, an assessment of market competitiveness, trends in customer demand, or something else entirely, on-demand access to an analytical environment supported by accurate data is the new business modus operandi.
One reason for this on-demand nature is that organizations want to increase production in the areas for which they are operationally responsible and accountable. According to the website maximizer.com, the number one reason to employ business intelligence is exactly that: to boost productivity. The site asserts that “through traditional data gathering methods, users need to compile and analyze data and write related reports. This can be incredibly time consuming, especially for small businesses that may not have the employees to do it. With a BI program, [staff] can pull data and create the reports at the click of a button thus freeing up time and resources allowing employees to be more productive on their own tasks.”
Once we obtain, document and assess the strategic drivers, motivational factors and high-level operational demands behind data needs, the next step is a deeper analysis using these inputs, this time to learn more about processes.
Analysis of Inputs
A key aspect of the business requirements elicitation and analysis life cycle involves looking at management processes (those that support the strategic arm of the business), operational processes such as Sales or Marketing, and supporting processes such as Information Management/Information Technology (IM/IT), Administration and Finance. A process review will often yield additional sources and needs of data that further add to the inputs into the design of the BI solution.
With strategy typically set by an upper management leadership team, the processes for managing how effectively and efficiently an organization is moving toward its goals and objectives generate a need for data at the sub-management level (the managers and directors of operational units), data that is often fed up to the senior management team. Armed with the higher-level data needs of senior management, an analyst can leverage this input to ask deeper questions of the leaders closest to the staff who carry out key operations. In turn, their input helps validate and discover the data most often handled by operational units.
While the types of data requirements at the operational level are diverse in depth and breadth, suffice it to say that staff on the front lines work with data day in and day out. Finance inputs expenses and revenues, pharmacists’ assistants dispense medicines at a cost, marketers query open data sources, and sales staff execute the marketing plans and acquire new customers or make repeat sales whose metrics get entered into a system. Logistics staff, the drivers of trucks and the handlers of packages, all process data at some point during the day. Inserts, updates and deletes land in transactional systems every second, minute, hour and day, making it the responsibility of the support providers to extract that data in its current form.
Support to users is currently provided by IT, usually on a case-by-case or on-demand basis. The inefficiency here is the staleness of the data between the time a report is provided and the time the user requests updated data. The process of connecting to a transactional system, loading data into a reporting medium (like a spreadsheet) and handing it to users, who are then on their own to work with the data, contributes to ongoing inefficiency and ineffectiveness.
Typical reporting environments require tens, if not hundreds, of queries developed by an IT team to satisfy the diverse reporting needs of specific lines of business. This involves intricate knowledge of table joins and of how to build advanced, complex SQL queries, and it takes time and effort to render these assets as usable outputs. Generic queries often fail to answer key questions related to strategic goals and objectives, or to provide data in a way that answers the deep, insightful questions executives, managers and operational support staff ask. Transactional systems are not designed for fast querying and analysis; they are designed to process insert, update and delete transactions on records.
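To make the point concrete, here is a minimal sketch of why even a routine report takes join expertise. The schema and data are entirely hypothetical, built in an in-memory SQLite database: a simple "sales by region" report already requires joining three normalized transactional tables.

```python
import sqlite3

# Hypothetical, simplified OLTP schema (illustrative names only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       amount REAL);
INSERT INTO region   VALUES (1, 'East'), (2, 'West');
INSERT INTO customer VALUES (1, 'Acme', 1), (2, 'Globex', 2);
INSERT INTO sale     VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# Even this "simple" report needs two joins plus grouping; real
# transactional schemas multiply the complexity quickly.
report = conn.execute("""
    SELECT r.region_name, SUM(s.amount) AS total_sales
    FROM sale s
    JOIN customer c ON c.customer_id = s.customer_id
    JOIN region   r ON r.region_id   = c.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(report)  # [('East', 150.0), ('West', 75.0)]
```

Multiply this by dozens of lines of business, each with its own grouping and filtering variants, and the IT query backlog described above follows naturally.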
While an analysis of current processes and procedures can contribute a huge amount of information about what a new BI environment should provide to users, reports based on transactional data merit a solution that is more than just a neat representation of static data with the occasional chart popped in for visual effect. It is also about new ways of getting to data, processing it and taking action, with little IT intervention after the initial development and deployment of the solution.
With a grip on strategic plans and internal processes, we can start to identify all sources of data. Some of these sources are common to everyday use: database management systems, text-based sources, and metadata on files stored in a document repository. One of my instructors on data warehousing even mentioned microfiche as a candidate data source. The sphere of data is not limited to the records in tables and other tuple-based containers; it also includes metadata, the data that describes data, best understood as the information that sheds light on the data inside a data source. For a database (and it depends on the database), metadata attributes potentially available to you as the analyst include the number of rows, the create date, who created the database, its size, location, number of users, status and last backup, to name a few.
While we are investigating the sources of data and beginning the process of understanding the data, we can often ask several questions of our stakeholders that we identified earlier. What does the current data look like? Are there formatting and integrity issues? Is there duplication of data across sources? What data is popular and why? What data is least used or requested and why? How do users think about data? Have they thought of using data in a specific way? How does their current job drive what data they need?
Answers to these questions can yield insights into whether data missing from current needs should be developed to meet future demands. Lines of questioning can be facilitated through interviews that ask stakeholders about the kinds of questions they ask themselves when they look at a specific data domain, like Sales. In addition, users often inform the analysis by revealing the level of granularity they need in their data reports, such as whether they need reports down to the hour, minute or second. Sometimes we want to dig into each stakeholder’s answers about their operations to explore how decisions made in the past may have been affected not just by access to good data but also by the existence and use of bad data.
At this stage, some aspects of the data extraction, transformation and loading (ETL) process will come out during a review of data sources prompted by the questions above. Answers may uncover issues with data formats, data types, standardized date and time usage, and duplication of data, and how data will need to be transformed from online transactional processing (OLTP) source systems to online analytical processing (OLAP) systems can be documented. These insights will help develop the logical data model that supports the development of the information cube(s) that will eventually represent the physical implementation of the OLAP system, comprised of fact data and dimensional data, discussed shortly.
Another key area to discuss with stakeholders is how data is to be aggregated. While aggregation is achievable in static reports, the time and effort to create these aggregations manually through queries may warrant exploring a shift to an analytical system that provides these functions automatically. We cannot assume that users will be given an interface to their data through advanced spreadsheets that come with these functions; while spreadsheets are popular, they might not always be an available reporting interface. And keep in mind that even when aggregation is provided, static data reports may go stale after a certain period of time, as they often come from transactional systems. OLAP environments live in powerful systems and are designed to do just one thing: provide the analytical foundation for fast, deep and broad data processing in, or as close to, real time as possible, with presentation and visualization support built on well-defined data structures, calculations and aggregations. In OLAP environments, users connect to the analysis system to get the latest updates on demand. This pull concept is more efficient than the traditional push concept, where IT dumps data and provides it to users.
As our repository of data sources, needs and supporting requirements grows, we can next focus on how to conceptualize the data in an analytical environment to provide the logical foundation for building a physical BI solution, which introduces the topic of multidimensional cubes.
Cubes, facts and dimensions
The review of cubes, fact tables and dimension tables will revolve around their business meaning.
To help conceptualize our information cube(s), we can leverage a couple of tools. The concept of entity relationship modelling (ERM) is important: ERM can help us answer questions that will lead to the formation of the fact data and their relationship to dimensional data inside the cube framework. The purpose of this type of modelling is to help us with logical concepts, and Ralph Kimball, a proponent of data marts, noted that dimensional modelling “is a logical design technique for structuring data so that it’s intuitive to business users and delivers fast query performance.”
In business intelligence, data is presented to users in information cubes, or just cubes for short. Cubes represent data in terms understandable by business users.
Another aspect of cube analysis is to identify what aggregations are required. Aggregation is the pre-calculation of data into summarized or grouped form. Examples include overall total sales, sales by region, sales by city, products sold by customer, and similar numeric aggregations and data expansion points. A user might need these either to drill deeper into their analysis of the corporate data by expanding it to a more granular level of reporting, or to see things at a higher level by collapsing the data at points of aggregation.
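The roll-up and drill-down idea can be sketched in a few lines. The sales records below are invented for illustration: the same measure is summed at two grains, a rolled-up region view and a drilled-down region-and-city view, which is what a cube pre-computes for the user.

```python
from collections import defaultdict

# Hypothetical sales records; in a cube these totals would be pre-calculated.
sales = [
    {"region": "East", "city": "Boston",  "amount": 100.0},
    {"region": "East", "city": "Albany",  "amount": 50.0},
    {"region": "West", "city": "Seattle", "amount": 75.0},
]

def aggregate(rows, keys):
    """Group rows by the given dimension attributes and sum the measure."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)

by_region = aggregate(sales, ["region"])                # rolled-up view
by_region_city = aggregate(sales, ["region", "city"])   # drilled-down view
```

Eliciting which of these grains users actually need, down to the hour or up to the year, is exactly the aggregation requirement being discussed here.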
Another important point is to find out how frequently the cubes are to be updated from the data sources. This information will help the developers of the BI solution with resource planning and with how frequently to extract data from the sources that feed the staging area, where transformation of the data occurs before it is loaded into the BI solution.
This is also an opportunity to determine the roles and permissions that the BI solution must support. In most BI projects, users will only need read access in order to query an information cube. Our stakeholder list can help identify a data modeller, power users, business analysts, data stewards, developers, database administrators and any executive sponsors or BI project steering committee members: job categories that can inform our roles and permissions conceptualization.
Fact tables store the numeric details of our business domains under analysis in the cube. In the BI world we call these numeric data “measures”. Most fact tables are just that, facts. They contain only numeric data. Our job is to ask the right questions of each line of business about what facts they need to have captured in the BI solution. Facts are often validated using our analysis of operational business processes.
To note further, fact tables also have keys: one that uniquely identifies a record in the fact table, called a surrogate key, and others that point to data elements found in the other tables making up the BI solution, called foreign keys. These establish the physical fact-to-dimension relationships we conceptualize during an ERM exercise.
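A minimal star schema sketch makes the surrogate-key and foreign-key roles concrete. All table and column names here are illustrative, not prescriptive, and the schema is created in an in-memory SQLite database:

```python
import sqlite3

# Illustrative star schema: one fact table surrounded by dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY,
                          full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          product_name TEXT, category TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, region_name TEXT);

CREATE TABLE fact_sales (
    sales_key    INTEGER PRIMARY KEY,                        -- surrogate key
    date_key     INTEGER REFERENCES dim_date(date_key),      -- foreign keys
    product_key  INTEGER REFERENCES dim_product(product_key),
    region_key   INTEGER REFERENCES dim_region(region_key),
    units_sold   INTEGER,                                    -- measures
    sales_amount REAL
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['dim_date', 'dim_product', 'dim_region', 'fact_sales']
```

The fact table holds only keys and numeric measures; everything descriptive lives in the dimensions, which is what keeps analytical queries simple and fast.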
Dimension tables give context to the fact tables. We must further elicit requirements from the respective business units about the ways facts will be sliced, diced and aggregated. Slicing and dicing often come up when talking about functions found in spreadsheets.
Slicing and dicing are mechanisms for viewing data in a multidimensional cube in specific ways. These methods are often misunderstood, and they have different meanings. Slicing is a technique used when a user needs to analyze measures (the facts) down to a specific level of detail within a single dimension. An example is a user wanting to see how many sales (facts) were made by a regional office (dimension). Dicing is a technique used when a user wants to view information in a sliced cube at a level of detail that spans dimensions. An example of dicing is a user wanting to see how many sales were made by a regional office via a specific sales channel that came from a specific country during specific months. We slice the sales data cube (by region) and break it down further by dicing it into smaller cubes of detail (channel, country, time).
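The distinction can be shown with a toy cube held as a list of cells (the dimension values and figures are invented for illustration): a slice fixes one dimension, while a dice fixes several at once.

```python
# A tiny "cube" as a list of cells: dimensions plus one measure.
cube = [
    {"region": "East", "channel": "web",    "country": "CA", "month": "Jan", "sales": 100},
    {"region": "East", "channel": "retail", "country": "CA", "month": "Jan", "sales": 40},
    {"region": "West", "channel": "web",    "country": "US", "month": "Feb", "sales": 75},
]

def dice(rows, **dims):
    """Keep only the cells matching every supplied dimension value."""
    return [r for r in rows if all(r[d] == v for d, v in dims.items())]

slice_east = dice(cube, region="East")                          # slice: one dimension fixed
diced = dice(cube, region="East", channel="web", country="CA")  # dice: several fixed
```

Real OLAP engines do this over pre-aggregated storage rather than row filters, but the user-facing semantics are the same.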
The example above is a way to position a discussion with stakeholders about how they want to view their data and at what level of detail. I often start a conversation with “So you need to report on your data by…”, which feeds the dimension discussion and modelling activity.
Also remember that as part of our dimension analysis, we are not limited to data and data attributes that exist within the organization. We should also ask the business about its appetite for augmenting factual data with open data from sources such as Google, the United Nations, the Government of Canada’s Open Data portal and other leading organizations that make data available to the public for research and other purposes. The requirements elicitation process might yield a need for census data, demographic information or survey data that is not available within the organization at the time but is available from outside sources that specialize in providing open data. What data could be integrated from outside a BI solution will depend on the goals and objectives of the business based on a strategic review.
Knowing which dimensions to make available for which business processes can also be identified through a simple charting exercise. The chart consists of a listing of all of the business areas under consideration, such as Sales, and a list of all of the potential dimensional data that could apply to each area (time, date, country, etc.). Below is an example of an output from such an exercise. The processes will yield areas where facts can be gathered; the dimensions that apply to these processes will yield the context data we need to support the facts.
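One way to capture the output of this charting exercise is a simple process-to-dimension mapping. The processes and dimensions below are illustrative entries only, not a recommended set:

```python
# Illustrative process-to-dimension chart (entries are examples only).
process_dimensions = {
    "Sales":     ["date", "product", "customer", "region", "channel"],
    "Marketing": ["date", "campaign", "channel", "region"],
    "Finance":   ["date", "account", "cost_centre"],
}

# Inverting the chart shows which dimensions appear in more than one
# process: good candidates for dimensions shared across the solution.
all_dims = {d for dims in process_dimensions.values() for d in dims}
shared = sorted(
    d for d in all_dims
    if sum(d in dims for dims in process_dimensions.values()) > 1
)
print(shared)  # ['channel', 'date', 'region']
```

Dimensions that recur across processes, like date and region here, are worth designing once and reusing so that facts from different business areas remain comparable.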
Once we have a good understanding of the data that will need to be loaded into our BI solution, we can have some fun with stakeholders working out how the data is to be transformed before it is served up to end users. Transformation is the technical process in which data is cleaned up and formatted to a consistent standard before it is put into the BI environment. Often the data extracted from transactional systems is populated into a staging area, where transformation scripts work on the data before it is loaded into the OLAP system. Determining what those transformation standards are becomes our responsibility during analysis. We need to identify any corporate policies or procedures that can help with our transformation requirements elicitation; if there aren’t any, we need to elicit, document, validate and get approval on those transformation requirements. Transformation requirements are not to be confused with transitional requirements, explained later: transformation applies to data, while transition applies to the cultural shift from transactional thinking to analytical thinking.
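A staging-area transformation step might look like the sketch below. The source rows, accepted date formats and field names are assumptions for illustration; the point is that inconsistent source values are normalized to one documented standard before load.

```python
from datetime import datetime

# Hypothetical extracted rows with inconsistent date and number formats.
RAW = [
    {"order_date": "2023-01-15", "amount": " 1,250.00 "},
    {"order_date": "15/01/2023", "amount": "980"},
]

def clean(row):
    """Normalize a staged row to the agreed transformation standard:
    ISO dates and plain floats."""
    raw_date = row["order_date"]
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):          # accepted source formats
        try:
            date = datetime.strptime(raw_date, fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date: {raw_date!r}")
    amount = float(row["amount"].strip().replace(",", ""))
    return {"order_date": date, "amount": amount}

staged = [clean(r) for r in RAW]
```

Which formats are "accepted" and which rows get rejected are exactly the transformation requirements that must be elicited, documented and approved.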
As previously noted, transactional systems often contain bad data. It is our job to assist with identifying this bad data and making it better, cleaner and structured to a consistent format that will make end users value the BI rollout. Bad data, or the lack of sufficient data that could easily be captured and processed, can lead to bad decisions. It could result, for example, in losing market share, or in shifting to a different strategic tactic prematurely and consuming valuable resources unnecessarily; good data may have informed the decision otherwise. Extracting bad data from a transactional system and loading it into an analytical system in an equally bad format is a pointless exercise and a waste of valuable time and effort for everyone involved in a BI project.
Our work up to now will yield many requirements, often at a volume where scope management is required in order to deliver a BI solution. Scope typically starts out as a large array of needs and wants and, without some project governance or controls, becomes unwieldy. I have found that starting with a small subset of the scope, such as one business process like Sales, works best. Future data needs can be factored in later using a change management scheme. The key is to identify the data requirements that provide the highest returns to the organization, and to start with those, diving deeper to ensure they are clear and concise before working on the others. These requirements make up the business requirements value-feasibility matrix.
The value-feasibility matrix
The value-feasibility matrix can help prioritize requirements. A prioritized list of requirements is subject to an agreement among the BI project team, the operational lines of business, the senior management team and the project’s executive sponsor.
The matrix is easy to set up. The hard part is getting agreement on what priority level to assign each BI requirement. Getting there takes objectivity, diplomacy and good judgement from all users with a stake in the project.
The process goes like this: for each requirement written, ask each business unit the perceived value of that requirement to their operations. Not all requirements will be high priority, nor can they be; in the real world, giving every requirement a high priority makes it difficult to execute according to schedule. Through negotiation, discussion and rationalization, requirements can be put on a list and subjected to a litany of questions about when they need to be implemented.
What helps with that implementation schedule is asking the IT staff who will build the BI solution how feasible each requirement would be to implement, and documenting the findings. Not every requirement can be implemented with ease. Once all of the requirements have been reviewed and a feasibility level has been applied to each, you can create and populate the matrix that places the requirements along the value-to-feasibility spectrum. We obviously want to implement those BI requirements that are of high value to the business and very feasible for the IT solution providers to implement; if all parties agree, these could be the scope of the initial BI project. Requirements that the business considers of lower value and that IT believes are not feasible to implement, meaning they are difficult and require extra resources, might be put aside, ignored or dropped. Requirements that fall in between can either be put aside for implementation once the initial requirements are met, or be documented in a log for further discussion because of uncertainty about their technical feasibility or business importance.
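The classification rule described above can be sketched in a few lines. The requirement names and the two-level high/low scoring are invented for illustration; a real matrix would use whatever scale the project team agrees on:

```python
# Illustrative requirements scored as (name, business value, IT feasibility).
requirements = [
    ("Sales-by-region cube",     "high", "high"),
    ("Real-time inventory feed", "low",  "low"),
    ("Customer churn dimension", "high", "low"),
]

def classify(value, feasibility):
    """Place one requirement on the value-feasibility spectrum."""
    if value == "high" and feasibility == "high":
        return "initial scope"     # high value, easy to build: do first
    if value == "low" and feasibility == "low":
        return "drop or park"      # low value, hard to build: set aside
    return "discuss further"       # everything in between goes to the log

matrix = {name: classify(v, f) for name, v, f in requirements}
```

The "discuss further" bucket is where the negotiation happens: it holds requirements whose business importance or technical feasibility is still uncertain.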
The Transitional Factor
Managing a shift from an environment accustomed to static, transaction-based reports to an analytical one is a challenging task. Moving from an OLTP-based mindset to an OLAP-based one is achievable with patience, time, commitment, resources and buy-in from the executive sponsor. To assist with identifying and gathering requirements from a transitional perspective, an organizational readiness assessment can gauge an organization’s preparedness and appetite to change from a static, generic reporting culture to one that supports a dynamic business intelligence framework built on analytical foundations.
Readiness assessments can often yield, in addition to the details about an organization’s analytical culture, transitional requirements that will help an analyst understand what training, learning and further knowledge gains are required to get an organization from its current state to its future state.
Not all organizations are analytical in nature, nor do they all need to be. In other words, not everyone within an organization immediately thinks strategically and analytically, and not everyone has to. It can be a significant cultural shift to move an organization from “analyzing” the traditional reports it receives to being analytical by nature, or at least thinking analytically enough to help draw out requirements for a BI project. The process of eliciting, documenting, validating and modelling transformational and transitional requirements can assist immensely with what is needed for a successful shift.
The size of the BI project will depend on the scope and complexity of the work involved. Part of an enterprise analysis will consist of working with stakeholders to gauge whether a BI solution is to cover the whole organization or a subset of that organization. This is critical because the answer to that question will drive whether we analyze a larger or smaller data source set, and will determine whether we target a broad or narrow subject area and whether our task will be complex or simple.
As analysts, we need to ask questions at the enterprise level to associate and strengthen the connection between data and how it complements the need to measure progress towards meeting strategic enterprise initiatives.
We also need to understand that we must try to help an organization rethink traditional methods of reporting by laying out the benefits of a business intelligence concept; benefits that are in tune with the nature of the business environment under analysis. We need to continue to act as ambassadors of change, the drivers of a retooling of how data is to be perceived, used and understood by each stakeholder in an organization.
Furthermore, we need to assess whether the data under analysis is meant for all stakeholders or for a subset of the organization. Determining the breadth of the data requirements will assist with scope management and with the level of effort required to effect and manage the shift to an analytical culture, or, if the culture is already mature, to get it to the next level of analytics.
Ideally we should augment our requirements repository with the support of tools like entity relationship models, data models and dimension models. Stress points, bottlenecks and gaps in data needs can be revealed through the use of an as-is, to-be assessment. An organization will always want to move from point A to point B. It is our business to assess the width of the gaps in terms of resources and solutions required to fill them.
Next, work with less. Use the priority matrix as a guide to which data requirements should be implemented first. Don’t make life complicated by accepting a list of requirements as the full set required for implementation. Negotiations and further insights will govern which requirements get implemented first. There is always room for consensus and compromise in any project.
Above all, have fun with the work. It can be challenging but rewarding, especially with the knowledge gains that come from working on a BI project and the notion that we provide key support to a critical process that contributes to the achievement of an organization’s overall corporate objectives using good analytical data as the foundation for effective decision making.