May 21, 2013

Unit testing in Business Intelligence

In my previous post I wrote about the importance of unit testing in business intelligence projects, specifically the need for it when you are following an agile approach. We also looked at the basic layers that you have in the typical Microsoft BI project and how a seemingly easy change has and effect at almost all of the different layers in the solution.

In this post I want to dive into the mechanics of unit testing in BI and look at some high level examples of unit tests.

Basic structure of a unit test

The basic structure of a unit test is very similar across many unit testing frameworks and usually consists of a Setup, Test and Tear down phase








Lets look at each one of these phases and their function.


Setup
The setup phase is used to perform any actions needed to create the environment required to perform the actual test. This could be anything, from creating instances of specific objects to adding records to database tables or setting up files that the test will need. The setup phase is there to ensure that the test has all the resources it requires to run.

Test
The test phase normally involves executing or running the particular piece of code or application that you want to test. You will normally provide the objects that you created during the setup phase as inputs to the code or application that you are testing and then compare the result or outputs of it against an expectation. If the output matches your expectation the test will be successful.

This pattern of providing inputs, executing your code and then comparing it against an expected result is the pattern followed for the majority of tests. There could be cases where you test that under certain circumstances errors are raised in which case your expected result will not be a result set but the actual error that is raised.

Tear Down
Tear down is the phase in which we clean up any objects that might have been left behind after our test has executed. We can do this to ensure that the next unit test has a clean slate to start from or to perform certain functions at the end of each unit test.

Unit tests in a BI project

In a BI project we need to focus on the different applications that we use and devise tests that satisfy the Setup, Test and Tear Down requirements needed to test each of them. Lets look at some practical examples of how you would go about testing two of the main layers in the Microsoft BI stack, namely the ETL and OLAP layers.

SSIS

In order to test SSIS packages you will need to create test data for each test that you would like to perform on a packages. As an example lets look at a simple package that builds a Product dimension.

The input of the package would be a set of database tables in the source system that contains the product and product category information. Below are some samples of the possible data in our source tables.




Our SSIS package takes this data and builds our Product dimension which is a Slowly Changing Dimension Type 2. The attribute that will cause the type 2 change is the Category Code, thus if our Nike X Trainer suddenly moves to a category other than Men's Shoes we will perform a type 2 change on the dimension. All other attribute change will result in a type 1 update to the table.

Two possible tests that we might want to do on this package could be to test that new products are inserted correctly and SCD Type 2 changes work correctly. Lets look at how a potential unit test for this could look like.

Setup
In order to do our test we need to ensure we have data available for our SSIS package to test. In our setup phase we will insert the required rows into the relevant database tables that our SSIS package will point to. (Note that a copy of the source system would be created in a test environment in order to perform the tests.)

The first table that we will populate will be the Product dimension table in the data warehouse. This is done to ensure that we have a dimension table against which we can test our new record and SCD type 2 change logic. We can insert the following data into it.






The second table that we will populate will be the Category table in the source system in our test environment. We can insert the following data into it.










The last table that we will insert data into for the purpose of this test will be the Product table in our source system in the test environment. The data could look as follows.








Test
Once we have completed the setup phase and all our test data is inserted into the relevant tables we will execute the SSIS package and wait for it to complete. We will then compare the data in the Product dimension table against the expected data and if the results match we know that our SSIS packages is functioning correctly based on the input data we provided to it.

Our expected data would could look like this.







If the data in the Product dimension does not match our expectation then our test has failed and we need to fix the SSIS package until it passes our test.

Tear Down
After we have completed the test phase we would typically delete all the records from the tables we used in order to ensure that the next test has a clean database to start with.

SSAS

For SSAS we will follow the exact same pattern as we did for the SSIS, the main difference will be that instead of inserting records into the tables of our source system we will insert records into our data warehouse.

The steps we would take to create a test for a particular measure in our cube would look something like this.

Setup
Insert records into the data warehouse for each dimension and fact table required to process the cube. (Remember you do not need to insert hundreds of records, only enough records to allow you to run the test that you have defined.)

Test
Process the cube, ensuring that if the cube does not process successfully it will automatically result in failing the test.

Once the cube has been processed run a prepared MDX query against the cube and compare the result of it against the expected result of your test. If the two results match then our measure has passed the test. If there is a mismatch then the measure has failed the test.

Tear Down
After the test phase is complete we will clean out the data warehouse tables that we populated at the start of the test to ensure any subsequent tests have a clean database to start with.

More extensive tests and available tools

From the above I am sure you can see how,in addition to the unit tests that we would normally create for each individual package, we could create tests that test our ETL end to end. We could extend this further to be an end to end test from our source system all the way to our cube.

So far we have looked at the structure of a unit test and some examples of how tests against SSIS and SSAS. We have not yet investigated how this can be automated so that we could integrate these tests into our nightly build, or as developers do actual test driven development on our BI projects.

In my previous post I have mentioned some of the tools that we could use such as SSISUnit and QueryUnit. Other options are to develop your own unit testing tool. Unfortunately we do not currently have a dedicated testing tool for BI projects in the Microsoft BI tool set. By that I mean a tool that allows for the easy, user friendly development and running of unit tests. (Just to be clear, writing loads of C# to create a unit test is not something I would classify as user friendly since it brings an added level of complexity to the process).

In the next post we will explore some of the solutions and steps that we can take to get closer to the ideal of a testing tool specifically aimed at BI projects.

May 19, 2013

Agile BI, without unit testing you are using the "Max Power Way"

Agile BI has been all the rage for a while now. It is one of those buzz words that everyone loves to throw around. I personally despise buzz words, mainly because they get latched on to by people who do not always have a clear understanding of the fundamentals they encapsulate.

These buzz words then get chucked around in meetings like a magic spell, one that will get a project delivered faster, almost like the "turbo" button you had on your 486 computer back in the day.

In the custom development world agile methodology has been around for a long time and unit testing forms the core of any custom dev project that is truly following an agile approach. It is therefore no wonder that you have a plethora of unit testing frameworks and tools available for almost every conceivable language and framework out there.

The sad fact is that when it comes to BI we have caught up with the buzz word of Agile but we are sorely under equipped to actually take advantage of the methodology due to our lack of unit testing tools.

This twilight zone in which we find ourselves lead to many projects heading down the "Max Power Way" in which we accelerate the development of the system but introduce more and more bugs and issues in the project due to the knock on effect that a change has as the project grows.



Let me give an example of what I mean. Below is the "BI Burger" slide from Microsoft showing the different layers in a Microsoft BI solution.













Now imagine that you are delivering a BI solution in which you are following agile methodology. You have completed the ETL, Data Warehouse, Cube and Reporting layers of the first sprint and you are about to kick off with sprint two.

The first task of the second sprint is to rename a measure in a fact table and change a couple of data types, one of which is to increase the size of a varchar attribute in a dimension table and the other is changing a measure from decimal to float in a fact table. All of these changes need to happen in the data warehouse.

Because everyone is hyped up on double strength doses of Agile the task is estimated to not take very long. However the impact of these changes on the solution is much bigger than the developers realise. While you can make the changes in a few minutes you have to keep in mind that you would need to change all the parts of the system that interact with the objects that you have changes. Lets take a look at these in more detail.

Potential changes in BI layers

SSIS

In the SSIS solution you will need to ensure that all the packages that interact with the changed objects are updated to reflect the changes. Some of the changes involved are listed below: 
  • For the data type changes and column name change you will need to update the metadata of all the SSIS packages that interact with the tables in which this change took place. 
  • Run some tests to ensure that the changes do no cause truncation of data for the dimension table column that was updated.
  • Test that any calculations performed on the measure in the fact table produce the correct results.

Data Warehouse

The changes for the first task of sprint two were all changes in the data warehouse, however these are not the only changes you might have to make in your warehouse. In addition to the original changes you might have to do the following:
  • Update any views that reference the renamed measure. ( If you are following best practice then this would mean the changes to the views in the data warehouse on top of which you build your cube)
  • Update any stored procedures that reference the renamed measure. (Remember there can be quite a large amount of SSRS reports that could use stored procedures against the data warehouse.)
  • Test all the calculations of any function, procedure or script that use the measure that had the data type change.

SSAS

In Analysis Services you will need to ensure that the data source view is updated to reflect the changes made in the data warehouse. Some changes you will need to make are listed below:
  • You would need to update the actual attribute in the dimension which references the column with the increased varchar size. 
  • You will also need to update the measures to ensure that the name change is corrected in the measure group and that the measure now points to the correct column in the fact table or view. 
  • You will also need to update the data type of the measure which had it's data type changes in the data warehouse.
  • You will need to update all calculated measures that reference the measure that had the name change.
  • You will need to test the result of all calculated measures that use either of the two measures to which a change was made. This is to ensure that they still work correctly and that they produce the correct results.

Reporting Layer

I will not go into each application that make up the reporting layer in a Microsoft BI solution but instead I will list some of the changes that you would need to make throughout the reporting layer due to the changes that were implemented.:
  • Update all PerformancePoint Server dashboards that reference the measure that had the name change.
  • Update all SSRS reports that references the measure that had the name change.
  • Update all PowerView reports that are referencing the measure that had the name change.
  • Update all Excel Services reports that are referencing the measure that had the name change.

Unit Testing

As you can see from the above list of potential changes, and I am sure there are some that I might have left out, a change that could be seen as trivial by a business user has got a massive ripple effect throughout your BI solution.

The hardest part of making these changes are ensuring that you have not broken any other parts of your system, and that your figures are still coming out correct in the reporting layer for end user consumption. There are two ways to ensure this:

  • Test every single area of the system manually to ensure everything functions as expected and that all values across the system are still being calculated correctly after the changes have been made.
  • Make use of automated unit testing and kick off the entire batch of unit tests that were created during the development of the system to do the hard work for you.
Personally I prefer option number two as it is the best approach to ensure that the entire system is functioning as expected while removing the potential human errors that would be involved in doing it manually. This also reduces the time it takes considerably as you can let your unit tests run on your test server while you get on with other tasks. You could also integrate the unit tests into a nightly build process.

Most BI developers tend to not realise the benefits of automated unit testing, mostly because they have traditionally worked using waterfall methodology. On top of this the limited amount of unit testing tools in the BI world adds to the challenge of automated unit testing in BI projects.

For instance in SSIS you could use SSISUnit  and for the data warehouse and cube you could use QueryUnit. I have also seen some custom tools created to do unit testing on BI solutions and these sometimes require a significant amount of work in order to produce the tests themselves. None of these tools and approaches are without their short comings but they surely beat having to do the testing manually every single time you make changes to the system. 

Lets be clear, you HAVE to retest the system the moment you make any changes to any part of it. The fastest way to get yourself into a bad situation is to make changes and think that everything will be fine with limited testing. Without having proper unit tests written for every single part of the BI solution, no matter how small, you cannot say with 100% confidence that a change has not broken something somewhere in one of the layers of your solution.

Think about the estimate that you would have to give during a sprint planning meeting for the changes in our example based on the two ways of testing. I am sure you would agree that your estimates would seem much less agile if you have to estimate them based on manual testing. In fact I am willing to bet that you would struggle to give an accurate estimate based on the manual testing method.

Automated unit testing is the only way to ensure that you realise the benefits of Agile BI.

Conclusion

I have briefly shown the impact that a seemingly easy change to the system has on a BI solution and talked about the importance of automated unit testing to ensure that you have 100% confidence in your solution while following agile principles.

In the next post about I will dig a little deeper into the tools available to us for automating our unit testing.

Until then always remember that proper unit testing is the only thing separating your Agile BI project from becoming a "Max Power" BI project.


Jan 29, 2013

Reporting Services 2005 Image randomly does not render

This is a bit of a back to the future post. I have recently had to do some work in SSRS 2005 and came across this little issue when using IE9 against SSRS 2005.

If you have report that shows images from and external source inside a cell in a matrix these render correctly in BIDS (I have not tested it with a table but it could be similar). However when you deploy the report to Report Manager you might encounter behavior in which the images are randomly not shown. It does not give a red X showing an error with the image, it just does not render the image.

This can be fixed by placing a rectangle inside the cell in your matrix and then adding the image object into the rectangle. The images are then rendered correctly when the report is viewed from Report Manager in IE9.

Aug 24, 2012

Windows 8 User Interface, have you actually used it or those alternatives you suggest?

I know this post has got absolutely no relation to analytics or business intelligence but I really cannot keep reading all the Windows 8 doom and gloom posts without asking, have you actually used it? When was the last time you used one of the alternative operating systems that you believe people will embrace due to this so called horrible new Modern UI?

Lets get some facts straight, the Modern UI ("Metro") is definitely not the horrible interface that I keep reading about. The last time I checked almost no one was clicking on "Start" -> "All Programs" -> "Their desired application's folder" -> "Application launch icon". This process simply does not happen nearly as often as the hyperactive Windows bashers keep making out in all these doom and gloom blog posts and articles. Instead users put TONS of icons on their desktop or pin them to the task bar so that they can avoid doing exactly this convoluted clicking process to launch an application.

Modern UI simply transforms all of this messy desktop clutter by giving you the option to actually create tiles for all the applications you would have put on your desktop right there on the start screen. Is it really such a bad thing that you can now at least avoid looking like the icon hoarder of note in your office and actually get to your apps faster? And just in case you are really upset about not being allowed to continue with your cluttered "method" of application launching, you can still go and pin all those applications to your desktop or your start bar. Nice huh? You actually still have a choice.

Lets look at another feature that most people tend to omit when writing about this so called horrible UI, the ability to simply start typing the name of the application you want in the Start screen. It immediately displays a list of all the applications installed on your system, filtered down as you type until you are left with the result of your search. You can then either launch this application by clicking on it or you can right click on it and add it to your start screen. Not really such a big deal right? From the mountain of anti Windows 8 blogs you would think that Microsoft has actually decided to make it as difficult as possible to get to your software when actually its the opposite.

Lets now look at some similarities between the most popular Linux distribution, Ubuntu, and Windows 8. I highly doubt that anyone of the anti Windows 8 bloggers have used Ubuntu because if they have they would have realised that since Ubuntu introduced Unity, which caused exactly the same flamed outcry due to UI changes, you also do not have a menu structure that you can click through. Instead you have the Dash, in which you proceed to type in the name of the application you want to launch. If you don't want to do this every time you simply pin the application to the Launcher bar. Shiver me timbers! That sounds amazingly similar to some of the options you have in Windows 8 instead you just have a start screen where you can avoid having to open a search feature to search for your applications.

Believe me if you Googled the introduction of Unity into Ubuntu you will find the exact same flaming rants of how horrible it is, yet after using it for a few months as my main desktop OS at home I was very happy with how the interface worked. If you want a desktop environment with traditional menus then you can install Linux Mint which will give you all the Gnome goodness and enough menus to click on but just don't move your mouse into the top left corner of the screen because you might just active the search feature which allows you to search for applications or launch your favourites from. Yes for some reason they also decided that messy desktops is a relic of the past.

My point with all of this is that by reading all the ranting and mentioning of alternatives it is clear that very few of these people have actually used these so called alternatives. If someone complained that they are limited to rectangles and squares on the start screen when they wanted circles and triangles then that complaint would be more accurate than almost anything I have read so far. Not to mention that almost all of the negative blogs forget to mention the absolute blazing start-up times or the truly phenomenally fast performance of the OS itself. I mean Windows 8 uses 1.2 GB of the 8GB of my two year old laptop. It is actually the first Windows that uses less resources and is faster than its predecessor, truly a monumental feat in the Windows family of operating systems. It is clear that I like Windows 8 and I am not asking you to like it, I am merely asking you to actually write the truth when you start suggesting alternatives or calling out that the sky is falling.

And before I sign off on this semi rant, the issue with SmartScreen is that people are too afraid to click on "Customize" at the end of the installation process. So in case you have been seeing all the hype about this "feature of death" here are two screenshots to guide you through turning it off at the end of the installation.