Oct 14, 2013

Centralised Logging for BI Projects based on Android Logcat

I have recently been playing around with creating a centralised logging mechanism to be used across all components of a BI solution. On some projects you can have many modular, separately developed components making up the solution as a whole, and each of these components requires logging. This brings with it the headache of designing Error and Audit tables for each component, and those tables really need to be identical if you want to report easily on the auditing and error data for the solution as a whole. Without a centralised logging mechanism you end up with tables of different structures to query when the time comes to do that reporting.

I have ended up taking the idea of Android's Logcat logging system and adapting it for use in a BI or Data Warehouse solution. Android has one central logging mechanism, called Logcat, and users log messages to it by calling a method on a logging object for the type of message they want to log. For example, to log an error you call the Log.e() method, passing it the standard arguments that the logging methods require, one of which is the message you want to log.

With the above in mind I created a set of database objects in a schema called "Log" which can be deployed to any Data Warehouse or Logging database of choice. This schema contains the following objects:

Tables

  • Component - Stores the text values for each component.
  • LogLevel - Stores the log level value that links to the LogStore table.
  • LogStore - Contains the actual message logged as well as foreign keys to data in the other tables.
  • SystemUser - Stores the user name of the system users logging to the LogStore table.
  • Tag - Stores the text value of each tag used when logging a message to the LogStore.

Views

  • LogView - The view used to query the LogStore, joining the text from all the supporting tables to the message that was logged.

Stored Procedures
  • uspPopulateLogLevel - Populate the LogLevel table with the default log levels after deployment.
  • uspDebug - Log a debug message.
  • uspError - Log an error message.
  • uspInfo - Log an information message.
  • uspVerbose - Log a verbose message.
  • uspWarning - Log a warning message.

The different LogLevels that are supported by default are listed below:

LogLevel    LogLevelDescription
D           Debug
E           Error
I           Info
V           Verbose
W           Warning
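
The full definitions live in the downloadable script, but as a rough sketch (the column names here are my own assumptions, not necessarily the deployed definitions) the two core tables could look something like this:

CREATE SCHEMA Log;
GO

-- Lookup table holding the supported log levels (D, E, I, V, W).
CREATE TABLE Log.LogLevel
(
    LogLevelID          INT IDENTITY(1,1) PRIMARY KEY,
    LogLevel            CHAR(1)     NOT NULL,
    LogLevelDescription VARCHAR(50) NOT NULL
);

-- Central store for every message logged, with foreign keys to the lookup tables.
CREATE TABLE Log.LogStore
(
    LogStoreID   BIGINT IDENTITY(1,1) PRIMARY KEY,
    LogLevelID   INT          NOT NULL REFERENCES Log.LogLevel (LogLevelID),
    ComponentID  INT          NOT NULL,  -- references Log.Component
    TagID        INT          NOT NULL,  -- references Log.Tag
    SystemUserID INT          NOT NULL,  -- references Log.SystemUser
    LogDateTime  DATETIME     NOT NULL DEFAULT (GETDATE()),
    [Text]       VARCHAR(MAX) NOT NULL
);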

The user of this system never writes directly to the table and is not responsible for populating every column. Instead there are five stored procedures that all take exactly the same input parameters:


  • Component – A varchar(255) value holding the name of the BI component, for example ETL or Reconciliation. It is stored in the Component table with a foreign key reference in the LogStore table and is used to tie all the messages for a particular component together.
  • Tag – A varchar(255) value used to define tags for the messages of a component. It is stored in the Tag table with a foreign key reference in the LogStore table. Tags are important because they are what you will filter on when reporting on the logged messages later. Tags can be things like "Audit", or a sub step inside an audit routine could be tagged as "Audit_SubStep". A tag is simply a text description that further qualifies the logging for a component.
  • Text – A varchar(max) value, so that even very long strings are catered for. It is stored in the Text column of the LogStore table and should contain the detailed message you want to log. You can also delimit the message so that it can be parsed later for reporting, for example by putting a pipe "|" between each line when you construct the message; at reporting time you can then easily split those lines out again.
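
To illustrate how the procedures keep this plumbing away from the caller, here is a simplified sketch of what a procedure along the lines of uspError might do; the lookup logic and column names are assumptions and the actual script may differ:

CREATE PROCEDURE Log.uspError
    @Component VARCHAR(255),
    @Tag       VARCHAR(255),
    @Text      VARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Add the component, tag and user to their lookup tables if they are not there yet.
    IF NOT EXISTS (SELECT 1 FROM Log.Component WHERE ComponentName = @Component)
        INSERT INTO Log.Component (ComponentName) VALUES (@Component);

    IF NOT EXISTS (SELECT 1 FROM Log.Tag WHERE TagName = @Tag)
        INSERT INTO Log.Tag (TagName) VALUES (@Tag);

    IF NOT EXISTS (SELECT 1 FROM Log.SystemUser WHERE UserName = SUSER_SNAME())
        INSERT INTO Log.SystemUser (UserName) VALUES (SUSER_SNAME());

    -- Write the message to the central store as an Error ('E') entry.
    INSERT INTO Log.LogStore (LogLevelID, ComponentID, TagID, SystemUserID, LogDateTime, [Text])
    SELECT  ll.LogLevelID, c.ComponentID, t.TagID, su.SystemUserID, GETDATE(), @Text
    FROM    Log.LogLevel   ll
    JOIN    Log.Component  c  ON c.ComponentName = @Component
    JOIN    Log.Tag        t  ON t.TagName       = @Tag
    JOIN    Log.SystemUser su ON su.UserName     = SUSER_SNAME()
    WHERE   ll.LogLevel = 'E';
END;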
To use the logging, a user simply calls the relevant stored procedure and passes in the values. For example, logging an error message looks as follows:
exec log.uspError 'DemoComponent', 'Validation_Tables', 'The table "TestTable" does not exist.'
When this is executed the result from querying the Log.LogView is shown below.

The user of the system does not need to concern themselves with anything other than passing values to the stored procedure; the rest of the logged data is handled by the procedures that form part of the solution.

This logging format makes logging messages easier across the entire BI solution, ensures that it is done in a consistent manner, and allows reporting to be done from one central place.

The script to create these objects can be downloaded here or can be found at GitHub.

Sep 15, 2013

Hadoop Pseudo-distributed Mode Installation Step by Step

If you have ever picked up a book about Hadoop then I am sure you spent a good deal of time setting up your own installation of Hadoop. It is not that the instructions on installing Hadoop are lacking in the book, but rather that when you have a background in Microsoft technology all the ins and outs of setting things up on Linux can be quite strange and confusing at times.

Take for instance the installation of Oracle Java on Ubuntu Linux. Ubuntu comes with OpenJDK installed, but you need Oracle Java when you want to run Hadoop. If you go down the manual installation route it can seem very complicated compared to the installer you would normally run on your Windows system of choice. Unless you are familiar with the resources the community has made available for getting things like this done easily, you might spend a considerable amount of time googling how to install the prerequisites and only a small amount of time actually installing Hadoop and getting it up and running.

So in the spirit of saving time and helping make the process a little easier I have decided to create this step by step guide of how to install Hadoop in Pseudo-distributed mode on a single Ubuntu 12.04 machine. The idea is to help you install all the prerequisites easily and get you to a working Hadoop installation as fast as possible. Since this guide is for installing Hadoop in Pseudo-distributed mode it is not aimed at setting up a Hadoop cluster in production. This is simply aimed at getting an instance of Hadoop up and running for you to experiment with.

So let's get started by logging into your Ubuntu machine with the user account that you will use to run Hadoop. (If you are setting up Hadoop to experiment with and learn from, this can be your own user account.)

Download Hadoop

Navigate to http://hadoop.apache.org/ and click on the "Download" link that you will see on the landing page.


Once you have clicked on "Download" you will be taken to the Download section of this page and you will need to click on "Releases".

On the "Releases" page click on "Download".

This should take you to the download section on that page. Note the different versions of Hadoop that are available and in particular the "current stable version"; this is the version we will be downloading. (At the time of writing the current stable version was 1.2.1.) Click on "Download a release now!".

Choose the mirror site from which you want to download the files.

Select the "stable/" folder on the page displayed.

Inside the "stable" folder download the binary tarball and save it to disk. This is the ".bin.tar.gz" file and in our case the filename is "hadoop-1.2.1-bin.tar.gz". (Do not download the ".deb" file as it will give you warnings with the Ubuntu package manager and will not install properly)

The Hadoop binary tarball should now be downloaded to the "Downloads" folder for the user with which you have logged into the system.

Installing Oracle Java

Installing Oracle Java on your Linux machine can seem like one of the most daunting tasks when you are just starting out in the world of Linux. Anyone that tells me that manually installing Oracle Java on an Ubuntu machine is a simple process needs to get professional help. (Yes, I have said it!)

Fortunately the guys at WEB UPD8 have created a PPA repository that takes the pain out of this process. 

I will use the steps that they provide on their site to install Oracle Java 7. If you are prompted for a password at any time while running these commands, provide the password of a super user on the system, which, since this is a dev installation, should be your account's password.

Launch a terminal (the shortcut being CTRL+ALT+T) and run the following commands in order.

Install the PPA Repository by typing 
sudo add-apt-repository ppa:webupd8team/java
in your terminal and pressing "enter" when prompted if you want to add the PPA to your system.

Once the PPA has been added, run
sudo apt-get update
to update the package manager.

Once the package manager has been updated run
sudo apt-get install oracle-java7-installer 
and select "Y" and press enter when asked if you want to continue.


You will be prompted to accept the "Oracle Binary Code License Agreement" so just hit "enter" when you are prompted with the below screen.

On the next screen you will be accepting the license agreement so just select "yes" and hit enter.

 You should now see the Oracle Java installation proceeding with the downloading of the files.

Once the installation is complete you will be returned to the terminal prompt.

We are almost done with the installation of Java; the only thing that remains is to set the environment variables. Run the following command in the terminal
sudo apt-get install oracle-java7-set-default
this will install a package that will set your java environment variables.

Once all of this has been completed you can run the following commands to check that you are now running Oracle Java 7 on your machine.
java -version 

javac -version

Both of these commands should give you similar output as in the screenshots. (At the time of writing this Java 7 Update 40 was the latest version of Java 7).

Installing Hadoop

Our next step is to install Hadoop. Most books and guides install Hadoop to the folder "/usr/local/hadoop" and then create a symlink for Hadoop in the "/opt" folder. I find this to be a massive pain if you are simply trying to get an instance of Hadoop up and running to experiment with. The biggest pain has to do with permissions, and all the extra permission settings that you need to configure to allow those directories to be used.

To make it easier for you to get up and running with Hadoop quickly we will install Hadoop to a folder in the "home" folder of the user that we are logged into the machine with. Keep in mind that this is NOT best practice and should only be used to get an instance up and running for your own personal use.

First we need to create an "apps" directory in our "home" directory. We can do this either by using the terminal or we could use the GUI. Since we have spent a lot of time using the Terminal so far, and most people always complain that Linux seems to require terminal use for everything, we will opt to go the GUI route to add a bit of variety to our installation. 

On the Ubuntu desktop launch "Nautilus", the file manager equivalent of "Windows Explorer".

This should take you straight to your home folder where you should "Right click" and select "Create New Folder".

Call the new folder "apps" (lower case, remember that all folders and paths in Linux are case sensitive). Once you are done you should have something that looks like the screenshot below.

Next we need to extract our Hadoop tarball into the "apps" directory. Double click on "Downloads" to go into the "Downloads" directory. 

In Downloads double click on the "hadoop-1.2.1-bin.tar.gz" file to open the archive manager.


In the Archive Manager click on "Extract", then navigate to the "apps" directory in your home folder and extract the contents of the tarball there.

Once the files have been extracted click on "Quit" when prompted by the Archive Manager.

We now have Hadoop extracted to our machine, but before we configure it we need to set up SSH to allow access that does not require a password. We will get to this next.


Configuring SSH

Hadoop uses SSH for its processes to communicate with each other on one or more machines, and we need to ensure that the user we are installing Hadoop with can access all these machines through SSH without a password. We do this by creating an SSH key pair with an empty passphrase.

Before we create the SSH Key Pair we need to ensure that the SSH server is installed on our machine. Open the terminal (CTRL+ALT+T or click on the Terminal Icon). In the terminal type the following command.
sudo apt-get install openssh-server
Provide your password when prompted and select "Y" if prompted to install the openssh server to your machine. If OpenSSH Server is already installed on your machine you will be notified of it and you will not need to do anything extra.

Now we can create the SSH Key Pair. In the terminal type the following command.
ssh-keygen
When prompted which folder to store the file in just keep the default by hitting "Enter".

When prompted for the passphrase leave it empty by just hitting "Enter" twice. (the second enter is to confirm the empty passphrase)

You should now be returned to the terminal prompt with the file created and the key's random image displayed.

We are almost done with setting up SSH. All that is left is to copy the new public key to the list of authorised keys. We do this by copying the ".ssh/id_rsa.pub" file created in the previous step to the ".ssh/authorized_keys" file. (UK English users, be aware NOT to spell authorised with an "s"; it must be with a "z" as in US English.)

Type the following command into your terminal.
cp .ssh/id_rsa.pub .ssh/authorized_keys
Once this is done we can test that all is working by SSH-ing to our local machine. We should not be prompted for a password and should simply be connected via SSH. If you are prompted for a password ensure that you have copied the public key to the "authorized_keys" spelled with a "Z" in the previous step. Run the following command in your terminal.
ssh localhost
When prompted if you are sure you want to continue connecting due to the authenticity of host "localhost" not being established type "yes" and enter.

You should then be logged into the system without any prompts for a username and password.

Close the terminal.

Configuring Hadoop

We are now ready to configure our Hadoop installation to run in pseudo-distributed mode. This is done by changing settings or adding settings in the XML configuration files of Hadoop.

First we need to create the directory that Hadoop will use for its temporary and HDFS data files. We will use the "/var/lib/hadoop" directory for this purpose. Open a terminal and type the following command into it. (Provide your password when prompted.)
sudo mkdir /var/lib/hadoop
This will create the directory. Next we need to make this folder writeable  by any user. Type the following command into the terminal.
sudo chmod 777 /var/lib/hadoop

Next we can modify the "core-site.xml" file in the "apps/hadoop-1.2.1/conf" directory in your home folder. This can be done by opening Nautilus, which we used when we extracted the Hadoop tarball, navigating to the folder, right clicking on the file and selecting "Open With Text Editor".

In the file we need to add the following between the <configuration> and </configuration> tags.

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop</value>
</property>

Once this has been added the file should look like the screenshot below.

Save and close the file.

Next we need to modify the "hdfs-site.xml" file. Add the following lines into it between the tags.

dfs.replication1
Save and close the file.

Our last modification is on the "mapred-site.xml" file. Add the following lines into it between the tags.

mapred.job.trackerlocalhost:9001
Save and close the file.

Finally we need to set the JAVA_HOME property in the "hadoop-env.sh" file. Open the file and uncomment the line reading "#export JAVA_HOME=/usr/lib/j2sdk1.5-sun" by removing the "#" in front of it.

Change this to be "export JAVA_HOME=/usr/lib/jvm/java-7-oracle" to point to the location of our java 7 installation.

Save and close the file.

We are now ready to start using Hadoop. We will make extensive use of the executables in the "apps/hadoop-1.2.1/bin" directory in our home folder, and to save some typing on the command line it is best to add the path to these executables to the shell's environment variables. We can either type the commands to set the environment variables each time we open the terminal, or we can create a ".sh" shell script that we run from the terminal before using Hadoop. I suggest creating the shell script as it makes life much easier.

In Nautilus right click in your "home" directory and create an empty document.

Call the file something like "hadoop_set_env.sh". Make sure you give it the ".sh" extension. Double click the file and add the following text to it.
export HADOOP_PREFIX=/home/hadoop/apps/hadoop-1.2.1/
export PATH=$HADOOP_PREFIX/bin:$PATH
Replace the "/hadoop/" section in "/home/hadoop/apps/hadoop-1.2.1/" with the name of the user that you are using.

Save and close the file.

Now we can source this file every time we open a terminal to work with Hadoop, and we will not need to type out the full path to our executables.

Next we need to format our Hadoop "namenode". Launch the terminal and type the following commands.
source hadoop_set_env.sh
hadoop namenode -format
 You should see output similar to the below screenshot when the namenode format has completed.

Now we are ready to start Hadoop. Type the following command in the terminal to start the components necessary for HDFS.
start-dfs.sh
Your output should look similar to the screenshot below.

Next we need to start the components necessary for map reduce. Type the following into the terminal.
start-mapred.sh
Your output should look similar to the screenshot below.

Finally we need to check that all the java processes we have started are running. Type the following into the terminal.
jps
 Your output should look similar to the screenshot below.

Hadoop is now up and running on your machine.

Conclusion

I hope this guide will be of help to anyone that wants to set up an instance of Hadoop to experiment with, especially people more familiar with a Windows environment.

May 21, 2013

Unit testing in Business Intelligence

In my previous post I wrote about the importance of unit testing in business intelligence projects, specifically the need for it when you are following an agile approach. We also looked at the basic layers in a typical Microsoft BI project and how a seemingly easy change has an effect on almost all of the different layers in the solution.

In this post I want to dive into the mechanics of unit testing in BI and look at some high level examples of unit tests.

Basic structure of a unit test

The basic structure of a unit test is very similar across many unit testing frameworks and usually consists of a Setup, Test and Tear Down phase.


Let's look at each one of these phases and their function.


Setup
The setup phase is used to perform any actions needed to create the environment required to perform the actual test. This could be anything, from creating instances of specific objects to adding records to database tables or setting up files that the test will need. The setup phase is there to ensure that the test has all the resources it requires to run.

Test
The test phase normally involves executing or running the particular piece of code or application that you want to test. You will normally provide the objects that you created during the setup phase as inputs to the code or application that you are testing and then compare the result or outputs of it against an expectation. If the output matches your expectation the test will be successful.

This pattern of providing inputs, executing your code and then comparing the output against an expected result is the pattern followed by the majority of tests. There could be cases where you test that errors are raised under certain circumstances, in which case your expected result will not be a result set but the actual error that is raised.

Tear Down
Tear down is the phase in which we clean up any objects that might have been left behind after our test has executed. We can do this to ensure that the next unit test has a clean slate to start from or to perform certain functions at the end of each unit test.

Unit tests in a BI project

In a BI project we need to focus on the different applications that we use and devise tests that satisfy the Setup, Test and Tear Down requirements needed to test each of them. Let's look at some practical examples of how you would go about testing two of the main layers in the Microsoft BI stack, namely the ETL and OLAP layers.

SSIS

In order to test SSIS packages you will need to create test data for each test that you would like to perform on a package. As an example, let's look at a simple package that builds a Product dimension.

The input of the package would be a set of database tables in the source system that contain the product and product category information. Below are some samples of the possible data in our source tables.

Our SSIS package takes this data and builds our Product dimension, which is a Type 2 Slowly Changing Dimension. The attribute that will cause the Type 2 change is the Category Code; thus, if our Nike X Trainer suddenly moves to a category other than Men's Shoes we will perform a Type 2 change on the dimension. All other attribute changes will result in a Type 1 update to the table.

Two possible tests that we might want to perform on this package are that new products are inserted correctly and that SCD Type 2 changes work correctly. Let's look at what a potential unit test for this could look like.

Setup
In order to do our test we need to ensure we have data available for our SSIS package to test. In our setup phase we will insert the required rows into the relevant database tables that our SSIS package will point to. (Note that a copy of the source system would be created in a test environment in order to perform the tests.)

The first table that we will populate will be the Product dimension table in the data warehouse. This is done to ensure that we have a dimension table against which we can test our new record and SCD type 2 change logic. We can insert the following data into it.

The second table that we will populate will be the Category table in the source system in our test environment. We can insert the following data into it.

The last table that we will insert data into for the purpose of this test will be the Product table in our source system in the test environment. The data could look as follows.
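
Since the sample data for this walkthrough lives in screenshots, the sketch below shows, with purely hypothetical table and column names, what the setup inserts for such a test might look like:

-- Setup: seed the Product dimension with an existing member so the SCD logic has something to change.
INSERT INTO dw.DimProduct (ProductCode, ProductName, CategoryCode, StartDate, EndDate, IsCurrent)
VALUES ('NIKE-X', 'Nike X Trainer', 'MENS_SHOES', '2013-01-01', NULL, 1);

-- Setup: seed the source Category table in the test environment.
INSERT INTO src.Category (CategoryCode, CategoryName)
VALUES ('MENS_SHOES',   'Men''s Shoes'),
       ('MENS_APPAREL', 'Men''s Apparel');

-- Setup: the trainer moves to a new category (should trigger a Type 2 change)
-- and a brand new product arrives (should be inserted as a new dimension member).
INSERT INTO src.Product (ProductCode, ProductName, CategoryCode)
VALUES ('NIKE-X',   'Nike X Trainer',  'MENS_APPAREL'),
       ('ADIDAS-Y', 'Adidas Y Runner', 'MENS_SHOES');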

Test
Once we have completed the setup phase and all our test data is inserted into the relevant tables, we will execute the SSIS package and wait for it to complete. We will then compare the data in the Product dimension table against the expected data, and if the results match we know that our SSIS package is functioning correctly based on the input data we provided to it.

Our expected data could look like this.

If the data in the Product dimension does not match our expectation then our test has failed and we need to fix the SSIS package until it passes our test.
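
One simple way to automate that comparison in T-SQL, reusing the hypothetical names from the setup sketch above, is to load the expected rows into a table and run EXCEPT in both directions; the test passes only when both queries return zero rows:

-- Rows we expected but did not get.
SELECT ProductCode, ProductName, CategoryCode, IsCurrent
FROM   test.ExpectedDimProduct
EXCEPT
SELECT ProductCode, ProductName, CategoryCode, IsCurrent
FROM   dw.DimProduct;

-- Rows we got but did not expect.
SELECT ProductCode, ProductName, CategoryCode, IsCurrent
FROM   dw.DimProduct
EXCEPT
SELECT ProductCode, ProductName, CategoryCode, IsCurrent
FROM   test.ExpectedDimProduct;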

Tear Down
After we have completed the test phase we would typically delete all the records from the tables we used in order to ensure that the next test has a clean database to start with.
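
In practice the tear down is usually nothing more than a few deletes against the same (hypothetical) tables:

-- Tear down: return the test environment to a clean state for the next test.
DELETE FROM dw.DimProduct;
DELETE FROM src.Product;
DELETE FROM src.Category;
DELETE FROM test.ExpectedDimProduct;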

SSAS

For SSAS we will follow the exact same pattern as we did for SSIS; the main difference is that instead of inserting records into the tables of our source system we will insert records into our data warehouse.

The steps we would take to create a test for a particular measure in our cube would look something like this.

Setup
Insert records into the data warehouse for each dimension and fact table required to process the cube. (Remember you do not need to insert hundreds of records, only enough records to allow you to run the test that you have defined.)

Test
Process the cube, ensuring that if the cube does not process successfully it will automatically result in failing the test.

Once the cube has been processed run a prepared MDX query against the cube and compare the result of it against the expected result of your test. If the two results match then our measure has passed the test. If there is a mismatch then the measure has failed the test.
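
If you want to drive this from T-SQL as well, one possible approach (not the only one) is to run the prepared MDX through a linked server to the SSAS instance using OPENQUERY and compare the returned cell to the expected value; the linked server, cube and measure names below are purely illustrative:

-- Run the prepared MDX against the cube via a linked server to SSAS
-- (SSAS_TEST, the cube [Sales] and the measure are example names).
SELECT *
FROM OPENQUERY(SSAS_TEST,
    'SELECT [Measures].[Internet Sales Amount] ON COLUMNS
     FROM [Sales]');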

Tear Down
After the test phase is complete we will clean out the data warehouse tables that we populated at the start of the test to ensure any subsequent tests have a clean database to start with.

More extensive tests and available tools

From the above I am sure you can see how, in addition to the unit tests that we would normally create for each individual package, we could create tests that exercise our ETL end to end. We could extend this further to an end to end test from our source system all the way to our cube.

So far we have looked at the structure of a unit test and some examples of how tests against SSIS and SSAS could work. We have not yet investigated how this can be automated so that we could integrate these tests into our nightly build, or, as developers, do actual test driven development on our BI projects.

In my previous post I mentioned some of the tools that we could use, such as SSISUnit and QueryUnit. Another option is to develop your own unit testing tool. Unfortunately we do not currently have a dedicated testing tool for BI projects in the Microsoft BI tool set. By that I mean a tool that allows for the easy, user friendly development and running of unit tests. (Just to be clear, writing loads of C# to create a unit test is not something I would classify as user friendly, since it brings an added level of complexity to the process.)

In the next post we will explore some of the solutions and steps that we can take to get closer to the ideal of a testing tool specifically aimed at BI projects.

May 19, 2013

Agile BI, without unit testing you are using the "Max Power Way"

Agile BI has been all the rage for a while now. It is one of those buzz words that everyone loves to throw around. I personally despise buzz words, mainly because they get latched on to by people who do not always have a clear understanding of the fundamentals they encapsulate.

These buzz words then get chucked around in meetings like a magic spell, one that will get a project delivered faster, almost like the "turbo" button you had on your 486 computer back in the day.

In the custom development world agile methodology has been around for a long time and unit testing forms the core of any custom dev project that is truly following an agile approach. It is therefore no wonder that you have a plethora of unit testing frameworks and tools available for almost every conceivable language and framework out there.

The sad fact is that when it comes to BI we have caught up with the buzz word of Agile but we are sorely under equipped to actually take advantage of the methodology due to our lack of unit testing tools.

This twilight zone in which we find ourselves leads to many projects heading down the "Max Power Way", in which we accelerate the development of the system but introduce more and more bugs and issues due to the knock on effect that a change has as the project grows.



Let me give an example of what I mean. Below is the "BI Burger" slide from Microsoft showing the different layers in a Microsoft BI solution.

Now imagine that you are delivering a BI solution in which you are following agile methodology. You have completed the ETL, Data Warehouse, Cube and Reporting layers of the first sprint and you are about to kick off with sprint two.

The first task of the second sprint is to rename a measure in a fact table and change a couple of data types, one of which is to increase the size of a varchar attribute in a dimension table and the other is changing a measure from decimal to float in a fact table. All of these changes need to happen in the data warehouse.

Because everyone is hyped up on double strength doses of Agile, the task is estimated to not take very long. However, the impact of these changes on the solution is much bigger than the developers realise. While you can make the changes in a few minutes, you have to keep in mind that you would need to change all the parts of the system that interact with the objects you have changed. Let's take a look at these in more detail.

Potential changes in BI layers

SSIS

In the SSIS solution you will need to ensure that all the packages that interact with the changed objects are updated to reflect the changes. Some of the changes involved are listed below: 
  • For the data type changes and column name change you will need to update the metadata of all the SSIS packages that interact with the tables in which this change took place. 
  • Run some tests to ensure that the changes do not cause truncation of data for the dimension table column that was updated.
  • Test that any calculations performed on the measure in the fact table produce the correct results.

Data Warehouse

The changes for the first task of sprint two were all changes in the data warehouse; however, these are not the only changes you might have to make in your warehouse. In addition to the original changes you might have to do the following:
  • Update any views that reference the renamed measure. (If you are following best practice this would mean changing the views in the data warehouse on top of which you build your cube.)
  • Update any stored procedures that reference the renamed measure. (Remember there can be quite a large number of SSRS reports that use stored procedures against the data warehouse.)
  • Test all the calculations of any function, procedure or script that uses the measure that had the data type change. A quick way to find these dependent objects is sketched after this list.
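
As a starting point, one quick way to find the views, stored procedures and functions that still reference the old measure name is to search the object definitions in the data warehouse; the measure name below is just an example:

-- Find objects whose definition mentions the old measure/column name.
SELECT  OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
        OBJECT_NAME(m.object_id)        AS ObjectName,
        o.type_desc
FROM    sys.sql_modules m
JOIN    sys.objects o ON o.object_id = m.object_id
WHERE   m.definition LIKE '%SalesAmount%';   -- old measure name (example only)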

SSAS

In Analysis Services you will need to ensure that the data source view is updated to reflect the changes made in the data warehouse. Some changes you will need to make are listed below:
  • You would need to update the actual attribute in the dimension which references the column with the increased varchar size. 
  • You will also need to update the measures to ensure that the name change is corrected in the measure group and that the measure now points to the correct column in the fact table or view. 
  • You will also need to update the data type of the measure that had its data type changed in the data warehouse.
  • You will need to update all calculated measures that reference the measure that had the name change.
  • You will need to test the result of all calculated measures that use either of the two measures to which a change was made. This is to ensure that they still work correctly and that they produce the correct results.

Reporting Layer

I will not go into each application that makes up the reporting layer in a Microsoft BI solution, but instead I will list some of the changes that you would need to make throughout the reporting layer due to the changes that were implemented:
  • Update all PerformancePoint Server dashboards that reference the measure that had the name change.
  • Update all SSRS reports that reference the measure that had the name change.
  • Update all PowerView reports that are referencing the measure that had the name change.
  • Update all Excel Services reports that are referencing the measure that had the name change.

Unit Testing

As you can see from the above list of potential changes, and I am sure there are some that I have left out, a change that could be seen as trivial by a business user has a massive ripple effect throughout your BI solution.

The hardest part of making these changes is ensuring that you have not broken any other parts of your system, and that your figures are still coming out correctly in the reporting layer for end user consumption. There are two ways to ensure this:

  • Test every single area of the system manually to ensure everything functions as expected and that all values across the system are still being calculated correctly after the changes have been made.
  • Make use of automated unit testing and kick off the entire batch of unit tests that were created during the development of the system to do the hard work for you.
Personally I prefer option number two as it is the best approach to ensure that the entire system is functioning as expected while removing the potential human errors that would be involved in doing it manually. This also reduces the time it takes considerably as you can let your unit tests run on your test server while you get on with other tasks. You could also integrate the unit tests into a nightly build process.

Most BI developers tend to not realise the benefits of automated unit testing, mostly because they have traditionally worked using waterfall methodology. On top of this, the limited number of unit testing tools in the BI world adds to the challenge of automated unit testing in BI projects.

For instance, in SSIS you could use SSISUnit, and for the data warehouse and cube you could use QueryUnit. I have also seen some custom tools created to do unit testing on BI solutions, and these sometimes require a significant amount of work to produce the tests themselves. None of these tools and approaches are without their shortcomings, but they surely beat having to do the testing manually every single time you make changes to the system.

Let's be clear: you HAVE to retest the system the moment you make any changes to any part of it. The fastest way to get yourself into a bad situation is to make changes and think that everything will be fine with limited testing. Without proper unit tests written for every single part of the BI solution, no matter how small, you cannot say with 100% confidence that a change has not broken something somewhere in one of the layers of your solution.

Think about the estimate that you would have to give during a sprint planning meeting for the changes in our example based on the two ways of testing. I am sure you would agree that your estimates would seem much less agile if you have to estimate them based on manual testing. In fact I am willing to bet that you would struggle to give an accurate estimate based on the manual testing method.

Automated unit testing is the only way to ensure that you realise the benefits of Agile BI.

Conclusion

I have briefly shown the impact that a seemingly easy change to the system has on a BI solution and talked about the importance of automated unit testing to ensure that you have 100% confidence in your solution while following agile principles.

In the next post I will dig a little deeper into the tools available to us for automating our unit testing.

Until then always remember that proper unit testing is the only thing separating your Agile BI project from becoming a "Max Power" BI project.