Last year I entered the PowerBI video demo contest. Whilst I didn’t win any prizes, I did learn a fair bit from going through the exercise of putting together a screencast demo (more on that another time). In this post I’m going to walk through the web scraping part of my demo.
The website that I chose to use for my demo was the National UFO Reporting Center (more for novelty’s sake than any serious interest).
In the past I have written extensively about how to build custom components for SQL Server Integration Services. Those posts have always focused on the ‘happy path’; if you’re not familiar with this phrase, it refers to the path through your application that works exactly as expected. Often in development we have to deal with the sad path: when things aren’t working as we would like or expect.
This post is an extension of Tutorial 12 from Hortonworks (original here), which shows how to use Apache Flume to consume entries from a log file and put them into HDFS.
One of the problems that I see with the Hortonworks sandbox tutorials (and don’t get me wrong, I think they are great) is that they either assume you already have data loaded into your cluster, or they demonstrate an unrealistic way of loading data into your cluster: uploading a CSV file through your web browser.
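To make the idea concrete, here is a minimal sketch of the kind of Flume agent configuration the Hortonworks tutorial builds on: an exec source tailing a log file, a memory channel, and an HDFS sink. The agent name, file paths, and HDFS directory below are illustrative assumptions, not values from the tutorial.

```properties
# Name the components of this agent (agent name 'a1' is illustrative)
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail a log file continuously (path is an assumption)
a1.sources.r1.type    = exec
a1.sources.r1.command = tail -F /var/log/sample/app.log

# Channel: buffer events in memory between source and sink
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 1000

# Sink: write events into HDFS as plain text (path is an assumption)
a1.sinks.k1.type          = hdfs
a1.sinks.k1.hdfs.path     = hdfs://sandbox:8020/flume/events
a1.sinks.k1.hdfs.fileType = DataStream

# Wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel    = c1
```

An agent with a config like this can then be started with `flume-ng agent --name a1 --conf-file <file>`, which is a far more realistic ingestion pattern than a browser upload.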
A couple of weeks ago I was doing some work on an internal reporting cube. One of the measures required represents an ‘order backlog’: that is, orders that have been received but haven’t yet been provisioned in our systems.
The Problem

The fact table looks something like this:
A row appears in the fact table once the order has been closed, with the provisioned date set to NULL until the order has been provisioned.
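The backlog can then be read straight off that NULL marker. The sketch below shows a hypothetical shape for such a fact table and a query for the backlog; all table and column names are illustrative assumptions, not taken from the post.

```sql
-- Hypothetical fact table shape (names are illustrative)
CREATE TABLE dbo.FactOrder (
    OrderKey        INT   NOT NULL,
    OrderClosedDate DATE  NOT NULL,
    ProvisionedDate DATE  NULL,      -- NULL until the order is provisioned
    OrderAmount     MONEY NOT NULL
);

-- Orders in the backlog: closed but not yet provisioned
SELECT COUNT(*)         AS BacklogOrders,
       SUM(OrderAmount) AS BacklogValue
FROM   dbo.FactOrder
WHERE  ProvisionedDate IS NULL;
</imports>
```

The interesting part for a cube is that this set changes over time as orders are provisioned, which is what makes the measure non-trivial to model.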
The Microsoft PowerBI Competition is now in full swing with the voting open to the public for the next week. (Check out my entry).
As you can see below I just made my submission in time.
I like to cut it fine!
When I came to building my demo (check it out) I had a few different data sets in mind, but there were two main points that I wanted to highlight from my entry -
Previously I’ve written about the database unit testing framework tSQLt (you can read about it here); there is also an excellent Pluralsight course by Dave Green (blog | twitter), which you can find here.
In this post I’m going to show you a method of version controlling your database and unit tests with SQL Server Database Projects in SQL Server Data Tools (SSDT).
Setting up the Solution

In my solution I’ve created two SQL Server Database Projects
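For readers new to tSQLt, a minimal sketch of the kind of test that would live in one of these projects is shown below; the test class, table, and procedure names are illustrative assumptions, not taken from the post.

```sql
-- Create a test class (a schema that groups related tests)
EXEC tSQLt.NewTestClass 'OrderTests';
GO

-- A minimal tSQLt test: fake the table, insert known rows, assert the result
CREATE PROCEDURE OrderTests.[test only unprovisioned orders are counted]
AS
BEGIN
    -- FakeTable swaps dbo.Orders for an empty, constraint-free copy
    EXEC tSQLt.FakeTable 'dbo.Orders';

    INSERT INTO dbo.Orders (OrderKey, ProvisionedDate) VALUES (1, NULL);
    INSERT INTO dbo.Orders (OrderKey, ProvisionedDate) VALUES (2, '2015-01-01');

    DECLARE @actual INT =
        (SELECT COUNT(*) FROM dbo.Orders WHERE ProvisionedDate IS NULL);

    EXEC tSQLt.AssertEquals @Expected = 1, @Actual = @actual;
END;
GO

-- Run all tests in the class:
-- EXEC tSQLt.Run 'OrderTests';
```

Keeping tests like this in their own database project, separate from the production schema project, is what allows them to be version controlled alongside the code without ever being deployed to production.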