Johan Åhlén

Johan Åhlén's blog about life, universe and everything.
I will speak about StreamInsight at TechDays 2010 in Sweden

StreamInsight is one of the main news of SQL Server 2008 R2. It's a platform for Complex Event Processing (CEP) that enables processing of high speed event streams using familiar tools such as Visual Studio, C# and LINQ. CEP is a technology that is increasingly used for algorithmic trading, targeted ads at websites and real-time Business Intelligence. In this session, you will get a 60 minutes introduction to how StreamInsight works and a demo of how to build an application that handles data in near real-time from sources such as Twitter, Facebook and financial markets. 

Meet me at TechDays during 23-24 of March in Örebro (Sweden)!

Algorithmic Trading using Complex Event Processing (CEP) and Data Mining

Algorithmic Trading is automated trading done by computer programs (algorithms) that react to patterns in financial market data or financial news. As of 2009, high frequency trading firms accounted for 73% of all US equity trading volume.

Several different strategies are used for algorithmic trading. For example index arbitrages exploit differences between pricing of indexes and the stocks they represent. Market making is another strategy that aims at earning money on the spread between bid and ask price. More complicated strategies use data mining and pattern recognition.

Personally I believe there's much truth in the saying that the stock market is mostly psychology. How do you believe the following influences the stock market? 

  • Dividends and issues of shares. Overreactions.
  • Buy- or sell recommendations from analysts.
  • Publicity in newspapers.
  • Weather.
  • Season. Report periods.
  • Political events.

Getting Started

What do you need for your do-it-yourself Algorithmic Trading system?

  • A high quality data source of market data.
  • Reliable sources for other data.
  • An automated financial instrument order handling system.
  • A CEP (Complex Event Processing) engine.
  • A data mining engine.

Financial data feeds are available as standard services from several financial companies. There is even a standard protocol for the information, FIX (Financial Information eXchange). Also large financial news bureaus can supply their news in a format that is adapted for automatic analysis.

Financial brokers exists that can handle your orders automatically at decent commissions (of course depending on your trading volume). You should be able to find them through your internet stock broker, which may supply this service themself or have a partner that does.

Complex Event Processing and Data Mining are supported by SQL Server 2008 R2 through StreamInsight and Analysis Services. You can learn more about StreamInsight in my articles on SQLServerCentral.

Evaluating your algorithms

Some factors to consider:

  • "Edge" (Return On Investment in the long run)
  • Opportunity (how often will there be any action)
  • Accuracy (how often will there be a win)

Personally I'd prefer an algorithm that gives a stable income compared to a high risk algorithm.

Risks

Algorithmic Trading, like any other investment activity, requires you to be cautious. A friend told me about his neighbour that earned millions one year and lost them (and his house) the next. Like in poker you have to decide on a maximum loss level where you are ready to take the pain and leave.

On the other hand, Complex Event Processing is also used by trading firms to reduce their risk. It can help you monitor your risk exposure in real time.

Also, very important... just that you have found a pattern that worked historically doesn't imply that it will work in the future. Have you heard about "The Bible Code" (or Torah Code) which describes encrypted messages in the bible? Only... you could find encrypted messages in any book if you search long enough. Remember, that is the weakness of Data Mining.

My recommendation is to keep to a test environment until you are sure about your algorithms. Also it is not a bad idea to take a course in statistics to learn how to make statistical significance tests.

 

Exploring Codename "Dallas" with PowerPivot

A couple of weeks ago, Microsoft opened up their Codename "Dallas" website. The service is by invitation only, but there is a mail address on their site where you can ask for an invitation code.

What is Codename "Dallas"?

- It is a marketplace for premium information, both current and historical, where you can search and access it in a single location and using a standardized API (REST and ATOM 1.0) and a simple billing model. Great for Business Intelligence applications where you easily can combine it with your own data. The benefit for the consumer is that you can very easily buy access to the information you want. For suppliers, it provides an easy way to market and sell your information services.

Let me show an example how you can get the information you want into PowerPivot.

First there is the catalog, where you can browse and subscribe to the dataset you are interested in.

Then you can manage your subscriptions and click on a link to explore the contents of them. Also here is a link to vote for contents you would wish to have (a funny coincidence that one of the top content requests now is for news feeds - and that is exactly what I have been developing on the Windows Azure platform lately).

Below is a screenshot of the content explorer. I've picked the United Nations WHO (World Health Organization) Database as an example. You specify which variable you are interested in, for example Life Expectancy. Then you select year and can also filter on nations. A preview is available.

If you are interested in using this data in PowerPivot, you can click Analyze. For me it works only if I first start PowerPivot and then click the Analyze button. Then a dialog similar to the one below should be shown. Click Open.

PowerPivot will then display the Table Import Wizard. Give a good name to your connection and click Next.

During the next step, you can limit the contents of the data feed. Or just click Finish to include all data.

Finally you should see a confirmation dialog that the rows have been loaded into memory. However I never managed to import more than 100 rows at a time from the WHO Database. Probably that is some limitation of the trial version or a limitation on their side. To get more rows, I needed to repeat the process and add new data feeds.

Finally, you can play around with the data as usual in PowerPivot.

Another option than using PowerPivot is the consume data from your C# applications. For that purpose, codename "Dallas" can autogenerate C# classes that fetch the data for you.

There's a link on the content explorer to create the C# proxy classes. Click there and then choose to save the file.

Below is an example autogenerated C# class for reading the WHO data.

The number of suppliers is currently very limited. For the business minded it should be a good opportunity to sell your high quality information services.

Give it a try on http://www.microsoft.com/windowsazure/dallas/!

 

Innovation - why companies need skunks

Do you have any skunks in your organisation? You should be happy if you have! Former CEO of Ericsson, Sven-Christer Nilsson, writes in his book that all great successes within Ericsson (like IP-telephony and mobile phones) have originated from persons that were not accepted by the management. This is not unique for Ericsson. There are a multitude of examples within other large companies. Skunk work, the things that employees do while the management looks away, can obviously be the most important work at the companies.

It seems to be a current trend that companies claim they are innovative, but few truly are. Google is one of few companies that seem to have understood innovation. At Google employees have a day a week for free work (skunk work!). Another thing is their rewards, so that nobody should have to leave Google because they have a great idea. A waste of time and money? No, their best ideas and products have actually evolved from those free work days. It's a matter of freedom combined with responsibility, and they keep the people that have a passion for their ideas.

I can really also recommend the book "The Future of Management" by Gary Hamel, where the author does a good job of explaining innovation (especially management innovation) and why innovation is not just a job for R&D departments. It should be mandatory reading for all management consultants working with Corporate Performance Management / Business Intelligence. One of the main points of the book is that most companies focus on operational efficiency at the expense of strategic efficiency, missing the most valuable business opportunities.

"This doesn't apply to my company" or "my company is a consulting only company, so there is nothing to improve" are comments I have heard from people stuck in the old thinking. I can give a good example of a consulting company organised in a new innovative way, but that will be in another blog post

Einstein said once that "We can't solve problems by using the same kind of thinking we used when we created them." I think that is why Apple encourages people to call someone they haven't talked to for 6 months. Why don't you do the same?

More reading

 

SQL Azure - some tips & tricks

In this blog post I'd like to share some of my experience working with SQL Azure so far. First a background on what SQL Azure really is and then I will give you some of my tips and tricks.

What is SQL Azure

Microsoft SQL Azure Database is a cloud-based relational database service that works much like an ordinary stand-alone instance of Microsoft SQL Server. It support most of the functionality in the SQL Server 2008 relational database engine, including SQLCLR support. It doesn't have the Business Intelligence components like Integration Services, Analysis Services, Reporting Services. Also it doesn't support typical DBA features, but that's logical since the whole idea of SQL Azure is that you should not need any DBA. Microsoft promises a 99.9% availability during a calendar month.

Currently SQL Azure is free to evaluate if you have an invitation. In production, the pricing will be a fixed cost per month which depends on edition and a data transfer cost. The web edition is limited to 1 GB database size and currently costs $9.99 / month. The business edition allows up to 10 GB database size and costs $99.99 / month. SQLCLR is supported only by the business edition. Data transfer costs is dependent on location, but costs less than a dollar per gigabyte.

Below is a screenshot from the SQL Azure management web site. There you create your databases and configure your firewall settings. This web site is located at http://sql.azure.com/.

 

Connecting to SQL Azure

Once you have created your SQL Azure databases, you need to enable access to the in the firewall settings. After doing that and waiting a couple of minutes for the changes to take effect, you can connect to your database. You can click on the Connection Strings button to get the necessary information to connect.

Previously there was no support for any graphical user interface to SQL Azure. When I started looking at SQL Azure I used the SQLCMD command line utility to execute my SQL commands. Thankfully now you can download the November CTP of SQL Server 2008 R2 and from there connect to SQL Azure through SQL Server Management Studio.

Below is a screenshot of the connection dialog where I connect to my SQL Azure database. Only SQL Server Authentication is supported. You need to type in your login in the format username@server. Also it is a good idea to click on Options and on the Connection Properties tab select your desired database in the Connect to database drop down list.

 

Scripting

Typically you develop databases locally in SQL Server 2008 and then deploy the SQL Azure. As mentioned earlier, SQL Azure doesn't support typical DBA features like filegroups settings. That means you need to limit the scripting options when copying objects from a local SQL Server 2008 database to SQL Azure. Luckily there is a new settings in the November CTP of SQL Server 2008 R2 where you can choose database engine type in the Generate Scripts... wizard. See the screenshot below:

By setting "Script for the database engine type" to SQL Azure Database, you exclude the features that are not compatible with SQL Azure.

 

Copying data

If developing in a local SQL Server 2008 environment, you may want to copy data between your environment and SQL Azure. Basically there are two ways in Management Studio to do this.

Either you can use the Generate Scripts... wizard and through the Types of data to script option make it generate INSERT STATEMENTS for your data.

My preferred way however is to use the Import and Export data wizard. That wizard build an SSIS package in the background and executes to copy your data. It supports a variety of data sources like SSIS does.

The key to use the Import and Export data wizard is to use the right database provider. Currently it seems that SQL Azure only works with the .Net Framework Data Provider for SqlServer and not with the default provider suggested by the wizard. Below is a screenshot of my settings.

 

Encryption

Encryption is mandatory in all connections to SQL Azure. Usually you do not need to worry about it as it is automatically set by the server upon connecting.

 

Connection closing

The save resources, SQL Azure closes connections after they have been idle for some time. In Management Studio, this is not a big problem. You can just retry your queries. However it causes more problems in your applications.

Here is my theory what happens. By default, Windows and web applications use connection pooling to improve performance. Connection pooling means that connections are reused instead of being opened and closed. When your application closes a connection, it actually stays open in the connection pool. When it's been idle long enough it get's killed by SQL Azure. Guess what happens when your application tries to connect? Yes, it gets the save connection that's dead.

Your could either disable connection pooling or add try/catch error handling to retry your queries when they fail. I have successfully used the try/catch method to get rid of all problems. That should be the preferred method since disabling connection pooling would hurt your performance much more.

 

Collations

There is no way to set collation on database level. I've tried! Upon creation, the databases are set to SQL_LATIN1_GENERAL_CP1_CI_AS. For people like me who write applications in other languages than English, this is an annoyance.

What to do? The good news is that SQL Azure supports collations on column level even if I'd prefer a global setting on database level instead. Also you can do a collation cast in your queries, for example:

SELECT LastName
FROM MyTable
ORDER BY LastName COLLATE Finnish_Swedish_CI_AS ASC

 

Final words

The core of SQL Azure seems to be relatively complete, but it still lacks quite a bit in the user interface support. That makes it useful for experienced SQL Server users. However, I think there soon will be much better support from SQL Server 2008 R2 Management Studio.

SQL Azure is one of three cloud platforms from Microsoft. The other two are Windows Azure and the AppFabric. I've developed a Windows Azure application that uses my SQL Azure database. Doing that I missed the possibility to host them in the same Affinity Group. That should be a key requirement to get optimal performance from your cloud applications.

 

Posted: Dec 20 2009, 06:34 AM by jahlen | with 2 comment(s)
Filed under: ,
Shed some light on your data with SilverLight

This week I've been experimenting some with SilverLight. It could be a really interesting combination with the Windows Azure platform. Or why not use for displaying real-time data streams from SQL Server StreamInsight?

Anyway I ended up doing a simple pool game simulator. Quite far away from my idea of visualising data, but I do things for fun. I've published the source code at CodePlex so you can see how it works. Making a realistic simulation of the balls physics is not trivial.

Maybe this could be made into a real game? Would be cool to have it communicate with the Azure platform, and not only something simple such as storing a high score-list.

Why SilverLight is interesting to me:

  • Good development tools - Visual Studio and Expression Blend
  • Support for WCF (Windows Communicatinos Foundation), SOAP, REST, etc makes data access and communications easy
  • Supported by most web browsers
  • Easy to accomplish things with a few lines of code

So why not use SilverLight to shed some light on your data?

Resources

Posted: Dec 04 2009, 04:06 PM by jahlen | with 1 comment(s)
Filed under: ,
InstantCube - building a SQL Server Modeling DSL application for quick cube development

Introduction

This article describes how to build a tool, InstantCube, that uses SQL Server Modeling Services that drastically reduces the development time of simple Analysis Services cubes. Using the tool you can accomplish with a few lines of DSL (Domain Specific Language) what would take many times the effort to do manually in SQL Server Analysis Services and SQL Server Database Engine. InstantCube is not intended as a full-featured tool, but serves as a sample of how to speed up development tasks using SQL Server Modeling DSLs.

The article is divided in the following sections:

  1. Background - about SQL Server Modeling Services and DSLs
  2. Step by step how to implement a DSL that emits output to SQL Server and Analysis Services
  3. Results and conclusions

Background

SQL Server Modeling Services is a platform for model driven development that was formerly known as codename "Oslo". During the Microsoft PDC 2009 conference, the new name was revealed and a new CTP was published. Basically SQL Server Modeling consists of:

  • The "M" Language - including the MGrammar for authoring DSLs (Domain Specific Languages)
  • Quadrant - a feature-rich tool for viewing and editing SQL data
  • Intellipad - an editor for M-files or you custom DSLs (with syntax highlighting)
  • Repository - a managed storage (using a SQL Server database)

This article will focus on the DSL capabilities of SQL Server Modeling Services. It could actually be used to author languages for totally different subjects than SQL data. That people see MGrammar in a broader context was clearly visible from the comments on the blogs when the new name was announced.

If you have worked with compiler development, you probably have met Lex (Lexical analyser) and Yacc (Yet Another Compiler Compiler). A GNU version of yacc was given the name Bison. The DSL support in SQL Server Modeling has many similarities with Lex/Yacc, but it doesn't generate any code. Instead it produces MGraph, XAML or is accessed programmatically.

The purpose of this article is to show how to build a simple tool for quick development of Analysis Services cubes (or rather Analysis Services projects), with automatic generation of the underlying SQL and sample data. The advantage of using a DSL can be explained by an example such as adding a dimension to your cubes:

Option 1: Procedure for adding a dimension using InstantCube

 

Option 2: Typical procedure for adding a dimension manually

With a DSL you can express with one definition what would require more than one development environment to implement manually. If InstantCube also generated Integration Services packages, the potential usage and time savings would be even higher. However, InstantCube, intends only to be an example how to speed up development using DSLs.

Below is a step by step instruction how InstantCube was developed. By the end there are some screenshots of the application and the resulting output.

Step by step example - developing InstantCube

The next sections contain a step by step description how the InstantCube application was built. It starts with the design of the parser and ends with the design of the emitters that generate the cubes (or rather a whole Analysis Services project) and underlying SQL code.

Developing a DSL grammar using Intellipad

My first step was to build a sample file in my intended DSL (Domain Specific Language). I decided the language should have elements such as:

  • Cubes
  • Measures (part of a cube)
  • Sample data (definition of random data generation)
  • Dimensions
  • Attributes (part of a dimension) with Hierarchies, Attribute Relationships and Initial Values
  • Special settings for the time dimension so it automatically populates with a date range
  • Connections
  • Namings (some general settings for the naming conventions)
  • Default types (for simplicity so that you don't need to specify everywhere)
  • Comments (should start with // or be enclosed within /* and */)

I wrote my first lines in my intended language and came up with:

// Declare a project
Project ICDemo
{
   // Declare connections. Must contain two connections, one for OLEDB and one for SSAS
   Connections
   {
      // Feel free to replace localhost by any computer name
      OLEDB: "Provider=SQLOLEDB;Data Source=localhost;Integrated Security=SSPI;";
      SSAS: "localhost";
   }
}

I saved this to a file named Sample.ic (where I chose the extension .ic to represent InstantCube). The file itself was created in Notepad, but any plain text editor (including Intellipad) would have done.

The next step was to start creating the DSL grammar. This is best to do within Intellipad, since you get syntax highlighting and error checking while you type. Intellipad is a tool that comes with the SQL Server Modeling CTP.

The screenshot below shows Intellipad with my grammar, InstantCube.mg, the sample file and the output (will be discussed later). It is very convenient to work with split windows like this. To go from a single grammar windows to split windows (with input and output), press Ctrl-Shift-T.

The InstantCube language is made case insensitive by adding the directive @{CaseInsensitive{}} in front of language. It ignores whitespace and comments by the definition of an interleave.

There are two kinds of rules in MGrammar:

  • Token rules
  • Syntax rules

Token rules represents the lexical structure (the lowest level building blocks) of the language such as keywords, literals, delimiters, etc. Syntax rules represents grammatical structure such as statements, assignments, declarations, etc found in most languages.

Examples of some token rules for InstantCube:

@{Classification["Keyword"]} final token TProject = "Project";
@{Classification["Identifier"]} token TIdentifier = Language.Grammar.Identifier | ("[" Language.Grammar.Identifier "]");

The final keyword makes sure Project is a reserved word and can never be used as an identifier. The identifier token is defined to optionally be enclosed in brackets ("[" and "]").

Examples  of some syntax rules:

syntax Connections = TConnections TLeftBrace cs:Connection+ TRightBrace => Connections[valuesof(cs)];

The meaning of this rule definition is that it starts with keyword Connection, and then contains one or more connection definitions between curly braces. To the right of => is the output definition that controls the output representation. The meaning of the + is that the syntax element (in this case Connection) has to be there one or more times.

One final word about the output. What you get is an MGraph tree (or actually a Labeled Directed Graph) that can either be output to a text format or accessed programmatically. More on that in the next section.

To learn more about MGrammar you can visit the MSDN Data Developer Center.

Working with the grammar

There are basically two ways of working with your DSL:

  • The toolset way, which means translating your input files to MGraph and then storing them in the Repository.
  • The programmatic way, which means that you write your own application in C# (or any other .NET language) that does anything with the input.

In the case of InstantCube, the purpose is to generate a cube, so it is obviously the programmatic way that is to go.

To be able to use the InstantCube.mg grammar, it needs to be compiled. This can be done with the tool m.exe or programmatically. I chose to use the tool. It is simple to use - you just specify the input file and optionally the name of the output file (by default it will otherwise take the name of the input file but replace the extension by .mx). The screenshot below shows an example how to compile.

If working the toolset way, you may look further into m.exe and mgx.exe:

  • m.exe can be used for putting .m files into the Repository or translating them to SQL.
  • mgx.exe is available for executing your compiled grammar on an input file and producing either an MGraph or XAML output. The XAML is a format that has nothing to do with WPF or WF, but it can be useful if you want to access the output as an XML document.

The programmatic way is described later.

Creating the Visual Studio project

The next step is to start developing the Visual Studio application. To be able to access your input files programmatically, you need to add the necessary references. You need to add references to Microsoft.M.dll, System.DataFlow.dll and Xaml.dll. They can usually be found in C:\Program Files\Microsoft Oslo\1.0\bin or similar place. The two screenshots below shows how to add them and the resulting Visual Studio project.

Also I added a pre-build event to compile the InstantCube.mg grammar file, which basically executes m.exe. See screenshot below:

For easy access to the compiled grammar, I chose to embed it in the project. Choose add existing, add the InstantCube.mx file (from the path of the pre-build event) and set its build action to Embedded Resource (through the properties window).

Implementing the programmatic access in C#

Now that we have the parser complete, it is time to write the programmatic access layer that will transform the input into a bunch of useful classes that later can be used to produce the cube. I chose to create classes with names similar to the elements (cube, dimension, etc) and applicable properties. The root element in the InstantCube language is a project, so the corresponding class is called Project. The screenshot below shows the class.

The class uses a number of Extension Methods to read from the parsed input. In practise the steps for programmatic access are:

  1. Read your compiled grammar into an MImage object
  2. Create the parser
  3. Specify the type of GraphBuilder for your access

The following lines of code accomplish this. Note that ResourceReader is my own class which simply reads any embedded file.

var mxStream = ResourceReader.ReadResource("InstantCube.mx");
Grammar =
new MImage(mxStream, true
);
DynamicParser = Grammar.ParserFactories[
"InstantCube.InstantCube"
].Create();
DynamicParser.GraphBuilder =
new NodeGraphBuilder();

The parser can either read a file or a text input through the StringTextStream class. Parse errors can be reported if you create a class that derives from ErrorReporter. Using the parser and obtaining a reference to the root node is accomplished with the following lines of code.

var ts = new StringTextStream(doc);
var errorReporter = new ParserErrorReporter
();
var root = (Node)DynamicParser.Parse(ts, errorReporter);

ParserErrorReporter is my own class that derives from ErrorReporter.

To get the value of a node, use node.AtomicValue. The Brand property indicates the label (project, connection, cube and so on). Child nodes can be iterated by calling node.ViewAllNodes(). Since the child node collection supports IEnumerable<Node>, you can use LINQ. For example, searching for all child nodes of a specific brand can be written:

var childNodes = from n in node.ViewAllNodes()
                 where string.Compare(n.Brand.Text, brand, true
) == 0
                 select n; 

I've put the most common tasks in ExtensionMethods.cs. I've also added constructors to all classes that build them for the nodes, such as the constructor for Project. A specific exception, ModelErrorException, has been created to capture better information about errors during the interpretation of the nodes.

With the input file transformed into a useful bunch of classes, it is time to generate the output! That is the purpose of the next section.

Developing the SQL and AMO emitters

The purpose of InstantCube is to generate an Analysis Services project, an SQL database and sample data.

  • The SQL database and is created through T-SQL statements that are executed through an OLEDB Connection. The connection is specified through the Connection element of the input. IF EXISTS() conditions are included to drop any previous objects. The database itself is created if it does not exist. A number of methods such as EmitDatabase() and EmitFactTables() are exposed through the SQLEmitter class.
  • The Sample data is also created through T-SQL statements. It is implemented as two methods, EmitDimensionData() and EmitFactSampleData() of the SQLEmitter class.
  • The Analysis Services project is created through the AMO (Analysis Management Objects) API. The AMO is a library of objects to programatically manage a running instance of Analysis Services. To use AMO, add a reference to the Analysis Management Objects assembly and refer to it using namespace Microsoft.AnalysisServices. The documentation of AMO is sometimes sparse, but scripting existing Analysis Services projects to XML/A gives good clues since the AMO is similar to the XML/A model. A number of methods such as EmitDSV() and EmitCubes() have been implemented in the SSASEmitter class.

Some problems with the SSASEmitter arise from the mixture of different data representations:

  • DSVs (Data Source Views) rely on the ADO.NET DataTable concept, which uses CLR types
  • You also need to provide OLEDB types.

The translation from T-SQL to these types is implemented in two methods: DetermineType() and DetermineOleDbType(). The translation is in no way complete, as there can be much variation in the T-SQL type definitions.

Another thing that requires special handling is the Time Dimension, where the type of the attributes need to be set to the date parts (day, month, year, etc).

The resulting emitters can be found in the project source code.

Conclusion

SQL Server Modeling has support for developing DSLs (domain specific languages) through the language MGrammar. Using MGrammar and the corresponding tools you can validate and parse textual input into MGraph, XAML or programmatic access. For best flexibility and keeping the processing in memory, InstantCube uses programmatic access. Authoring the DSL specification is easiest done in Intellipad. It is also a good environment for testing sample input files against the DSL and observing their output in MGraph format.

Although there is very much to improve, InstantCube has with relatively little effort become a useful tool for quickly generating simple demo cubes with maybe a 90% decrease in effort compared to build the Analysis Services project, the SQL database and the sample data. It serves well as an example of how to speed up development using DSLs.

Below is a screenshot of the resulting InstantCube application.

Below are two screenshots of the resulting output from InstantCube.

Resources

 

SQL Server Modeling - the new name for Oslo

As expected, Microsoft has announced the real name for the Oslo platform during the PDC. It has been named SQL Server Modeling.

They also have announced they will release a new CTP, called SQL Server Modeling CTP. However the components ("M", "Quadrant" and "Repository") still haven't got their real names yet.

It is also time to update all links from the Oslo Developer center to Data Platform Developer center instead. That's where they will move SQL Server Modelling (although it is still not there as of the time I write this article).

From the comments so far I have sensed a big disappointment with putting "SQL Server" into the product name. People think it means it will be tied too closely with SQL Server, while the concept behind Oslo should be independent of SQL. For instance the MGrammar can be used to develop languages completely independent of SQL.

I will wait and see what the name change means in practise, such as licensing and pricing. Maybe some more answers will come up during the PDC.

More information:
http://www.douglaspurdy.com/2009/11/10/from-oslo-to-sql-server-modeling/
http://www.douglaspurdy.com/2009/11/12/on-dsls-and-a-few-other-things/
http://social.msdn.microsoft.com/Forums/en-US/oslo/threads/

 

Nyhetskoll - my contribution to the Windows Azure Developer Challenge

I submitted today my contribution to the Swedish Windows Azure Developer Challenge. It's an application, Nyhetskoll, where you can automatically monitor the Swedish newssites for keywords (such as persons, companies or your interests).

The application runs on Windows Azure and it uses SQL Azure för storage and query logics.

I started writing the application last weekend and after having spent a few evening finishing it in time for the deadline, today at 12:00, I was disappointed that they extended the deadline by two months. So much for my hurry!

Well, it made me really explore the Windows Azure platform. It looks promising for the future, but still I would say it is far from stable. Several times when I deployed it got stuck in the "Initializing" phase. It will be very interesting to see what happens at the official release the 19th of November during the PDC.

Feel free to visit Nyhetskoll at:
http://nyhetskoll.cloudapp.net

 

Article published on SQLServerCentral.com

Welcome to read my article, Improving Cube Performance with Precalculated Aggregations, on SQLServerCentral.com.

http://www.sqlservercentral.com/articles/SSAS/68309/

More articles will be coming when I have the time to write.

PowerPivot announced as real name for project Gemini

Microsoft just announced the real name for the BI Self Service component (project "Gemini") of SQL Server 2008 R2. According to the new web site, PowerPivot is:

"PowerPivot provides business users with BI at their fingertips while enabling organizations to efficiently monitor and manage collaboration using Microsoft SharePoint 2010 and SQL Server 2008 based management tools. "

So far there is only very brief information about PowerPivot available. I look forward to a CTP where I can try it myself. However I believe PowerPivot will be a really big step forward for the Microsoft Business Intelligence suite, changing the way to work with Microsoft BI and enabling end-users to explore data much more easily. It will further shift the power of Business Intelligence from IT to business units.

Links

99 bottles of beer

The website 99 bottles of beer shows how to write the lyrics of the song, with the same name, in about 1300 different programming languages. I've now made a contribution written in SQL. Here's the lyrics of the song:

99 bottles of beer on the wall, 99 bottles of beer.
Take one down and pass it around, 98 bottles of beer on the wall.
...
[Yet another 98 verses]
...
No more bottles of beer on the wall, no more bottles of beer.
Go to the store and buy some more, 99 bottles of beer on the wall.

99 bottles of beer has grown into a phenomena. Basically you find three categories of example on the internet for new languages: Hello World-programs, Quines (programs printing their own source code) and 99 bottles of beer-programs. It's amazing to see how much effort has been spent on some examples on the 99 bottles of beer website. Challenge: can you write a Quine in SQL?

One of my personal favorites from 99 bottles of beer is Shakespeare, a programming language with a "poetic freedom". Programs are divided in Acts and Scenes. The characters in the play are variables. If you want to assign a character, let's say Hamlet, a negative value, you put him and another character on the stage and let that character insult Hamlet. Input and output is done be having someone tell a character to listen their heart and speak their mind. Why make things simple when you can make them beautifully complicated? ;-)

Cheers - it's friday!

Posted: Oct 09 2009, 03:45 PM by jahlen | with no comments
Filed under: , ,
Installing Reporting Services on Windows 7, Vista or Windows Server 2008

I recently had to reinstall Reporting Services 2008, and had to remind myself how to set it up correctly. Installing Reporting Services on Vista, and later operating systems, can be tricky. I've seen several forum posts of people spending days on installation problems... The culprit is the new security features in Vista. I really think Microsoft should have added a README-file telling how to avoid these problems, especially since they must have known about them already when they released SQL Server 2008. This is a simple quick-guide with a step-by-step guide how to install SQL Server Reporting Services 2008 on a local machine running Vista or later operating systems.

Key Prerequisites

In order to use Reporting Services you need to:

  • Disable Internet Explorer protected mode
  • Make sure Internet Explorer provides Reporting Services with your current credentials
  • Add your account to Reporting Services server roles
  • When developing reports, run BIDS/Visual Studio as administrator

Protected mode is a security feature of Internet Explorer 7 and upwards, that uses new security features of Windows Vista to run in a "low integrity" security context. To enable Reporting Services, you can completely disable Protected mode (not recommended) or add Reporting Services to your trusted sites (recommended and described below).

Adding your account to the server roles is necessary if you do not want to run as administrator every time you start Internet Explorer.

BIDS/Visual Studio requires administrative privileges when communicating with Reporting Services. There seems to be no workaround, but running Visual Studio as administrator.

Below is a step-by-step description how to configure Reporting Services.

Disabling Internet Explorer protected mode for Report Manager

Right-click on Internet Explorer and choose Run as administrator:


In Internet Explorer, open the Internet Options and go to the Security tab. Select Trusted sites. Click on the Sites button:


Uncheck the Require server verification checkbox. Add http://localhost to the Websites list:


Close the options windows, but do not exit Internet Explorer.

Configuring Report Manager

Start Report Manager by going to http://localhost/reports. Click Properties. Click on New Role Assignment and add your account as Content Manager. Your account name should be in the form <computername>\<username> (for example Mini-laptop\Johan):


Also click on Site Settings, and then on Security. Add you account as System Administrator:


Trouble with these steps? Ensure that you are running Internet Explorer as administrator. Ensure that the necessary services are running - "SQL Server Reporting Services" and "SQL Server".

Windows 7

On Windows 7 it seems that Internet Explorer by default does not provide current user credentials to Reporting Services. What happens then is that you get a login prompt when you try to access Reporting Services. If you get a login prompt, you need to adjust security settings.

Choose Intenet Options and go to the Security tab. Click on Trusted sites. Either drag the security slider to low security or click on Custom level. If you click on Custom level, go to the User Authentication/Logon option and choose "Automatic logon with current name and password".

Developing Reports

Right click on SQL Server Business Intelligence Development Studio. Choose Run as administrator:


Running without administrative privileges will usually cause it to hang when contacting the Report Server. If you want to not have to choose Run as administrator every time, you can change properties on Visual Studio as shown below.

 

A first look at SQL Server 2008 R2 - StreamInsight

Background

A couple of weeks ago, the second CTP for SQL Server 2008 R2 was released. I downloaded it immediately. Still the product is far from finished, and the CTP didn't contain any preview of the "Gemini" component, but I could at least test the StreamInsight part. The finished product is scheduled to be released during the first half of year 2010, according to Microsoft's web site. [Notice: This article has been updated for the November CTP of StreamInsight]

What's new in SQL Server 2008 R2? I would say the most important news are:

  • StreamInsight
  • Managed Self Service Business Intelligence (the so called "Gemini" project)
  • Application and Multi-Server Management

This blog post will focus on StreamInsight.

StreamInsight

What is StreamInsight and what is it good for? StreamInsight is a platform for developing and deploying applications that handle high-speed streaming data. It could be used for near real-time processing of data from production environments, structured data such as financial information and unstructured data such as Facebook, Twitter and blogs. Multiple sources can be combined and refined before they are being output. This technology is called CEP - complex event processing.

Combine StreamInsight with data mining, and you can have real-time fraud detection, stock market prediction, you name it... Imagine a real-time Business Intelligence-application where the management can actually see the KPI gauges moving on the screen. I would say that monitoring real-time flow of information is one of the key success factors for tomorrow's Business Intelligence. This is also supported by Ralph Kimball, saying in an interview that Business Intelligence is moving from the strategic level towards operational level processes such as customer support, logistics and sales. At the operational level data must be continuously updated, not just once per day. I would add also that the new generation, that has grown up with Facebook and Twitter, will make it necessary to monitor new sources for successful Business Intelligence.

How does it work?

You develop your application in Visual Studio 2008, using libraries from StreamInsight. Basically you develop (or use existing) input and output adapters and connect them through LINQ queries. The picture below (copied from Microsoft's StreamInsight documentation) shows a good overview of the product:

(Click on image to enlarge)

There are three different development models supported by StreamInsight:

  • Explicit model, which provides full control and access to the CEP-server.
  • Implicit model, which can handle only a single query and runs all in memory. Good for debugging.
  • IObservable/IObserver interface, which hides much of the complexity of the CEP-server API

All data in StreamInsight is organized into event streams. The events are of two kinds:

  • INSERT events, that contains a header with timestamp information and a payload with application specific data. These events can be of different models, such as Interval, Point and Edge, to support different temporal characteristics.
  • CTI events, that indicates the completeness of previous events in the stream.

Adapters can be of two types:

  • Untyped adapters that work with multiple types of INSERT events
  • Typed adapters that work with a single type of INSERT events

Queries are written in LINQ with the possibility to do calculations, filtering, grouping, windowing, aggregations, combining and invoke user-defined functions. A query template is bound to specific input and output adapters to produce a query instance.

An example - building an input adapter for Yahoo Finance stock quotes

For testing StreamInsight, I chose to build an input adapter that reads stock quotes from Yahoo Finance. Note however that quotes from Yahoo Finance are delayed at least 15 minutes unless you pay for their premium service. To test the input adapter I also built an output adapter the displays the quotes on the screen (just a simple console application). These I joined together in an application that I built using the implicit development model.

The point here is not the functionality of the application, but that you could easily replace the output adapter with something much more useful. By adding SQL Server 2008 Data Mining, using any algorithm for predictive analysis, you could build something much more useful. Adding an output adapter that places buy and sell orders into a trading system could make you rich. Maybe I will write about that in a later blog, maybe not...

The picture below shows the Visual Studio 2008 project.

(Click on image to enlarge)

ScreenScraper is a helper class that reads the HTML contents of a web page and extracts a string using pattern matching (a regular expression). YahooFinance is a helper class that provides the ScreenScraper with suitable URL and pattern matching expression for the desired stock symbol. StockPayload contains the actual data and is defined as below:

public class StockPayload
{
   ///
   /// Unique ID of stock or index
   /// 
   public string StockID { get; set; }

   ///
   /// Current value of stock or index
   /// 
   public double Value { get; set; }
}

The data members can only be basic types such as numbers, strings, etc. This is probably because the serialization mechanism in StreamInsight requires a fixed size payload.

The initial rows of the input adapter is shown below.

(Click on image to enlarge)

Note that the input adapter inherits generic class TypedPointInputAdapter, which is used for typed (single kind of event) point (point in time events) adapters. Basically the adapter has to implement methods for Start() and Resume(), monitor the AdapterState and enqueue events to the output stream. The input adapter can work with asynchronous sources through callback functions, and synchronous sources. I've implemented the input adapter as a loop that reads data from Yahoo Finance and then waits for a determined time. A CTI event is sent after each quote, to indicate there are no other events awaiting.

A factory class is provided to instantiate the adapters. The factory implements a Create method that takes a configuration and an event shape as parameter. Based on these parameters it instantiates the input adapter.

The output adapter is built similarly as the input adapter, so I don't describe it here. The adapters are tied together by a query defined by the following statements. Note that this solution is specific for the implicit development model. In the explicit model, the query would instead be bound through the CreateQuery method of the application.

// Join input adapters with a simple LINQ query
var combinedInputStream = (from e in input1Stream
   select e).Union
         (from e in input2Stream
   select e);

// Connect input adapters with output adapter
var query = combinedInputStream.ToQuery(outputSink);

Finally the query is being run:

// Run the query
query.Start();
adapterStopSignal.WaitOne();
query.Stop();

Testing and debugging

SQL Server 2008 R2 StreamInsight comes with a tool, StreamInsight Debugger, that can visualize a recorded event flow. To use it you first need to write tracing data to a file. Enable tracing, run your application, and the disable tracing. This is done from command-line as shown below:

Opening the trace data file, sample.etl, in StreamInsight Debugger you can monitor the stream pipeline. The picture below shows data from the input adapter instances and how they are unioned in the LINQ query.

(Click on image to enlarge)

You even have an option in StreamInsight to replay the sequence so you can watch it like a movie. To do that you click on the clock icon in the toolbar.

Summary

Working with near real-time data is very different from traditional database development such as data warehouses, which are sluggish and unfit for real-time data. Therefore a new approach is required. StreamInsight looks promising as a technology for monitoring and responding to new data sources such as production systems, Facebook, Twitter and blogs.

Source code

The source code for my sample application, including the input adapter for Yahoo Finance, is available on the following link:

Unleashing the power of SQL Server ENTERPRISE EDITION - Part 1

Background

Having seen a number of MS SQL Server Business Intelligence implementations, I have noticed that surprisingly few consultants know the difference between SQL Server Enterprise Edition and SQL Server Standard Edition, and even fewer use the Enterprise Edition features. I think this may be partly because of a lack of good examples available, and therefore decided to try to write at least one example.

Below is an extract from a comparison sheet between Standard and Enterprise edition:

This article will focus on the usage of semi-additive measures.

The Inventory problem

The Inventory problem is a typical example of a semi-additive measure. Imagine that you have an ERP-system that tracks inventory, recording every time an item is in- or out of the inventory. From the ERP-system you get a daily net-change for each item as shown below:

However, what the management is interested in is not the net changes, but the quantity available at the end of every day:

When looking at a specific day, month or other period, the quantity available is the sum of all net changes from beginning of time to the end of the specific period. This is a closing balance.

The definition of a semi-additive measure is that it behaves differently in different dimensions. A closing balance is semi-additive because it behaves differently in the time-dimension than other dimensions. Other examples of semi-additive measures are: opening balance, average and by account.

An example - Balance Sheet

Another situation, apart from inventories, where you want to work with opening and closing balances is Balance Sheets. When extracting general ledger data from an ERP-system you will get the net transactions. However, financial people are used to see opening balances, net changes and closing balances. This example will show how to produce closing balances, although opening balances can be produced very easily by small modifications to this example.

Consider the following star-schema for describing the Balance Sheet:

The fact-table is filled with transactional data (not necessarily in time-order):

The Day_Key is expressed in ISO-format (YYYYMMDD), and the Month_Key and Year_Key are made up by simply removing lower parts:

Dim_Account is a very simplified chart of accounts (based on EU-standards):

Natural keys are used in the relation between Fact_GL and the dimension tables, whereas in the real-world surrogate keys should be used instead.

SQL-scripts for setting up this demo environment can be found here:

The SQL-scripts assume that you have created a database named PRECALCDB that you use to run them.

Calculating the closing balances

Calculating the closing balances can be done in two ways: calculating them in dynamically in the OLAP Cube, or precalculating them in the relational database. We will try both in this example.

To precalculate the closing balances in the relational database, we create a new table to hold the calculated values, Fact_GL_CB, the is similar to the previous fact-table but contains field Amount_CB instead of Amount and also has an extra Date-file (not really necessary, but simplifies a bit):

We now need to fill Fact_GL_CB with closing balances. It should have a value for every combination of Company, Account and Day_Key, that is the sum of all amounts having a date less than or equal to the Day_Key. This can be accomplished using while-loops, cursors, etc. However I prefer writing a recursive SQL-query instead. My solution is in a few steps:

Initialise a range-table, telling the recursive query which dates to handle:

TRUNCATE TABLE Fact_GL_CB;

SELECT DISTINCT
   Company_Key,
   Account_Key,
   (SELECT MIN([Date]) FROM Fact_GL) AS [Date],
   (SELECT MAX([Date]) FROM Fact_GL) AS [MaxDate]
INTO #Range
FROM Fact_GL
WHERE Account_Key < N'3';

Note that this step also deletes any previous data from Fact_GL_CB and filters out transactions not belonging to the Balance Sheet (having Account_Key starting with 3 or higher). Using the name #Range ensures it is written to tempdb instead of PRECALCDB.

Next step is to create a daily sums table, holding the sum of all transactions each day:

WITH
AllDays AS
(
   SELECT
      Company_Key,
      Account_Key,
      [Date],
      [MaxDate]
   FROM #Range

   UNION ALL

   SELECT
      Company_Key,
      Account_Key,
      DATEADD(DAY, 1, [Date]),
      [MaxDate]
   FROM AllDays
   WHERE [Date] < [MaxDate]
)
SELECT
   AllDays.Company_Key,
   AllDays.Account_Key,
   AllDays.[Date],
   ISNULL(SUM(Amount), 0) [Amount]
INTO #DailySums
FROM AllDays
LEFT JOIN Fact_GL ON AllDays.Company_Key = Fact_GL.Company_Key AND AllDays.Account_Key = Fact_GL.Account_Key AND AllDays.[Date] = Fact_GL.[Date]
GROUP BY AllDays.Company_Key, AllDays.Account_Key, AllDays.[Date]
OPTION (MAXRECURSION 0);

Note the option MAXRECURSION which is necessary to prevent the AllDays common table expression to hit the default recursion depth limit.

Having calculated the daily sums, the closing balances can be calculated using a recursive query that accumulates daily sums:

WITH
Accumulator AS
(
   SELECT
      Company_Key,
      Account_Key,
      [Date],
      [Amount]
   FROM #DailySums
   WHERE [Date] = (SELECT MIN([Date]) FROM #Range)

   UNION ALL

   SELECT
      #DailySums.Company_Key,
      #DailySums.Account_Key,
      #DailySums.[Date],
      #DailySums.[Amount] + Accumulator.[Amount]
   FROM #DailySums
   JOIN Accumulator
   ON
      Accumulator.Company_Key = #DailySums.Company_Key AND
      Accumulator.Account_Key = #DailySums.Account_Key AND
      DATEADD(DAY, 1, Accumulator.[Date]) = #DailySums.[Date]
)
INSERT INTO Fact_GL_CB(Amount_CB, [Date], Day_Key, Company_Key, Account_Key)
SELECT
   [Amount],
   [Date],
   CONVERT(INT, CONVERT(NVARCHAR, [Date], 112)) [Day_Key],
   Company_Key,
   Account_Key
FROM Accumulator
ORDER BY Company_Key, Account_Key, [Date]
OPTION (MAXRECURSION 0);

Finally, the performance can be improved by adding an index on #DailySums between step 2 and 3.

The complete SQL-script for the aggregation process can be found here:

Building the cube

Now we are finished with the relational database part and continue with the cube. We start an Analysis Services project in BIDS and create a Data Source View as below:

(click on image to enlarge)

The next step is to create a cube and dimensions. This is most easily done by running the New Cube wizard. Make sure you select both Fact_GL and Fact_GL_CB as measure group tables. Select the dimensions. Be ready to make some adjustments to the dimensions (the wizard isn't too clever).

We also need to do minor adjustments to the measures. Change the format so that they are displayed as Currency or similar. Rename "Amount CB" to "Amount CB Precalc" and change the aggregation function to LastNonEmpty:

(click on image to enlarge)

LastNonEmpty is basically the equivalent of a closing balance. Another option would be to use LastChild, which differs only in the handling of missing values.

Also we add a calculated measure, "Amount CB Cube", which dynamically calculates the closing balance in the cube (no need for any Fact_GL_CB table). This measure can easily be created as follows:

(click on image to enlarge)

The full Analysis Services-project can be found here:

Build and deploy the solution. Time to test...

Results

To evaluate the results, I have used MDX Studio by Mosha Pasumansky. This is an essential tool that any serious Microsoft OLAP-developer should have.

The following MDX-query can be used to test. Swap [Amount CB Cube] for [Amount CB Precalc] to test the precalculated values instead of the dynamic values.

SELECT
   {Account.Assets.CHILDREN, Account.Liabilities.CHILDREN} *
   {[Amount], [Amount CB Cube]} ON COLUMNS,
   [Time].Day.MEMBERS ON ROWS
FROM PRECALCDB

Using MDX Studio to profile this query gives us the following performance for dynamically calculated closing balances:

(click on image to enlarge)

Testing the precalculated closing balances gives us this performance:

(click on image to enlarge)

As you can see, the precalculated closing balances are about 10 times faster when not cached. When cached the performance difference is insignificant, as could be expected since there is no need then to do any calculations.

However, the performance is not the only reason for using precalculations. It may not even be the most important reason. Calculating the closing balance dynamically in the cube (using a calculated measure) has several disadvantages, for example:

  • Drill-through actions will not work
  • Problems may arise with solve-order
Therefore the use of the Enterprise Edition feature, semi-additive measures, is much preferred when possible. Especially in the case that you already have closing balances or opening balances from the data source. 

Source code

More Posts Next page »