Johan Åhlén


InstantCube - building a SQL Server Modeling DSL application for quick cube development

Introduction

This article describes how to build InstantCube, a tool based on SQL Server Modeling Services that drastically reduces the development time of simple Analysis Services cubes. With a few lines of a DSL (Domain Specific Language) you can accomplish what would take many times the effort to do manually in SQL Server Analysis Services and the SQL Server Database Engine. InstantCube is not intended as a full-featured tool, but serves as a sample of how to speed up development tasks using SQL Server Modeling DSLs.

The article is divided in the following sections:

  1. Background - about SQL Server Modeling Services and DSLs
  2. Step by step how to implement a DSL that emits output to SQL Server and Analysis Services
  3. Results and conclusions

Background

SQL Server Modeling Services is a platform for model driven development that was formerly known as codename "Oslo". During the Microsoft PDC 2009 conference, the new name was revealed and a new CTP was published. Basically SQL Server Modeling consists of:

  • The "M" Language - including the MGrammar for authoring DSLs (Domain Specific Languages)
  • Quadrant - a feature-rich tool for viewing and editing SQL data
  • Intellipad - an editor for M-files or your custom DSLs (with syntax highlighting)
  • Repository - a managed storage (using a SQL Server database)

This article will focus on the DSL capabilities of SQL Server Modeling Services, which can actually be used to author languages for subjects totally different from SQL data. That people see MGrammar in this broader context was clearly visible in the blog comments when the new name was announced.

If you have worked with compiler development, you have probably met Lex (Lexical analyser) and Yacc (Yet Another Compiler Compiler); a GNU version of Yacc was given the name Bison. The DSL support in SQL Server Modeling has many similarities with Lex/Yacc, but it doesn't generate any code. Instead it produces MGraph or XAML output, or the parse result can be accessed programmatically.

The purpose of this article is to show how to build a simple tool for quick development of Analysis Services cubes (or rather Analysis Services projects), with automatic generation of the underlying SQL and sample data. The advantage of using a DSL can be explained by an example such as adding a dimension to your cubes:

Option 1: Procedure for adding a dimension using InstantCube

 

Option 2: Typical procedure for adding a dimension manually

With a DSL you can express in one definition what would otherwise require more than one development environment to implement manually. If InstantCube also generated Integration Services packages, the potential usage and time savings would be even higher. However, InstantCube intends only to be an example of how to speed up development using DSLs.

Below are step by step instructions on how InstantCube was developed. At the end there are some screenshots of the application and the resulting output.

Step by step example - developing InstantCube

The next sections contain a step by step description of how the InstantCube application was built. It starts with the design of the parser and ends with the design of the emitters that generate the cubes (or rather a whole Analysis Services project) and the underlying SQL code.

Developing a DSL grammar using Intellipad

My first step was to build a sample file in my intended DSL (Domain Specific Language). I decided the language should have elements such as:

  • Cubes
  • Measures (part of a cube)
  • Sample data (definition of random data generation)
  • Dimensions
  • Attributes (part of a dimension) with Hierarchies, Attribute Relationships and Initial Values
  • Special settings for the time dimension so it automatically populates with a date range
  • Connections
  • Namings (some general settings for the naming conventions)
  • Default types (for simplicity so that you don't need to specify everywhere)
  • Comments (should start with // or be enclosed within /* and */)

I wrote my first lines in my intended language and came up with:

// Declare a project
Project ICDemo
{
   // Declare connections. Must contain two connections, one for OLEDB and one for SSAS
   Connections
   {
      // Feel free to replace localhost by any computer name
      OLEDB: "Provider=SQLOLEDB;Data Source=localhost;Integrated Security=SSPI;";
      SSAS: "localhost";
   }
}

I saved this to a file named Sample.ic (where I chose the extension .ic to represent InstantCube). The file itself was created in Notepad, but any plain text editor (including Intellipad) would have done.
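
Later iterations of the sample file grew to cover the other elements of the language. The snippet below is purely illustrative — the actual InstantCube syntax for dimensions, cubes and measures is not shown in this article, so every keyword beyond Project and Connections is an assumption about how such declarations might look:

```
// Hypothetical continuation of Sample.ic - illustrative syntax only
Project ICDemo
{
   Dimension Product
   {
      Attribute Category;
      Attribute Name;
   }

   Cube Sales
   {
      Measure Amount;
   }
}
```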

The next step was to start creating the DSL grammar. This is best to do within Intellipad, since you get syntax highlighting and error checking while you type. Intellipad is a tool that comes with the SQL Server Modeling CTP.

The screenshot below shows Intellipad with my grammar, InstantCube.mg, the sample file and the output (which will be discussed later). It is very convenient to work with split windows like this. To go from a single grammar window to split windows (with input and output), press Ctrl-Shift-T.

The InstantCube language is made case insensitive by adding the directive @{CaseInsensitive{}} in front of the language definition. It ignores whitespace and comments through the definition of an interleave.
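
As a sketch (the full grammar is in InstantCube.mg; the rule name Skippable and the exact whitespace handling below are my assumptions), the top of such a case-insensitive language definition with an interleave might look like:

```
module InstantCube
{
    @{CaseInsensitive{}}
    language InstantCube
    {
        // Whitespace and comments are skipped between tokens
        interleave Skippable
            = " " | "\t" | "\r" | "\n"
            | TComment;
    }
}
```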

There are two kinds of rules in MGrammar:

  • Token rules
  • Syntax rules

Token rules represent the lexical structure (the lowest level building blocks) of the language, such as keywords, literals, delimiters, etc. Syntax rules represent grammatical structure such as statements, assignments and declarations found in most languages.

Examples of some token rules for InstantCube:

@{Classification["Keyword"]} final token TProject = "Project";
@{Classification["Identifier"]} token TIdentifier = Language.Grammar.Identifier | ("[" Language.Grammar.Identifier "]");

The final keyword makes sure Project is a reserved word and can never be used as an identifier. The identifier token is defined to optionally be enclosed in brackets ("[" and "]").
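
Since comments in InstantCube start with // or are enclosed within /* and */, the comment token could be defined along these lines (the name TComment and the exact pattern are my assumptions; the block-comment pattern is simplified and does not allow '*' characters inside the comment):

```
@{Classification["Comment"]} token TComment
    = "//" (any - ('\r' | '\n'))*      // single-line comment
    | "/*" (any - '*')* "*/";          // block comment (simplified)
```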

Examples of some syntax rules:

syntax Connections = TConnections TLeftBrace cs:Connection+ TRightBrace => Connections[valuesof(cs)];

This rule definition means that the construct starts with the keyword Connections and then contains one or more Connection definitions between curly braces. To the right of => is the output definition that controls the output representation. The + means that the preceding syntax element (in this case Connection) has to occur one or more times.
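
A matching Connection rule might then capture the name and connection string and project them into the output graph. The token names and the output shape below are my assumptions, not the actual grammar:

```
syntax Connection
    = name:TIdentifier ":" value:TStringLiteral ";"
      => Connection{Name{name}, Value{value}};
```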

One final word about the output. What you get is an MGraph tree (or actually a Labeled Directed Graph) that can either be output to a text format or accessed programmatically. More on that in the next section.
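
For the Sample.ic file shown earlier, the textual MGraph output would look roughly like the following. This is illustrative only; the exact shape depends on the output projections in the grammar:

```
Project{
  ICDemo,
  Connections{
    Connection{OLEDB, "Provider=SQLOLEDB;Data Source=localhost;Integrated Security=SSPI;"},
    Connection{SSAS, "localhost"}
  }
}
```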

To learn more about MGrammar you can visit the MSDN Data Developer Center.

Working with the grammar

There are basically two ways of working with your DSL:

  • The toolset way, which means translating your input files to MGraph and then storing them in the Repository.
  • The programmatic way, which means that you write your own application in C# (or any other .NET language) that processes the input.

In the case of InstantCube, the purpose is to generate a cube, so the programmatic way is obviously the way to go.

To be able to use the InstantCube.mg grammar, it needs to be compiled. This can be done with the tool m.exe or programmatically; I chose to use the tool. It is simple to use - you just specify the input file and optionally the name of the output file (by default it takes the name of the input file with the extension replaced by .mx). The screenshot below shows an example of how to compile.

If working the toolset way, you may look further into m.exe and mgx.exe:

  • m.exe can be used for putting .m files into the Repository or translating them to SQL.
  • mgx.exe is available for executing your compiled grammar on an input file and producing either MGraph or XAML output. This XAML format has nothing to do with WPF or WF, but it can be useful if you want to access the output as an XML document.

The programmatic way is described later.

Creating the Visual Studio project

The next step is to start developing the Visual Studio application. To access your input files programmatically, you need to add references to Microsoft.M.dll, System.DataFlow.dll and Xaml.dll. They can usually be found in C:\Program Files\Microsoft Oslo\1.0\bin or a similar location. The two screenshots below show how to add them and the resulting Visual Studio project.

Also I added a pre-build event to compile the InstantCube.mg grammar file, which basically executes m.exe. See screenshot below:

For easy access to the compiled grammar, I chose to embed it in the project. Choose Add Existing Item, add the InstantCube.mx file (from the output path of the pre-build event) and set its Build Action to Embedded Resource (through the Properties window).

Implementing the programmatic access in C#

Now that we have the parser complete, it is time to write the programmatic access layer that will transform the input into a bunch of useful classes that later can be used to produce the cube. I chose to create classes with names similar to the elements (cube, dimension, etc) and applicable properties. The root element in the InstantCube language is a project, so the corresponding class is called Project. The screenshot below shows the class.

The class uses a number of Extension Methods to read from the parsed input. In practice, the steps for programmatic access are:

  1. Read your compiled grammar into an MImage object
  2. Create the parser
  3. Specify the type of GraphBuilder for your access

The following lines of code accomplish this. Note that ResourceReader is my own class which simply reads any embedded file.

var mxStream = ResourceReader.ReadResource("InstantCube.mx");
Grammar = new MImage(mxStream, true);
DynamicParser = Grammar.ParserFactories["InstantCube.InstantCube"].Create();
DynamicParser.GraphBuilder = new NodeGraphBuilder();

The parser can either read a file or a text input through the StringTextStream class. Parse errors can be reported if you create a class that derives from ErrorReporter. Using the parser and obtaining a reference to the root node is accomplished with the following lines of code.

var ts = new StringTextStream(doc);
var errorReporter = new ParserErrorReporter();
var root = (Node)DynamicParser.Parse(ts, errorReporter);

ParserErrorReporter is my own class that derives from ErrorReporter.

To get the value of a node, use node.AtomicValue. The Brand property indicates the label (project, connection, cube and so on). Child nodes can be iterated by calling node.ViewAllNodes(). Since the child node collection supports IEnumerable<Node>, you can use LINQ. For example, searching for all child nodes of a specific brand can be written:

var childNodes = from n in node.ViewAllNodes()
                 where string.Compare(n.Brand.Text, brand, true) == 0
                 select n; 

I've put the most common tasks in ExtensionMethods.cs. I've also added constructors to all classes that build them from the nodes, such as the constructor for Project. A specific exception, ModelErrorException, has been created to capture better information about errors during the interpretation of the nodes.
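
As a sketch of what such an extension method can look like (FindNodes is a hypothetical name, and the namespace in the using directive is an assumption; only the Node, Brand and ViewAllNodes members used here appear in the actual API):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Dataflow;   // assumed home of Node in the November 2009 CTP

public static class NodeExtensions
{
    // Hypothetical helper: all child nodes with a given brand, case-insensitively
    public static IEnumerable<Node> FindNodes(this Node node, string brand)
    {
        return from n in node.ViewAllNodes()
               where string.Compare(n.Brand.Text, brand, true) == 0
               select n;
    }
}
```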

With the input file transformed into a useful bunch of classes, it is time to generate the output! That is the purpose of the next section.

Developing the SQL and AMO emitters

The purpose of InstantCube is to generate an Analysis Services project, an SQL database and sample data.

  • The SQL database is created through T-SQL statements that are executed through an OLEDB connection. The connection is specified through the Connections element of the input. IF EXISTS() conditions are included to drop any previous objects. The database itself is created if it does not exist. A number of methods such as EmitDatabase() and EmitFactTables() are exposed through the SQLEmitter class.
  • The sample data is also created through T-SQL statements. It is implemented as two methods, EmitDimensionData() and EmitFactSampleData(), of the SQLEmitter class.
  • The Analysis Services project is created through the AMO (Analysis Management Objects) API. AMO is a library of objects to programmatically manage a running instance of Analysis Services. To use AMO, add a reference to the Analysis Management Objects assembly and refer to it using the namespace Microsoft.AnalysisServices. The documentation of AMO is sometimes sparse, but scripting existing Analysis Services projects to XML/A gives good clues since the AMO object model is similar to the XML/A model. A number of methods such as EmitDSV() and EmitCubes() have been implemented in the SSASEmitter class.
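
As an illustration of the kind of AMO code involved (a minimal sketch, not InstantCube's actual emitter; the database name ICDemo and the server name come from the sample input):

```csharp
using Microsoft.AnalysisServices;

class SsasEmitterSketch
{
    static void RecreateDatabase()
    {
        var server = new Server();
        server.Connect("localhost");            // the SSAS connection from the input

        // Drop any previous version of the database, then recreate it
        Database old = server.Databases.FindByName("ICDemo");
        if (old != null) old.Drop();

        Database db = server.Databases.Add("ICDemo");
        db.Update();                            // push the definition to the server

        server.Disconnect();
    }
}
```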

Some problems with the SSASEmitter arise from the mixture of different data representations:

  • DSVs (Data Source Views) rely on the ADO.NET DataTable concept, which uses CLR types
  • You also need to provide OLEDB types.

The translation from T-SQL to these types is implemented in two methods: DetermineType() and DetermineOleDbType(). The translation is in no way complete, as there can be much variation in the T-SQL type definitions.
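
A minimal sketch of what a DetermineType() style mapping from T-SQL type names to CLR types can look like (the cases below are my assumptions; the real method covers more T-SQL types):

```csharp
using System;

static class TypeMapping
{
    // Map a T-SQL type name such as "int" or "varchar(50)" to a CLR type
    static Type DetermineType(string sqlType)
    {
        string t = sqlType.Trim().ToLowerInvariant();
        if (t.StartsWith("varchar") || t.StartsWith("nvarchar") || t.StartsWith("char"))
            return typeof(string);
        if (t.StartsWith("decimal") || t.StartsWith("numeric"))
            return typeof(decimal);
        switch (t)
        {
            case "int":      return typeof(int);
            case "bigint":   return typeof(long);
            case "datetime": return typeof(DateTime);
            case "money":    return typeof(decimal);
            case "float":    return typeof(double);
            default:         return typeof(string);  // fallback for unmapped types
        }
    }
}
```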

Another thing that requires special handling is the Time Dimension, where the types of the attributes need to be set to the date parts (day, month, year, etc.).

The resulting emitters can be found in the project source code.

Conclusion

SQL Server Modeling has support for developing DSLs (domain specific languages) through the language MGrammar. Using MGrammar and the corresponding tools you can validate and parse textual input into MGraph or XAML, or access the result programmatically. For best flexibility, and to keep the processing in memory, InstantCube uses programmatic access. Authoring the DSL specification is most easily done in Intellipad, which is also a good environment for testing sample input files against the DSL and observing their output in MGraph format.

Although there is much room for improvement, InstantCube has with relatively little effort become a useful tool for quickly generating simple demo cubes, with maybe a 90% decrease in effort compared to building the Analysis Services project, the SQL database and the sample data manually. It serves well as an example of how to speed up development using DSLs.

Below is a screenshot of the resulting InstantCube application.

Below are two screenshots of the resulting output from InstantCube.

