Johan Åhlén

Johan Åhlén's blog about life, universe and everything.
My blog has moved

I have decided to move my blog to a new platform. You will now find it at http://www.joinsights.com. Old posts will remain at my old blog site http://blogical.se/blogs/jahlen.

See you!

 

Donald Farmer leaves Microsoft for QlikTech

Everybody who has worked with Microsoft Business Intelligence and hasn't been living on a deserted island knows about Donald Farmer, who has personified Microsoft BI. Reading that he is leaving Microsoft is of course shocking news for us in the MS BI communities. However, I believe he has good personal reasons, and I wish him the best of luck in his new job!

So what does this mean? I think Rob Collie has made a good analysis. Donald Farmer has been the face of the product line, but he isn't the product line. There are thousands of clever people at Microsoft who have been doing great work in the background and who will continue to do so. Microsoft will continue to sell its BI products with or without Donald. On the other hand, QlikTech will benefit greatly from the PR and credibility, as well as gaining a really clever guy like Donald Farmer. Since the BI market is very far from a zero-sum game, Microsoft will not lose much but QlikTech will gain much more. Simple business logic!

Links:

 

SQL Server Trace Flags - why and how to use them

Trace flags are very powerful settings that allow you to customize SQL Server behaviour. Most trace flags are undocumented in Books Online, and no official list of them exists. Some have been mentioned in Knowledge Base articles and presentations by Microsoft employees, making them somewhat official, but you use them at your own risk. So why would anyone use them? Because they give you some unique possibilities for performance tuning, diagnostics and understanding how SQL Server works.

In an earlier blog post I described trace flag 3604, which enables you to explore SQL Server under the hood by providing you with a detailed dump of data pages so that you can see exactly how data is stored. You get a hex dump including an interpretation of the data.

How do you use a trace flag? There are several ways:

  • You can use the DBCC TRACEON command to enable a trace flag at session level or server level
  • You can set the startup parameters for SQL Server through the SQL Server Configuration Manager utility
  • If you start SQL Server from the command prompt (instead of as a service) you can add them as parameters

Using DBCC TRACEON 

The DBCC TRACEON command is documented in Books Online. To enable a trace flag at session level (for instance trace flag 2528) you type:

DBCC TRACEON (2528)

If you instead want to enable a trace flag at server level you add a -1 as a parameter. For example if you wish to enable trace flag 2528 at server level you type:

DBCC TRACEON (2528, -1)
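
The corresponding DBCC TRACEOFF command, also documented in Books Online, disables a trace flag again. For example, to disable the same flag at server level:

DBCC TRACEOFF (2528, -1)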

Using startup parameters 

Some trace flags work only at session level, some only at server level and some work at either level. Some must be set as startup parameters to SQL Server. That can be done using the SQL Server Configuration Manager. Double-click on the SQL Server service (or right-click and choose Properties).

Switch to the Advanced tab and scroll down to Startup Parameters.

If you for example want to enable trace flags 2528 and 3205 you add the following to Startup Parameters:

-T2528;-T3205

Be careful to use an uppercase T. Using lowercase t enables other trace flags which are intended for Microsoft internal use only.

Restart SQL Server for the changes to take effect.

Other startup parameters are also worth mentioning, such as -x, which disables monitoring and performance data collection and can be used for extreme performance tuning. Another option is -E, which is useful for data warehouses (where I/O access is mostly sequential) because it makes SQL Server allocate 64 extents per data file, instead of one extent, before moving to the next file. The -E option is available on 64-bit only.

Starting SQL Server from the command prompt

SQL Server can be started manually from the command prompt. The default location of the exe-file is 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn'. If you change to that directory you can start SQL Server by typing

sqlservr.exe

If you have a named instance you should type

sqlservr.exe -sInstancename

If you want to enable any trace flags, such as 2528, you type

sqlservr.exe -T2528
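
These parameters can be combined. For example, to start a named instance (the instance name here is made up) with two trace flags:

sqlservr.exe -sMyInstance -T2528 -T3205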

Checking the status of trace flags

You can check the status of your trace flags by using the following command:

DBCC TRACESTATUS

The output is a table that tells you whether a trace flag is enabled at session and/or global level.
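
DBCC TRACESTATUS also accepts a list of trace flags. For example, to check the global status of the two flags used above:

DBCC TRACESTATUS (2528, 3205, -1)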

Where to find out about trace flags

I've gathered some sources for trace flags. However, as I said earlier, there is no complete list anywhere, so finding out about trace flags is a bit like being an archaeologist.

Microsoft's Knowledge Base articles are generally a good source of information about trace flags. So which trace flag is my favorite? It is certainly trace flag 3604, which is very useful if you want to learn about SQL Server in depth.

 

SQLug.se challenge 2010 - the Swedish national championship for SQL Server developers

The Swedish national championship for SQL Server developers, SQLug.se challenge 2010, is now finished. The competition has been running on the Swedish SQL Server User Group's website, but I'm publishing the competition problem on my blog as well, in case you want to test your skills on it!

 

The problem comes from a real world scenario that I've faced as a Business Intelligence consultant, but the data is randomly generated. The goal of the competition was to make the fastest solution. All the entries and performance statistics are published on our user group's website.

 

Consider a table of meter readings from a number of vehicles. The meter readings are a running total and measure fuel consumption (in centiliters).

ID Tid VehicleName MeterValue
1 2010-02-01 08:00:00.000 LIFT1 10000
2 2010-02-01 08:05:00.000 LIFT1 10025
3 2010-02-01 08:15:00.000 LIFT1 10065
4 2010-02-01 08:15:00.000 LIFT2 20000
5 2010-02-01 08:30:00.000 LIFT2 20150
6 2010-02-01 09:05:00.000 LIFT1 10315
7 2010-02-01 09:10:00.000 LIFT2 20510
8 2010-02-01 09:55:00.000 LIFT2 20810
9 2010-02-01 10:00:00.000 LIFT1 10480
10 2010-02-01 10:00:00.000 LIFT2 20885
11 2010-02-01 10:10:00.000 LIFT1 10530
12 2010-02-01 10:15:00.000 LIFT1 10575
13 2010-02-01 11:00:00.000 LIFT1 10725
14 2010-02-01 11:05:00.000 LIFT2 21445
15 2010-02-01 11:05:00.000 LIFT1 10755
16 2010-02-01 11:50:00.000 LIFT1 11295
17 2010-02-01 12:00:00.000 LIFT1 11325

In this example LIFT1 consumed 25 cl between 08:00 and 08:05. The consumption rate then was 5 cl per minute.
 

The task is to calculate the time-weighted average consumption rate, in centiliters per minute, for the 20% of periods that had the highest consumption rate (round the number of periods up to the closest integer, so you would for example include 3 periods if there are a total of 11 periods). If more than one period has the same consumption rate, use the latest period.

 See example below for LIFT1. 
VehicleName StartTime EndTime Consumption TimeInMinutes ConsumptionPerMinute
LIFT1 08:00:00 08:05:00 25 5 5
LIFT1 08:05:00 08:15:00 40 10 4
LIFT1 08:15:00 09:05:00 250 50 5
LIFT1 09:05:00 10:00:00 165 55 3
LIFT1 10:00:00 10:10:00 50 10 5
LIFT1 10:10:00 10:15:00 45 5 9
LIFT1 10:15:00 11:00:00 150 45 3.33333333333333
LIFT1 11:00:00 11:05:00 30 5 6
LIFT1 11:05:00 11:50:00 540 45 12
LIFT1 11:50:00 12:00:00 30 10 3

 
We have a total of 10 periods, so 20% would be two periods. The two periods with the highest consumption rate had 12 cl/minute and 9 cl/minute, and their durations were 45 and 5 minutes. The time-weighted average consumption rate for LIFT1 would be: (12 x 45 + 9 x 5) / (45 + 5) = 11.7 cl/minute. The same calculation for LIFT2 gives an average consumption rate of 15 cl/minute. The result should be written to a table called ConsumptionSummary.
VehicleName Top20PctAveragePerMinute
LIFT1 11.7
LIFT2 15
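
If you want to try it before looking at the published entries, here is a minimal sketch of one possible approach (certainly not the fastest, so not a competition entry). It assumes the readings are stored in a table dbo.MeterReadings with the columns shown above, and it uses the LAG function, which requires SQL Server 2012 or later:

WITH Periods AS
(
    SELECT VehicleName,
        Tid AS EndTime,
        MeterValue - LAG(MeterValue) OVER (PARTITION BY VehicleName ORDER BY Tid, ID) AS Consumption,
        DATEDIFF(MINUTE, LAG(Tid) OVER (PARTITION BY VehicleName ORDER BY Tid, ID), Tid) AS TimeInMinutes
    FROM dbo.MeterReadings
),
Ranked AS
(
    SELECT VehicleName, TimeInMinutes,
        1.0 * Consumption / TimeInMinutes AS ConsumptionPerMinute,
        -- Ties are broken by the latest period, as the rules require
        ROW_NUMBER() OVER (PARTITION BY VehicleName
            ORDER BY 1.0 * Consumption / TimeInMinutes DESC, EndTime DESC) AS rn,
        COUNT(*) OVER (PARTITION BY VehicleName) AS PeriodCount
    FROM Periods
    WHERE TimeInMinutes > 0  -- skips the first reading per vehicle, which has no preceding row
)
SELECT VehicleName,
    SUM(ConsumptionPerMinute * TimeInMinutes) / SUM(TimeInMinutes) AS Top20PctAveragePerMinute
INTO ConsumptionSummary  -- on SQL Azure you would create ConsumptionSummary first, since SELECT INTO is not supported there
FROM Ranked
WHERE rn <= CEILING(0.2 * PeriodCount)
GROUP BY VehicleName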

Files

Correct results for the smaller testdata

VehicleName Top20PctAveragePerMinute
LIFT1 8202.18834892846
LIFT2 27949.9640499379
LIFT3 13310.5810794649
LIFT4 19097.1263597138

Winners of the competition were:

1. Henrik Nilsson, Brummer & Partners - 9.50 s
2. Zhong Yu, B3IT - 14.17 s
3. Sergey Klimkevich, Memnon Networks AB - 16.20 s
4. Magnus Sörin, Avanade - 35.32 s
5. Peter Larsson, Developer Workshop - 58.87 s

My SQL Azure presentation from Azure Summit available here

Azure Summit in Stockholm was a two-day event filled with great presentations about the Azure Platform. I think people are now beginning to understand that the cloud will change our businesses a lot! Fifteen years ago not many understood the real potential of the internet, and now we are connected everywhere. Will cloud computing lead to changes as dramatic as the internet did?

All the presentations, including mine, are available on the Azure Summit website. However, they are available in Swedish only.

By the way, my presentation was voted as the winner of the speaker contest. What did I win? A balloon flight with Microsoft's Azure balloon.

Next time I present SQL Azure will be at TechDays 2011 in Örebro.

Posted: Nov 30 2010, 01:52 PM by jahlen
SQL Azure - why use it and what makes it different from SQL Server?

I've so far not seen any article on why anyone should use SQL Azure, so I thought it was time someone wrote about it. This article is an introduction to why you might be interested in a cloud database and a summary of the differences compared to "ordinary" SQL Server. Finally, I've included links to some useful resources.

Why would you need a cloud database?

Maybe you've read the story about the unknown IT multi-millionaire Markus Persson? If not, here is an article (in Swedish only). He's not so unknown any longer, but I think he never imagined what a tremendous success his game Minecraft would be. Even though he hasn't spent a single dollar on marketing, it became so successful that the server capacity was not sufficient for all the new paying players.

Of course it is very hard to predict how popular an online game or service will be. It can also change dramatically if you get good publicity (or bad). How can you then make sure you always have the right server capacity for your users?

This is where Azure and the cloud come in. It is an elastic platform that provides you with the means to instantly increase or decrease your server capacity. You only pay for what you use and don't have to estimate capacity needs or buy reserve capacity. You won't run into the problem of "how can I do database maintenance when I always have millions of players online?".

Database-as-a-service vs physical databases

What kind of demands can you make on a database? Why would anyone use a database? Would it not be easier to just write data to files instead?

I can give you at least four reasons why, in most applications, you should use a database:

  • Availability - let's say you are running business critical applications (like in healthcare). Then you don't want to rely on a single computer or disk. SQL Server comes with built-in support for high availability such as clustering, database mirroring, etc which can be configured by a DBA. With SQL Azure you automatically get high availability with zero configuration.
  • Security - with a database you have a much finer control of security than on a file. You can also audit the usage, who has accessed what and who has updated what. Auditing has become increasingly important since the last financial crisis.
  • Correctness - you can enforce consistency on a database through foreign keys, unique indexes, check constraints, etc. Also you can use transaction isolation to prevent chaos when thousands of users work with the database simultaneously. In almost any OLTP application you need to use transactions if you want the application to work correctly with many users.
  • Performance - databases have several mechanisms to optimize performance such as caching, query optimizers, etc.

I'm not saying that the above features cannot be achieved with files and custom code, but with a database you needn't reinvent the wheel every time.

As a developer you just want the database features but probably don't want to spend time setting up clustering, applying security patches, worrying about disk space, etc. Those are the things a DBA does for you.

With SQL Azure you get at least three physical databases for every single database you create. That is how SQL Azure ensures you get the 99.9% uptime. Also, you don't need to spend any time on security patches, disk space, or moving databases between servers to handle increased load.

SQL Server Consolidation

It has become increasingly popular to consolidate SQL Server databases, because it can save you money on hardware, licenses and maintenance work.

I've seen three main methods to consolidate databases:

  • Virtualization - which simply means moving physical machines to virtual machines. The easiest way, but it also gives the smallest savings, because every virtual machine needs its own OS and occupies a lot of resources.
  • Instance consolidation - which means that you move several SQL Server instances to the same physical server and let them share the OS. Better efficiency than virtualization, but still resources are not shared between instances.
  • Database consolidation - where you merge SQL Server instances. Even more efficient than instance consolidation.

SQL Azure can be regarded as a fourth consolidation method. It takes more effort than any of the other methods, because SQL Azure is a bit different from a physical database, but it also gives you the biggest savings.

Some differences between SQL Server and SQL Azure

So what really are the differences between SQL Server and SQL Azure? You can find a list of unsupported features in SQL Azure here. But what are the fundamental differences?

  • In SQL Server you usually rely on Integrated Security. In SQL Azure you have to use SQL Server authentication.
  • In SQL Server an idle connection almost never goes down; if it does, there is usually a serious error. In SQL Azure an idle connection goes down after 5 minutes and can also go down for many reasons other than a timeout. You need to think about this, especially if you use connection pooling.
  • The default transaction isolation level in SQL Azure is READ_COMMITTED_SNAPSHOT while it is READ_COMMITTED in SQL Server. That can make applications behave differently.
  • There is no SQL Server Agent in SQL Azure. Therefore it does not provide any functionality that relies on SQL Server Agent, such as CDC or similar.
  • You cannot currently back up a SQL Azure database. However, you can easily copy it instead. But you don't have access to transaction log backups, so you won't be able to restore the database to an arbitrary point in time.
  • Heap tables are not supported in SQL Azure (except in tempdb). A heap table is a table which does not have a clustered index. This is not a big issue since clustered indexes almost always are a much better choice than heap tables.
  • You cannot write cross-database references in SQL Azure. In SQL Server you can combine data from databases in the same instance.
  • Also you cannot access linked servers in SQL Azure.
  • The default database collation in SQL Azure is SQL_Latin1_General_CP1_CI_AS and cannot be changed. However, Microsoft has promised support for other default database collations in the future.
  • Fulltext indexing is currently not supported in SQL Azure.

The above list may look long, but most existing applications that use SQL Server would also work well on SQL Azure. The most common issue, I think, will be connection pooling and the handling of lost connections. The second most common issue (for people who, like me, live outside the US) will probably be the default database collation.
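
If you want to verify the isolation-level difference yourself, here is a minimal check (assuming the sys.databases view is visible from your connection) that you can run on both platforms and compare:

SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = DB_NAME()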

Tempdb in SQL Azure

When you start using SQL Azure you will notice that you automatically get a master database. Do you also get a tempdb database?

Yes, there is a tempdb database. However, there are some limitations on how you can access it. Which of the following queries do you think work?

  • CREATE TABLE tempdb.dbo.Test ( col1 int not null )
  • CREATE TABLE #Test ( col1 int not null )
  • CREATE TABLE ##Test ( col1 int not null )

Only the second option works. As mentioned earlier you cannot write cross-database references (like the first option). The third option is forbidden because you cannot create global tempdb objects.

Can you have heap tables (non-clustered) in tempdb? Yes, you can. Still, SQL Azure doesn't support the SELECT INTO statement, which I believe was disabled because it creates heap tables.
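
Since SELECT INTO is not available, the usual workaround is to create the temporary table explicitly and then use INSERT ... SELECT. A minimal sketch (the table and column names are made up for illustration):

-- SELECT * INTO #Copy FROM dbo.SourceTable would fail on SQL Azure
CREATE TABLE #Copy
(
    ID int NOT NULL PRIMARY KEY,
    Name nvarchar(50) NOT NULL
)

INSERT INTO #Copy (ID, Name)
SELECT ID, Name FROM dbo.SourceTable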

Summary

SQL Azure is very useful for applications where you don't know how many users you will have in the future, since it provides you with the ability to scale up and down on demand. Especially with the new SQL Azure Federation support (that I will write about later). Also it is of course a very useful storage platform for Windows Azure based applications.

Links and resources

SQL Azure some more tips and tricks

This article is a follow up to my earlier article with tips on SQL Azure.

Data-tier applications (DACs)

There are two main options for deploying your database solutions to SQL Azure, unless you want to do it manually: Data-tier applications or the SQL Azure Migration Wizard. The latter is an open source tool that copies your database (including the data) in either direction. Data-tier applications, however, are a new feature in SQL Server 2008 R2 and SQL Azure that enables you to develop a database in Visual Studio like you would develop a C# application or similar. Existing databases can easily be converted to Data-tier applications as long as you don't use any features that are not supported (such as XML).

Deploying a Data-tier application in SQL Azure

You can't deploy it from Visual Studio. Instead, build your project, connect to SQL Azure through SQL Server Management Studio, right-click on your instance and choose "Deploy Data-tier Application..." in the menu. Click Next in the wizard and browse to your .dacpac file.

Upgrading a Data-tier application in SQL Azure is a manual process. You'll need to deploy it under a new name, copy the data manually from the old database, and rename it. How do you rename a SQL Azure database? Issue the T-SQL statement: "ALTER DATABASE database_name MODIFY NAME = new_database_name".

To add logins and users in your Data-tier application you need to create the login with the MUST_CHANGE password option.

CREATE LOGIN [username] WITH PASSWORD='' MUST_CHANGE, CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF

You map it to a database user in the same way as "ordinary" SQL Server:

CREATE USER [username] FOR LOGIN [username] WITH DEFAULT_SCHEMA=[dbo]

Then you need to add it to one or more roles by executing this T-SQL statement within your database (make sure you don't execute it within the master database):

EXEC sp_addrolemember N'my_db_role', N'username'

Finally you need to assign a password and enable the login by executing the following T-SQL statement on the master database.

ALTER LOGIN [username] WITH PASSWORD='mypassword'
GO
ALTER LOGIN [username] ENABLE

Changing the edition of your SQL Azure database or increasing the size

You can increase or decrease the size of the database or change the edition at any time. Just issue the following T-SQL statement on the master database.

ALTER DATABASE database_name
MODIFY (MAXSIZE = x GB, EDITION = 'x')

The edition can currently be either web or business. The database size can currently be 1, 5, 10, 20, 30, 40 or 50 GB.
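
For example, to move a hypothetical database called MyDatabase to the business edition with a 10 GB size cap:

ALTER DATABASE MyDatabase
MODIFY (MAXSIZE = 10 GB, EDITION = 'business')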

Resizing or changing the edition of your database takes almost no time so you can easily start with a smaller database and increase it on demand (like you should be able to with a cloud database).

Posted: Nov 05 2010, 08:23 AM by jahlen
I will speak about SQL Azure at Azure Summit in Stockholm

Want to learn about SQL Azure? I will present the differences compared to "ordinary" SQL Server and what you as a developer or database administrator need to know to successfully move to SQL Azure. I will also give a guided tour on how to get started with SQL Azure.

The event will be held at Microsoft's office in Stockholm on the 18th - 19th of November.

Click here to go to the event website.

StreamInsight 1.1 released

A new version of StreamInsight has just been released. The biggest news is support for the Reactive Framework in .NET 4.0 (making it much easier to integrate with existing .NET applications), better performance and stability (especially for edge joins), and side-by-side installation with older versions.

Those of you who tested the StreamInsight CTPs may remember the support for the IObservable/IObserver interfaces. This was removed in the RTM but is now replaced by support for the .NET 4.0 IObservable and IEnumerable interfaces. These provide an alternative way to develop adapters and can help simplify access to historical data.

Read more here on the StreamInsight blog:
http://blogs.msdn.com/b/streaminsight/archive/2010/10/25/releasing-streaminsight-v1-1.aspx

Posted: Oct 26 2010, 02:00 PM by jahlen
New StreamInsight article published - How to build a realtime Twitter adapter for StreamInsight

I've written a new StreamInsight article which explains how to build a realtime Twitter adapter for StreamInsight.

Check it out at the link below:
http://www.sqlservercentral.com/articles/StreamInsight/70278/

 

I've started twittering

I've decided it's finally time for me to start twittering. I'm usually not the kind of guy who lags behind when new technologies arrive, but I must admit I thought it a bit of an overkill to have more than one place (my blog) to write. Now I have given in to the pressure... :)

You'll find me on Twitter under the username @johanahlen. I will keep twittering about SQL Server, StreamInsight, Azure and Cloud technology, Business Intelligence and technology in general. So you'll probably keep hearing more from me there.

Link to my Twitter profile:
http://twitter.com/johanahlen

 

Script to check the cost of your SQL Azure database objects

What is the cost of storage for your SQL Azure database objects? You can easily check that out with the following script:

SELECT
object_name(t1.object_id) 'Object'
, t2.name 'Index'
, SUM(reserved_page_count) * 8192 'bytes'
FROM sys.dm_db_partition_stats t1
LEFT JOIN sys.indexes t2 ON t1.object_id = t2.object_id AND t1.index_id = t2.index_id
GROUP BY t1.object_id, t2.name
ORDER BY SUM(reserved_page_count) DESC

Since SQL Azure storage currently costs approximately $10 per gigabyte per month, you can easily see how your objects affect the price.
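
If you prefer the estimate directly in dollars, the script can be extended with a computed column. A rough sketch, based on the $10 per gigabyte figure above:

SELECT
object_name(t1.object_id) 'Object'
, t2.name 'Index'
, SUM(reserved_page_count) * 8192 'bytes'
, SUM(reserved_page_count) * 8192 / 1073741824.0 * 10 'approx USD per month'
FROM sys.dm_db_partition_stats t1
LEFT JOIN sys.indexes t2 ON t1.object_id = t2.object_id AND t1.index_id = t2.index_id
GROUP BY t1.object_id, t2.name
ORDER BY SUM(reserved_page_count) DESC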

Posted: Sep 19 2010, 02:26 PM by jahlen
Why skilled developers will be even more important in the future with the cloud

There was a discussion about a year ago on Swedish blogs about why developers don't adopt new methods and technologies and whether they are bad developers or not. It was also discussed whether this is because of managers who don't give developers time to learn anything new.

Of course there will always be a need for people who know old technologies (like COBOL). If I wanted to repair a 20 year old car, I'd prefer a mechanic who is an expert on 20 year old cars of my brand. So there is a point in the opinion that not everyone needs to learn the latest things.

On the other hand, a lot of work is done by consultants who are given almost no time to develop their skills. Their managers demand they do nothing that isn't billable. A recent example I saw was consultants who had spent months (of billable time) developing functionality that already exists in SQL Server. Instead of spending months on development, they could have spent a few days learning. There are probably lots of cases like this where everybody is happy (as long as the customer doesn't understand how much money they have wasted).

What amazes me most is the lack of interest some developers show in learning new things. As leader of the Swedish SQL Server User Group I can't help noticing who participates in our activities or signs up for our newsletter. Some people (particularly from large consulting companies) seem totally uninterested in learning new things. It can't be blamed on their management alone - even if management demands 99% billable time, there is always time to subscribe to an occasional newsletter. Customers who only focus on hourly price tend to get these kinds of consultants and probably pay for a lot of unnecessary hours.

So why is Windows Azure (and the cloud in general) making skilled developers even more important? I'd say it's because of the billing models within Azure. Basically, you pay for the resources you consume: CPU time, storage, bandwidth, etc. So a poorly developed solution will cost much more in day-to-day fees, which will be very visible to the customer.

In other words: I think the cloud will make customers much more aware of the efficiency and quality of development work than they are today. It's like getting an itemized bill where you can see what causes the costs instead of just a lump sum. The consultants who survive in the cloud will be those who understand that you need to invest in your own competency.

The blog posts from last year

Surprising parallel processing performance in .NET 4.0

C# and .NET Framework 4.0 have a good number of really good new features. Some of the best features, from a performance point of view, are the Task Parallel Library and Parallel LINQ (PLINQ). I did some performance testing recently and was surprised by the results.

Here's my test code:

using System;
using System.Diagnostics;
using System.Linq;

// Create an array of a million random numbers
Random rand = new Random(0);
var randomNumbers = Enumerable.Range(1, 1000000).Select(i => rand.Next(1000000)).ToArray();

// Sort the numbers. Use a stopwatch to measure.
var stopWatch = Stopwatch.StartNew();
var sortedNumbers = randomNumbers.OrderBy(i => i).ToArray();
stopWatch.Stop();

// Display results
Console.WriteLine("Time: {0:F6} seconds", stopWatch.Elapsed.TotalSeconds);

This takes approx 1.4 seconds to execute on my machine (a laptop with a dual-core Intel CPU).

So, could sorting be done in parallel? Does PLINQ contain a parallel sorting algorithm that would make use of both cores? How much faster would that algorithm be? Twice as fast?

To test this I changed one line:

var sortedNumbers = randomNumbers.AsParallel().OrderBy(i => i).ToArray();

Now, how fast do you think it was? On my machine it took approx 0.5 seconds. That means it not only makes use of both cores, but must also use a cleverer algorithm. Very interesting for performance optimization.

On other operations, such as filtering, I haven't seen this level of performance gain. Instead it reduced execution time by about 30%.

Even more surprising: in my example, the Sum() method is a little slower in PLINQ than in sequential LINQ. So it seems you have to check every method if you want to ensure you get maximum performance.

Here's a summary of the new features in .NET Framework 4.0:

http://msdn.microsoft.com/en-us/library/ms171868%28VS.100%29.aspx

There's also a good summary of the C# 4.0 news in this blog post:

http://blogs.msdn.com/b/csharpfaq/archive/2010/04/12/get-ready-for-c-4-0.aspx

 

 

Minesweeper in T-SQL

Who said that coding T-SQL should not be fun? Minesweeper is a real old classic (that's still being shipped with Windows). Truly creative of Auke Teeninga to implement Minesweeper in T-SQL...

Keep on playing!

 
