Johan Åhlén

Johan Åhlén's blog about life, universe and everything.


I will speak about StreamInsight at TechDays 2010 in Sweden

StreamInsight is one of the major new features of SQL Server 2008 R2. It is a platform for Complex Event Processing (CEP) that enables processing of high-speed event streams using familiar tools such as Visual Studio, C# and LINQ. CEP is increasingly used for algorithmic trading, targeted ads on websites and real-time Business Intelligence. In this session, you will get a 60-minute introduction to how StreamInsight works and a demo of how to build an application that handles near real-time data from sources such as Twitter, Facebook and financial markets.

Meet me at TechDays, 23-24 March, in Örebro (Sweden)!

Algorithmic Trading using Complex Event Processing (CEP) and Data Mining

Algorithmic Trading is automated trading done by computer programs (algorithms) that react to patterns in financial market data or financial news. As of 2009, high-frequency trading firms accounted for an estimated 73% of all US equity trading volume.

Several different strategies are used for algorithmic trading. Index arbitrage, for example, exploits differences between the pricing of an index and the stocks it represents. Market making is another strategy, which aims to earn the spread between bid and ask prices: quote 99.98 bid / 100.02 ask, for instance, and each completed round trip earns 4 cents per share. More complicated strategies use data mining and pattern recognition.

Personally I believe there's much truth in the saying that the stock market is mostly psychology. How do you think the following influence the stock market?

  • Dividends and issues of shares. Overreactions.
  • Buy or sell recommendations from analysts.
  • Publicity in newspapers.
  • Weather.
  • Season. Report periods.
  • Political events.

Getting Started

What do you need for your do-it-yourself Algorithmic Trading system?

  • A high quality data source of market data.
  • Reliable sources for other data.
  • An automated financial instrument order handling system.
  • A CEP (Complex Event Processing) engine.
  • A data mining engine.

Financial data feeds are available as standard services from several financial companies; there is even a standard protocol for the information, FIX (Financial Information eXchange). Large financial news agencies can also supply their news in a format adapted for automated analysis.
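To give an idea of what FIX data looks like, here is a minimal C# sketch that parses a FIX message into tag/value pairs. The message below is made up for illustration; real feeds delimit fields with the SOH character (0x01), for which '|' stands in here:

    using System;
    using System.Collections.Generic;

    class FixParserDemo
    {
        // Split a FIX message into its numeric tag / string value pairs.
        static Dictionary<int, string> Parse(string message, char delimiter)
        {
            var fields = new Dictionary<int, string>();
            foreach (string part in message.Split(
                new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries))
            {
                int eq = part.IndexOf('=');
                fields[int.Parse(part.Substring(0, eq))] = part.Substring(eq + 1);
            }
            return fields;
        }

        static void Main()
        {
            // Hypothetical new-order-single message (35=D); '|' stands in for SOH.
            string msg = "8=FIX.4.2|35=D|55=MSFT|54=1|38=100|44=28.50|";
            Dictionary<int, string> fields = Parse(msg, '|');
            Console.WriteLine("Symbol={0} Qty={1} Price={2}",
                fields[55], fields[38], fields[44]);
        }
    }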

Financial brokers exist that can handle your orders automatically at decent commissions (depending, of course, on your trading volume). You should be able to find them through your internet stock broker, which may supply this service itself or have a partner that does.

Complex Event Processing and Data Mining are supported by SQL Server 2008 R2 through StreamInsight and Analysis Services. You can learn more about StreamInsight in my articles on SQLServerCentral.
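To give a flavor, here is roughly what a StreamInsight query can look like: a one-minute average price per stock symbol, written in LINQ. This is only a sketch based on the early CTP samples; StockTick is a hypothetical event type, exact namespaces and method names may differ in the released version, and the stream must be bound to input and output adapters before it can run:

    using System;
    using System.Linq;
    using Microsoft.ComplexEventProcessing;
    using Microsoft.ComplexEventProcessing.Linq;

    // Hypothetical payload type for stock ticks.
    public class StockTick
    {
        public string Symbol { get; set; }
        public double Price { get; set; }
    }

    class QuerySketch
    {
        static void DefineQuery()
        {
            CepStream<StockTick> ticks = CepStream<StockTick>.Create("ticks");

            // Average price per symbol over one-minute tumbling windows.
            var avgPerSymbol =
                from t in ticks
                group t by t.Symbol into g
                from win in g.TumblingWindow(TimeSpan.FromMinutes(1),
                                             HoppingWindowOutputPolicy.ClipToWindowEnd)
                select new { Symbol = g.Key, AvgPrice = win.Avg(e => e.Price) };
        }
    }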

Evaluating your algorithms

Some factors to consider:

  • "Edge" (Return On Investment in the long run)
  • Opportunity (how often the algorithm will find something to act on)
  • Accuracy (how often a trade will be a win)

Personally, I'd prefer an algorithm that gives a stable income over a high-risk algorithm.
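To make these factors concrete, here is a small C# sketch (with made-up numbers) that combines accuracy, average win/loss and opportunity into an expected long-run result:

    using System;

    class EdgeCalculator
    {
        static void Main()
        {
            // Hypothetical evaluation figures for one algorithm.
            double accuracy = 0.55;   // fraction of trades that win
            double avgWin = 120.0;    // average profit on a winning trade
            double avgLoss = 100.0;   // average loss on a losing trade
            int opportunities = 400;  // expected trades per year

            // "Edge": expected profit per trade in the long run.
            double edgePerTrade = accuracy * avgWin - (1 - accuracy) * avgLoss;

            // Expected result over a year of opportunities.
            double expectedAnnual = edgePerTrade * opportunities;

            Console.WriteLine("Edge per trade: {0:F2}", edgePerTrade);
            Console.WriteLine("Expected annual result: {0:F2}", expectedAnnual);
        }
    }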

Risks

Algorithmic Trading, like any other investment activity, requires you to be cautious. A friend told me about his neighbour who earned millions one year and lost them (and his house) the next. As in poker, you have to decide on a maximum loss level where you are ready to take the pain and leave.

On the other hand, Complex Event Processing is also used by trading firms to reduce their risk. It can help you monitor your risk exposure in real time.
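As a simple illustration of the idea (plain C#, not StreamInsight), an exposure monitor could keep a running net position value per symbol and raise an alert when a limit is breached. The types and limits below are hypothetical:

    using System;
    using System.Collections.Generic;

    class ExposureMonitor
    {
        private readonly Dictionary<string, double> exposure =
            new Dictionary<string, double>();
        private readonly double limit;

        public ExposureMonitor(double limitPerSymbol)
        {
            limit = limitPerSymbol;
        }

        // Call once per executed trade (positive quantity = buy, negative = sell).
        public void OnFill(string symbol, int quantity, double price)
        {
            double current;
            exposure.TryGetValue(symbol, out current);
            exposure[symbol] = current + quantity * price;

            if (Math.Abs(exposure[symbol]) > limit)
                Console.WriteLine("ALERT: {0} exposure {1:F0} exceeds limit {2:F0}",
                    symbol, exposure[symbol], limit);
        }

        static void Main()
        {
            var monitor = new ExposureMonitor(1000000);
            monitor.OnFill("MSFT", 20000, 28.50);  // within the limit
            monitor.OnFill("MSFT", 40000, 28.60);  // crosses the limit -> alert
        }
    }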

Also, very important: just because you have found a pattern that worked historically doesn't mean it will work in the future. Have you heard about "The Bible Code" (or Torah Code), which describes encrypted messages in the Bible? Only... you could find "encrypted messages" in any book if you search long enough. Remember, that is the weakness of Data Mining.

My recommendation is to stick to a test environment until you are sure about your algorithms. It is also not a bad idea to take a course in statistics and learn how to perform statistical significance tests.
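As an example of such a test, here is a C# sketch of a one-sample t-test on daily returns from a backtest, asking whether the mean return is significantly above zero. The returns are made up; compare the resulting t against a t-distribution table with n-1 degrees of freedom:

    using System;

    class SignificanceTest
    {
        static void Main()
        {
            // Hypothetical daily returns from a backtest.
            double[] returns = { 0.004, -0.002, 0.007, 0.001, -0.003, 0.005, 0.002 };
            int n = returns.Length;

            double mean = 0;
            foreach (double r in returns) mean += r;
            mean /= n;

            double sumSquares = 0;
            foreach (double r in returns) sumSquares += (r - mean) * (r - mean);
            double stdDev = Math.Sqrt(sumSquares / (n - 1));

            // t-statistic for the null hypothesis "mean return is zero".
            double t = mean / (stdDev / Math.Sqrt(n));
            Console.WriteLine("mean={0:F4} t={1:F2} n={2}", mean, t, n);
        }
    }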


Exploring Codename "Dallas" with PowerPivot

A couple of weeks ago, Microsoft opened up their Codename "Dallas" website. The service is by invitation only, but there is an email address on the site where you can ask for an invitation code.

What is Codename "Dallas"?

- It is a marketplace for premium information, both current and historical, where you can search for and access it in a single location, using a standardized API (REST and ATOM 1.0) and a simple billing model. It is great for Business Intelligence applications, where you can easily combine it with your own data. The benefit for consumers is that you can very easily buy access to the information you want; for suppliers, it provides an easy way to market and sell information services.
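As a rough illustration of the API style, the following C# sketch fetches an ATOM feed over HTTP and lists the entry titles. The URL and the credential scheme are placeholders, not the real "Dallas" endpoint; the service issues an account key and defines the exact authentication details (the autogenerated proxy classes described further down handle this for you):

    using System;
    using System.Net;
    using System.ServiceModel.Syndication; // reference System.ServiceModel.Web
    using System.Xml;

    class AtomFeedSketch
    {
        static void Main()
        {
            // Placeholder URL and credentials - check the service documentation.
            string url = "https://api.example.com/data?$format=atom10";
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Credentials = new NetworkCredential("accountKey", "YOUR-KEY");

            using (WebResponse response = request.GetResponse())
            using (XmlReader reader = XmlReader.Create(response.GetResponseStream()))
            {
                SyndicationFeed feed = SyndicationFeed.Load(reader);
                foreach (SyndicationItem item in feed.Items)
                    Console.WriteLine(item.Title.Text);
            }
        }
    }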

Let me show an example of how you can get the information you want into PowerPivot.

First there is the catalog, where you can browse and subscribe to the dataset you are interested in.

Then you can manage your subscriptions and click a link to explore their contents. There is also a link for voting on content you would like to see added (a funny coincidence: one of the top content requests right now is for news feeds, which is exactly what I have been developing on the Windows Azure platform lately).

Below is a screenshot of the content explorer. I've picked the United Nations WHO (World Health Organization) Database as an example. You specify which variable you are interested in, for example Life Expectancy, then select a year and optionally filter on nations. A preview is available.

If you are interested in using this data in PowerPivot, you can click Analyze. For me, it only works if I start PowerPivot first and then click the Analyze button. A dialog similar to the one below should then be shown. Click Open.

PowerPivot will then display the Table Import Wizard. Give your connection a good name and click Next.

In the next step, you can limit the contents of the data feed, or just click Finish to include all data.

Finally, you should see a confirmation dialog showing that the rows have been loaded into memory. However, I never managed to import more than 100 rows at a time from the WHO Database; probably that is a limitation of the trial version or a limitation on their side. To get more rows, I had to repeat the process and add new data feeds.

After that, you can play around with the data as usual in PowerPivot.

An alternative to PowerPivot is to consume the data from your C# applications. For that purpose, Codename "Dallas" can autogenerate C# classes that fetch the data for you.

There's a link in the content explorer for creating the C# proxy classes. Click it and then choose to save the file.

Below is an example autogenerated C# class for reading the WHO data.
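The original screenshot of the generated code is not reproduced here, so the following is only a hypothetical reconstruction of its general shape: a typed item class plus a service class that queries the REST endpoint and maps the ATOM entries to items. All names are assumptions, not the actual generated code:

    using System;
    using System.Collections.Generic;

    // Hypothetical item type - property names are assumed, not the real ones.
    public class WhoDataItem
    {
        public string CountryName { get; set; }
        public int Year { get; set; }
        public double Value { get; set; }
    }

    public class WhoDataService
    {
        private readonly string accountKey;

        public WhoDataService(string accountKey)
        {
            this.accountKey = accountKey;
        }

        // The real generated class issues the HTTP request with the account
        // key and parses the ATOM response; the body is omitted in this sketch.
        public IEnumerable<WhoDataItem> Invoke(string countryFilter, int year)
        {
            throw new NotImplementedException("Sketch only.");
        }
    }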

The number of suppliers is currently very limited. For the business-minded, it should be a good opportunity to sell high-quality information services.

Give it a try at http://www.microsoft.com/windowsazure/dallas/!