Downtime apology and explanation
June 2, 2012
We had some serious downtime today. I apologise for the extended outage and the aggravation it caused our customers. I would also like to explain to our customers what happened and what we're doing to fix it.
Last night at around 8 pm, a transaction entered the exchange engine and triggered an error. This error did not affect anything else on the site, and operations continued as normal. However, the exchange engine kept reporting this error so rapidly that its error log steadily filled the server's disk. At around 8 am this morning, that server ran out of disk space. This caused the exchange engine to shut down, and because the disk was full, the engine was unable to save the status of current orders to disk.
I made the decision to keep the site offline until we were able to restore all of our customers' orders. Everything had been backed up to another database. It took our engineers around ten hours to repair the exchange engine and satisfy themselves that everything was 100% correct.
At 8 pm tonight, we were able to successfully bring the website online with all services functioning normally.
We will be adding software that alerts us when disk space runs low and gracefully shuts down the exchange engine if it is in danger of running out. We will also improve our error logging. The software fault that triggered the transaction error in the first place has been fixed.
I would like to sincerely apologise for the downtime, especially on such a busy sports day. Our customers rely on us to be available 24/7, and we failed to deliver today. It's simply not good enough. We will fix this, and we remain committed to our goal of making Smarkets the best place to bet in the world.
CEO of Smarkets