Blog

Sometimes we’ll have a few thoughts, and we’ll write them down over here.

Why, in 2016, a mature ERP like Exact should use snapshot isolation in their RDBMS

A client of ours has over the years grown into an ERP (Exact Globe) that stores all its data in an MS SQL Server. This by itself is a nice feature; the database structure is stable and documented quite decently, so that, after becoming acquainted with some of the legacy table and column names (e.g. learning that frhkrg really means ‘invoice headers’), information can often be retrieved easily enough. Indeed, this is very useful, for example when connecting the ERP (as we did) to a custom-made webshop, or when setting up communications to other systems by EDI.

However, contrary to what you’d expect with all the data stored in a big-name SQL server, the data can at times be difficult to access due to deadlocks and other locking time-outs. A lot of time has gone into timing various integration tasks (SSIS, cron jobs and the like) such that they do not interfere with each other, while most of these tasks only read from the database, with the exception of a few task-specific tables (which will rarely cause locking issues).

There are some slow functions and views in the database, but it’s running on such a powerful server (24 cores, 200GB RAM) that this really ought not to be a problem. Yet it is, and it continues to be. And the problem can hardly be solved by further optimizations or by throwing yet more iron at it. The problem could be solved by turning on snapshot isolation.

Life without snapshot isolation

Coming from a PostgreSQL background, I was quite surprised to find that snapshot isolation is not turned on by default in MS SQL Server. In Postgres it cannot be turned off (due to its multiversion concurrency control architecture). And why would you want to?

Without snapshot isolation, what you have is a situation where reads can block writes, except when the transaction isolation level is READ UNCOMMITTED, which isn’t even supported by PostgreSQL (which treats READ UNCOMMITTED as READ COMMITTED). Except for heuristic reporting, READ UNCOMMITTED is best avoided since it allows dirty reads, of uncommitted operations that may well be rolled back later. That’s why the default transaction isolation level in MSSQL, MySQL and PostgreSQL is READ COMMITTED. Sometimes, there are stricter requirements: for example, that the data used in the transaction should not change except from within the transaction. Such could be guaranteed by using the strictest transaction isolation level: SERIALIZABLE. Somewhere in between READ COMMITTED and SERIALIZABLE is REPEATABLE READ. Anyway, the take-home message is that stricter transaction isolation levels generally require more aggressive locking.

Suppose I have an EDI task that has to export shipping orders to a warehouse. In Exact, these orders can be found by querying the frhkrg table (with invoice header lines):

BEGIN TRANSACTION select_unexported_invoices;
-- SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- is implied

SELECT    faknr -- invoice number
FROM        frhkrg
WHERE    fak_soort = ‘V’
        AND
        NOT EXISTS
        (
            SELECT    *
            FROM        invoices_exported
            WHERE    invoice_number=frhkrg.faknr
        )

Besides the interesting table and column names, this select statement is straightforward enough. What caught me by surprise, as a long time PostgreSQL user, is that this transaction can be blocked by an innocuous update on the frhkrg table in a parallel transaction. Yes, with snapshot isolation off, writes can block reads, even if the the transaction does not even require repeatable reads.

This behaviour is easy to replicate:

CREATE DATABASE Ytec_Test;
 
USE Ytec_Test;
 
---
-- Create test table with test data and index
--
CREATE TABLE locking_test
(
    id INT PRIMARY KEY,
    col1 VARCHAR(32) NOT NULL,
    col2 VARCHAR(32) NOT NULL,
    col3 VARCHAR(32) NOT NULL
)
;
INSERT INTO locking_test (id, col1, col2, col3)
SELECT 1, 'Aap', 'Boom', 'A'
UNION ALL
SELECT 2, 'Noot', 'Roos', 'B'
UNION ALL
SELECT 3, 'Mies', 'Vis', 'C'
UNION ALL
SELECT 4, 'Wim', 'Vuur', 'D'
;
CREATE NONCLUSTERED INDEX bypass_lock
    ON locking_test (col1)
    INCLUDE (col2, id)
;

With the test data set-up, it’s easy to lock a read operation:

BEGIN TRANSACTION WRITE1;
UPDATE locking_test SET col1 = 'Aap-je' WHERE id=1;

As long as transaction WRITE1 is open, mosts selects will be blocked (except one, which uses the index exclusively):

BEGIN TRANSACTION READ1;
 
-- SET TRANSACTION ISOLATION LEVEL READ COMMITTED -- is implied
 
SELECT id FROM locking_test; -- WITH (READCOMMITTED) -- is implied (as also below)
-- Time-out
 
SELECT id, col2 FROM locking_test WHERE col3='D'; -- with a non-indexed column in the predicate
-- Time-out
 
SELECT id, col3 FROM locking_test WHERE col1='Aap'; -- with a non-indexed column in the select list
-- Time-out
 
SELECT id,col2 FROM locking_test WHERE col1='Noot'; -- with only indexed columns
-- id    col2
-- 2    Roos

One trick that can be glimpsed from this example is that you can use indices to bypass locks, but only to the extent that you’re not trying to select a row that is currently being locked. The following doesn’t work when WRITE1 is open:

BEGIN TRANSACTION READ2;
SELECT id FROM locking_test WHERE col1='Aap'; -- doesn't work

Another possibility to partially work around aggressive locking is to use the table-hint READPAST:

SELECT id FROM locking_test WITH (READPAST) ORDER BY id;
-- id
-- 2
-- 3
-- 4

But, as you can see, this is limited in its applicability, since the locked row won’t be included in the results, which may or may not suit you.

Life with snapshot isolation

It will make your life as a developer a lot easier to simply turn on snapshot isolation. In PostgreSQL, you don’t have to, because as I mentioned earlier: it won’t allow you to turn it off. But, in MSSQL, you really do have to. Yes, you do.

ALTER DATABASE YTEC_Test SET READ_COMMITTED_SNAPSHOT ON
ALTER DATABASE YTEC_Test SET ALLOW_SNAPSHOT_ISOLATION ON

Welcome, MSSQL users, in the wonderful world of snapshot isolation! Now, let’s open WRITE1 again and retry READ1 and we’ll see that something wonderful has happened: we were able to read the data, including the row that was at that moment being updated.

That’s why turning on snapshot isolation can make a huge difference. If your legacy applications depend on MSSQL, it’s definitely worth testing them with snapshot isolation turned on. Snapshot isolation was first introduced in Microsoft SQL Server in 2005. That’s a long time for such an essential feature to go unused! I hope that Exact Software is listening and will announce official support for turning on snapshot isolation in Exact Globe’s database; then, eventually, they could even stop performing all their read operations at transaction isolation level READ UNCOMMITTED.

Finally, a quick word of advice: if you’re at the start of a new project and not yet stuck with a particular database legacy, I would recommend looking into PostgreSQL first. Changes are, you won’t ever look back. In a future article, I may go into the reasons why.

[The traffic light photograph is courtesy by Flickr user YunHo LEE, who released it into the public domain.]