Row locking during Query execution
MemSQL transactions use standard 2-phase locking for the most part (https://en.wikipedia.org/wiki/Two-phase_locking). Rowlocks are acquired as rows are written to and are held until the transaction that acquired them commits rollsback. This is why leaving a multi-statement transaction open that does write operations is not a good idea (in any database).
Consider the following example table:
CREATE TABLE tom (c1 int primary key, c2 int, c3 int, key(c2))
There are two different ways queries acquire rowlocks in MemSQL:
1) The query is scanning a secondary key to run the write operation. For example the query
DELETE FROM tom WHERE c2 = 1 will seek and scan the secondary index on c2 to run the DELETE. When the query finds a row with c2=1 it will seek into the primary key to acquire the rowlock before marking the row as deleted.
2) The query is scanning the primary key to run the write operation. For example the query
DELETE FROM tom WHERE c1 =1 will seek and scan the primary key on c1 to run the DELETE. Since the query is already scanning the primary key the index scan itself will acquire rowlocks to avoid a pointless seek into the primary key (as is done when the DELETE is scanning the secondary key). This makes writes that scan the primary key more faster then those that scan the secondary key (usually around 20% faster depending on the query in question).
When scanning the primary key the lock will be acquired before running any filters in the WHERE since the index scan operation itself can't run those filters (the scan is inside the storage layer, the filters are run inside of the generate code as part of the query execution layer). This means we can lock some rows we don't actually want to delete. For example, the query
DELETE FROM tom WHERE c1=1 AND c3=1. We will lock rows where c3 is not 1. To combat this, MemSQL 5 and below will release the row lock if the filter doesn't match (c3 is not 1 in our example). This is often too late though and applications are getting extra deadlocks when running concurrent delete queries that delete different sets of rows (but with primary key scans that scan over the same sets of rows). For example,
DELETE FROM tom WHERE c1=1 AND c3=1, and
DELETE FROM tom WHERE c1=1 AND c3=2, will never deadlock if c1 is not the primary key (since we end up in case 1), but can deadlock due the early locking in primary key scans.
Extra Deadlocks in MemSQL Due to Sharding
In a single box databases concurrent write operations on the same table that scan the same index in the same direction (forward or reverse) can't deadlock. This is because the writes will scan the rows in the same order so they can't hit the classic "unordered" lock acquisition deadlock (thread 1 gets a lock on row1, and is waiting for a lock on row2. thread 2 has a lock on row2 and is waiting for a lock on row1. If both threads are scanning the same index they will see the rows in the same order, so this deadlocking lock ordering is impossible). In MemSQL, the index is partitioned up amongst shards that are spread across a cluster of leaf nodes. Now we can still get a deadlock even if threads are scanning the index in the same order on each partition (Two DELETE that run at the same time race to start running on each partition. DELETE 1 can run first on partition 0 and DELETE 2 can run fist on partition 1, now they can hit the unordered lock acquisition deadlock above). Single partition DELETE operations don't have this issue (shard key match in the WHERE clause).
If a workload is running into this issue there are a few things you can try. The best thing is to have the application retry the operation, but if the deadlocks are too frequent then:
- If you hit the problem with multi-inserts, then decrease the number of rows inserted per multi-insert.
- If you are using a multi-statement transaction and don't need to use one for your use case, then avoid them. (i.e.,
BEGIN; DELETE ...; LOAD ...; COMMIT;)
- You can use the "IGNORE" option for INSERT and LOAD DATA commands to get the same behavior as 4) (no cross node transactions) without the system variable. Note that IGNORE will also ignore any errors the command hits (duplicate key, etc.) and you may not want this behavior.
- The system variable
multistatement_transactionswill turn off cross-node transactions for write queries. This means write queries that touch multiple nodes will commit locally when the processing finishes on the node. For example, if node 1 hits a duplicate key error, node 2 will still commit its transaction. This session variable removes entirely the extra deadlock issues related to sharding (your back to the single-box sources of deadlocks).
Row locking for Columnstore Tables
Column store tables don't do row level locking. We take locks whenever we make a change to the metadata for a blob file (reminder: There is one blob file per column that stores 100K rows of compressed data). We never update the file contents themselves (new rows always go into new files), but we do update the bitmap metadata that tracks which rows in the file are deleted. This means DELETEs are really locking about 100K rows at once in a columnstore table. Columnstore tables right now (as of MemSQL 5) don't allow concurrent UPDATE\DELETE queries (there is a table level lock) so this coarse grained row locking doesn't really cause problems.