What are Orphan Partitions?
Orphaned partitions are partition databases present on leaf nodes that the master aggregator does not consider part of the cluster (they are neither master nor slave partition databases). This means the master aggregator will drop such a database if it ever gets in the way of an operation it wants to perform.
Why is it Important to Know About Orphan Partitions?
It is possible in some crash scenarios for an orphaned partition to be the last valid copy of a partition. In this case, run ATTACH PARTITION right away to attach the orphan back into the cluster. Do not run high-level commands (REBALANCE PARTITIONS, RESTORE REDUNDANCY, DETACH LEAF, etc.), as they may drop the orphan.
Example:
SHOW CLUSTER STATUS
+-----------+-------+----------+-----------+-------------+----------+---------+
| Host      | Port  | Database | Role      | State       | Position | Details |
+-----------+-------+----------+-----------+-------------+----------+---------+
| 127.0.0.1 | 10001 | x_test   | Reference | replicating | 0:57     |         |
| 127.0.0.1 | 10001 | x_test_0 | Master    | online      | 0:10     |         |
| 127.0.0.1 | 10001 | x_test_1 | Slave     | replicating | 0:10     |         |
| 127.0.0.1 | 10002 | x_test   | Reference | replicating | 0:57     |         |
| 127.0.0.1 | 10002 | x_test_0 | Slave     | replicating | 0:10     |         |
| 127.0.0.1 | 10002 | x_test_1 | Orphan    | online      | 0:10     |         |
+-----------+-------+----------+-----------+-------------+----------+---------+
SHOW PARTITIONS
+---------+-----------+-------+--------+--------+
| Ordinal | Host      | Port  | Role   | Locked |
+---------+-----------+-------+--------+--------+
| 0       | 127.0.0.1 | 10001 | Master | 0      |
| 0       | 127.0.0.1 | 10002 | Slave  | 0      |
| 1       | 127.0.0.1 | 10001 | Slave  | 0      |
+---------+-----------+-------+--------+--------+
We are missing a master for partition 1, but we have an online orphan for partition 1. That is the master we need, so attach it:
ATTACH PARTITION x_test:1 ON '127.0.0.1':10002;
SHOW CLUSTER STATUS
+-----------+-------+----------+-----------+-------------+----------+---------+
| Host      | Port  | Database | Role      | State       | Position | Details |
+-----------+-------+----------+-----------+-------------+----------+---------+
| 127.0.0.1 | 10001 | x_test   | Reference | replicating | 0:57     |         |
| 127.0.0.1 | 10001 | x_test_0 | Master    | online      | 0:10     |         |
| 127.0.0.1 | 10001 | x_test_1 | Slave     | replicating | 0:10     |         |
| 127.0.0.1 | 10002 | x_test   | Reference | replicating | 0:57     |         |
| 127.0.0.1 | 10002 | x_test_0 | Slave     | replicating | 0:10     |         |
| 127.0.0.1 | 10002 | x_test_1 | Master    | online      | 0:10     |         |
+-----------+-------+----------+-----------+-------------+----------+---------+
SHOW PARTITIONS
+---------+-----------+-------+--------+--------+
| Ordinal | Host      | Port  | Role   | Locked |
+---------+-----------+-------+--------+--------+
| 0       | 127.0.0.1 | 10001 | Master | 0      |
| 0       | 127.0.0.1 | 10002 | Slave  | 0      |
| 1       | 127.0.0.1 | 10001 | Slave  | 0      |
| 1       | 127.0.0.1 | 10002 | Master | 0      |
+---------+-----------+-------+--------+--------+
Why Do Orphan Partitions Exist?
The master aggregator avoids dropping partition databases whose origins it does not know, in case the data is needed. This is a safeguard against data loss: a bug or some other unusual configuration issue can cause the master aggregator's view of the cluster to become incorrect while the orphan contains real data (this is very rare, but it does happen). The easiest way to create orphans is to shut down a leaf and then have REBALANCE PARTITIONS move partitions away from the pair containing the offline leaf. When the leaf comes back online it will have orphans (in this case, the orphans are just stale data).
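The rebalance scenario above can be sketched as a command sequence. This is a sketch, not a recommended procedure; it assumes a database named x_test and the two-leaf layout from the example above:

```sql
-- 1. Leaf 127.0.0.1:10002 goes offline (e.g., the node is shut down).

-- 2. On the master aggregator, rebalance while that leaf is down; its
--    partitions are recreated elsewhere and removed from the metadata:
REBALANCE PARTITIONS ON x_test;

-- 3. Restart the leaf. The partition databases still on its disk are no
--    longer tracked by the master aggregator, so they become orphans:
SHOW CLUSTER STATUS;  -- the stale copies now appear with Role = Orphan
```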
To remove orphans you're sure you don't need, run:
EXPLAIN CLEAR ORPHAN DATABASES;
to see which partitions will be dropped, followed by:
CLEAR ORPHAN DATABASES;
Before doing this, run SHOW PARTITIONS and SHOW CLUSTER STATUS and verify that all partitions are accounted for. CLEAR ORPHAN DATABASES cannot be undone.
How Do I Detect Orphan Partitions?
Check the output of SHOW CLUSTER STATUS. Any orphaned partitions will have the role of Orphan.
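As a quick command-line check, you can filter the SHOW CLUSTER STATUS output for Orphan rows. This is a sketch: the mysql-client invocation in the comment assumes the master aggregator's host and port, which you should adjust for your cluster. The filter itself is demonstrated on sample rows from the output above:

```shell
# In a live cluster you would pipe the real output in, e.g. (host/port are
# placeholders):
#   mysql -h 127.0.0.1 -P 3306 -e "SHOW CLUSTER STATUS" | grep 'Orphan'
# Demonstrated here on sample rows:
grep 'Orphan' <<'EOF'
| 127.0.0.1 | 10002 | x_test_0 | Slave  | replicating | 0:10 | |
| 127.0.0.1 | 10002 | x_test_1 | Orphan | online      | 0:10 | |
EOF
```

This prints only the x_test_1 row, the orphaned partition database.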
Tracelog messages of the following form indicate the presence of an orphan partition on a leaf node:
WARN: Partition database `partitionName_ordinal` exists on node 'host':port but `partitionName`:ordinal does not occupy this node.
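You can grep the leaf tracelogs for this warning. This is a sketch; the tracelog path in the comment is an assumption and varies by installation, so the filter is demonstrated on a sample log line:

```shell
# In a live cluster, point grep at the leaf's tracelog (path is an
# assumption; adjust for your installation):
#   grep 'does not occupy this node' /var/lib/memsql/leaf-10002/tracelogs/memsql.log
# Demonstrated here on a sample log line:
grep 'does not occupy this node' <<'EOF'
INFO: Log replay complete
WARN: Partition database `x_test_1` exists on node '127.0.0.1':10002 but `x_test`:1 does not occupy this node.
EOF
```

Each matching line names the orphaned partition database and the leaf node holding it.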
Please reach out to MemSQL Support if you need assistance with orphan partitions.