Always add Unique sorting in Database Queries
This one has bit me 2-3 times. I had added ordering on some list item in my application, but (sometimes!) my customers complained about the incorrect ordering. When I tried to reproduce it (by adding similar data on my testing environment), the ordering was correct and I was not able to reproduce it!
The Problem
Have you faced (or are facing) something similar to this?
- You have paginated the data but sometimes, an item appears on multiple pages? But when you search by it’s ID, it appears only once!
- You have paginated the data but sometimes, an item DOES NOT appear on any page? But when you search by it’s ID, it does appear!
- Ordering of a simple 4-5 items is sometimes incorrect? And you are not able to reproduce it!
Then you should check the ordering (ORDER BY column
) statements in your application. Is this ordering unique or there can be items
in the list that has same value for this selected ordering column (e.g. invoice due date, start date of tours)? If the ordering is NOT unique,
then you might have an issue of non-stable sorting.
To understand the problem, let’s first learn about the sorting and it’s categories.
Sorting
Sorting is a type of ordering in which the items are ordered based on a criteria or comparision and going from left-to-right, the criteria either increases (ascending order) or decreases (descending order) or stays same(!!).
Let’s understand this with an example. We have 4 people (P) standing in a queue, having some score (S) represented in the the diagram below. We want to order these people in the increasing order of their score (S).
COUNTER: P1 P2 P3 P4
S=20 S=10 S=30 S=20
# P1 and P4 has the same Score
# Sorted (1)
COUNTER: P2 P1 >P4< P3
S=10 S=20 S=20 S=30
# Sorted (2)
COUNTER: P2 P4 >P1< P3
S=10 S=20 S=20 S=30
So we can order them in two ways and both are correct, as per our requirements of sorting by score! Which one should we choose?
Based on how the items with the same value of criteria/comparision are ordered, the sorting can be categories in following two categories.
1. Stable Sort
After ordering, if the items with the same value of criteria/comparision, preserve the order of ordering from the original ordering then it is called stable sorting. In our example above, if our sorting process always results in the first sorting (1 => P2, P1, P4, P3), then it is stable as we preserve the order of P1 and P4 (P1 comes before P4 in the original ordering).
Semantically, the stable sort means that the order of the sorted items will be the same if we try to run the sorting process multiple times. But, to practically ensure that the order remains stable, we rely on the initial ordering of items.
2. Non-Stable Sort
If the items with the same value of criteria/comparision, may or may not preserve the order from original ordering, then the sorting is NOT stable. In our example above, if we have a non-stable sort, then the sorting process will sometimes result in the first sorting (1) and other times in second sorting(2).
If our ordering is not stable and we were to request for first 2 people from this list, we can either get [P2, P1] or [P2, P4]. Suppose we get [P2, P1] for first 2 people. Now we further request for next 2 people, we can either get [P4, P3] or [P1, P3]. If we get [P1, P3], then we get P1 in the first result as well as in the second result set! This is one of the reason we get duplicate items on pagination.
SELCT * from `LIST` order by `SCORE` LIMIT 2; // [P2, P1]
SELCT * from `LIST` order by `SCORE` LIMIT 2 OFFSET 2; // [P1, P3]
Another thing you might have noticed is that we NEVER got P4!!. This is one of the reasons we DON’T get some items on any page in non-stable sorting.
Solution: Unique Sorting
The question of stable and non-stable sorting comes only when we have items with the same criteria/comparision value. Can we avoid it?
If our criteria/comparision (e.g. order by due date of invoices) returns same value for multiple items, then we can further decide on how we want to order those (duplicate) items. In our example above, if two person have the same score, then we may choose to order them in the way they were standing earlier. Or we may choose to reverse the order, choosing P4 before P1. In any case, we can arrive at a unique (stable) ordering.
Similar to this example, we can implement a strategy for unique sorting for non-stable sorting and ensure a stable order of items if they have the same field value.
Practical on Invoices Dataset
Suppose we have an INVOICES
dataset with each item having ID
, DUE_DATE
, AMOUNT
fields. We want to show them in the ascending
order of DUE_DATE
. We can write the query something like this..
SELECT * FROM `INVOICES`
ORDER BY
`DUE_DATE` ASC
LIMIT 30;
As you might have guessed, there can be multiple invoices with similar DUE_DATE. What would be their order? Some database engines might fallback to primary key (ID) to order those similar DUE_DATE items but there is no guarantee. In our case, let say, we want to order (descending) them by the AMOUNT incase they have same DUE_DATE.
SELECT * FROM `INVOICES`
ORDER BY
`DUE_DATE` ASC,
`AMOUNT` DESC
LIMIT 30;
In our case, there can be multiple invoices with same DUE_DATE and same AMOUNT!. We can finally add the ID
(unique) to sort those
items as well. This will ensure that the sorting is unique (stable).
SELECT * FROM `INVOICES`
ORDER BY
`DUE_DATE` ASC,
`AMOUNT` DESC,
`ID` ASC
LIMIT 30;
As long as we have a unique sortable column (ID) for our dataset, we can get a unique/stable ordering of the items for our sorting preferences.
Next Steps
Other then DB queries, you can also encounter the same problem if you are using a sorting algorithm (hand written or language implemented) which is NOT stable. You must always review the sorting process and read the specs for respective languages. Furthermore, you can have an unique sortable identifier for every item in the list and fallback to the order of it to ensure a stable sorting.