Covering PostgreSQL Indices

Often when you index tables, you’ll start to notice that indices tend to cover the same field and over again.

You start by adding this index:

CREATE INDEX user_username_indx ON user(username)

Weeks go by and you feel like adding this:

CREATE INDEX user_username_created_at_indx ON user(username, created_at)

You can do that; however, the original index already “covers” this case! You’re adding more indices that have to be maintained for no reason.

An index can likely cover a query if the field placement in the index matches what’s in the query, from left to right.

Another way of looking at it, is that a index can likely cover all of it’s possible field prefixes.

With n fields indexed, there are n possible query field prefix combinations that can be covered.

For instance, the last index above on user(username, created_at) will likely be used for query’s searching on username only, or username and created_at.

Why does this matter?

Due to PostgreSQL’s implementation, an update on a record executes a delete and an insert operation instead of an in-place update.

Indices all store the physical disc location (the page and row number) of the record (accessible through the ctid field on a record).

So on each update to a record, EVEN IF the query doesn’t change the field referenced by any index, all indices must also be updated.

This gives PostgreSQL an advantage when it’s comes to reading from an index since it will find the record’s exact location ctid after traversing the index.

On the other hand, MySQL takes the opposite tradeoff to prioritize index writes over reads: it stores the primary key as the lookup value for indices.

In this way, MySQL indices use a level of indirection; speeding up writes at the read cost of having to traverse both the secondary and primary indices to find the physical location of a record (if a secondary index is used).