Today I gave a talk at ElixirConf about this topic. Below is similar content just laid out differently. Once the video has been made available I will add it here.

UPDATE

A database is something that a lot of engineers use on a day to day basis, but may not truly understand how it works or what it can do for them or the application that they are building. In this post I am hoping to be able to give a basic understanding of how someone can go about getting a better understanding of what it is Postgres will do with any given query and how we can go about looking for optimisations.

Before we get started I wanted to introduce a bit of terminology

Heap - The main storage area used by Postgres, this is where all table data is stored.
Page - The smallest storage unit with Postgres. The terms “page” and “block” can be used interchangeably.

For this article, lets use the following tables.

CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  first_name TEXT,
  last_name TEXT
);

CREATE TABLE businesses (
  id BIGSERIAL PRIMARY KEY,
  legal_name TEXT NOT NULL
);

CREATE TABLE businesses_users (
  business_id BIGINT NOT NULL REFERENCES businesses(id),
  user_id BIGINT NOT NULL REFERENCES users(id)
);

This gives us three tables.

users table with first and last name fields
businesses table with a legal name; and
businesses_users table that joins the two.

We have three main queries that get run in this system. The first is a simple get based on the users id in order to keep track of the current user within our application

SELECT id, first_name, last_name
FROM users
WHERE id = 10;

The second is to find all of the businesses that a given user belongs to

SELECT b.id, b.legal_name
FROM businesses b
INNER JOIN businesses_users bu ON b.id = bu.business_id
WHERE bu.id = 5;

And the last is the reverse, to search for all users that a given business employs

SELECT u.id, u.first_name, u.last_name
FROM users u
INNER JOIN businesses_users bu ON u.id = bu.user_id
WHERE bu.id = 12;

With these three queries in place, we generate some seed data and everything works as we expected. Performance appears to be good for each query and we decide to ship our MVP.

After on-boarding a handful of businesses we are receiving some complaints that our application is getting slower and slower. If we are lucky someone actually tells us which part(s) of the system are getting slower. But if we lack that information (or we want to be more proactive in our approach), what is it we can do in order to find out where we can look? Sometimes it may be application code and we can optimise our own code and in order to get the performance we are looking for. Other times we will need to search our database or other external service we may be using. Application monitoring can be really helpful here to know where to look.

Once we have settled on where to look and we find out our database query is taking up so much time for a given request, what is it we can do to look for performance optimisations? One of the best things we can do is an extension to Postgres called pg_stat_statements.

pg_stat_statements

This is my go-to when trying to find query optimisations for Postgres. To quote its documentation,

The pg_stat_statements module provides a means for tracking planning and execution statistics of all SQL statements executed by a server.

What exactly does this mean? Well, lets take a look. If you do not already have this enabled, you can simply run

CREATE EXTENSION pg_stat_statements;

in order to start collecting statistics on a per query basis. Now that we have this enabled, lets take a look at a quick query in order to see what it is this extension gives us.

SELECT query,
       round(total_exec_time::numeric, 2) AS total_time,
       calls,
       round(mean_exec_time::numeric, 2) as mean,
       round((100 * total_exec_time / sum(total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_overall
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

This query is going to return us

Parameterized query being run
Total amount of time across all runs
Number of times this query has run
The mean execution time
The percentage of time compared to every other query in the system

Lets take a quick look at what the output looks like

         query       | total_time |  calls   |   mean   | percentage_overall
---------------------+------------+----------+----------+--------------------
 SELECT u0."id", u0. |  534841.30 | 54664493 |     0.01 |              36.92
 SELECT b0."id", b0. |  473400.18 |      571 |   829.07 |              32.68
 SELECT u0."id", u0. |  369813.29 |      570 |   648.80 |              25.53

I used a substring of the query in order to fit it nicely here, but you can see some important information here. The queries listed from top to bottom area

Select a user given an id
Select all businesses given a user id; and
Select all users given a business id

Intuitively we may want to take a look at the first query in the result set here. But if we look a bit closer, while this query is taking up almost 37% of our database execution time, it only takes 0.01ms to actually run on average. If there is anything that can be done about this query, it is probably outside of a database optimisation. But if we look at the second query, it has taken over 32% of our execution time AND it takes over 829ms to run on average. This is probably a good candidate for optimisation.

So what can we do in order to actually try and optimise a given query?

Explain

Using EXPLAIN we can take a peek inside of Postgres in order to see what it will do with any given query. Using explain alone will not run the query, but it will tell you what Postgres expects to do with the query. This is called a query plan. This alone may be able to give us an idea of what it is we need to do. Adding the ANALYZE option will actually execute the query as well and give additional information based on execution. Lets take a look at an example output from our poorly behaving query.

EXPLAIN ANALYZE
SELECT
    b0."id", b0."legal_name", b1."user_id"::bigint
FROM "businesses" AS b0
INNER JOIN "businesses_users" AS b1 ON b0."id" = b1."business_id"
WHERE (b1."user_id" = ANY(ARRAY[5])) ORDER BY b1."user_id"::bigint;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=57269.31..57269.32 rows=1 width=34) (actual time=1573.737..1574.719 rows=0 loops=1)
   Sort Key: b1.user_id
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=1000.28..57269.30 rows=1 width=34) (actual time=1573.602..1574.585 rows=0 loops=1)
         ->  Gather  (cost=1000.00..57261.00 rows=1 width=16) (actual time=1573.601..1574.583 rows=0 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on businesses_users b1  (cost=0.00..56260.90 rows=1 width=16) (actual time=1567.966..1567.967 rows=0 loops=3)
                     Filter: (user_id = ANY ('{5}'::integer[]))
                     Rows Removed by Filter: 1695495
         ->  Index Scan using businesses_pkey on businesses b0  (cost=0.29..8.30 rows=1 width=26) (never executed)
               Index Cond: (id = b1.business_id)
 Planning Time: 20.093 ms
 Execution Time: 1574.829 ms

In this case, despite our mean being only 829ms, it actually took the query 1574ms to execute. There is a lot happening here if you are unfamiliar with query plans, so lets try to break this down a little bit.

->  Index Scan using businesses_pkey on businesses b0  (cost=0.29..8.30 rows=1 width=26) (never executed)
      Index Cond: (id = b1.business_id)

It looks like our businessess table is doing something called an Index Scan. This is actually going to give us some of the best performance we can get in a given query. This means that we have an index being used to help make this part of the query performant. In this case it is the businesses_pkey index which is auto generated for us when we are using a primary key in Postgres. Things look good here, lets see what we can find about other tables in our query.

->  Parallel Seq Scan on businesses_users b1  (cost=0.00..56260.90 rows=1 width=16) (actual time=1567.966..1567.967 rows=0 loops=3)
      Filter: (user_id = ANY ('{5}'::integer[]))
      Rows Removed by Filter: 1695495

The first thing we notice is that we are doing something called a Parallel Seq(uential) Scan on the businesses_users table. If we recall, this the join table between our two main tables. In general, a Sequential Scan is considered really poor performance. There are two big caveats here which I will go over a little later.

If we look closely it is saying that we have removed 1,695,495 rows using the filter of user_id = some_id. Because this is a parallel scan, Postgres actually started multiple workers to go through the table in order to find the data that is needed. In this case, you can see that it planned and launched 2 workers. This is in addition to the main thread for a total of 3 threads scanning the heap. So we actually need the take the number from loops=3 and multiply our removed rows by that in order to find out how many rows were actually removed. In this case, we actually filtered out 5,086,485 rows. Surely there is something we can do to optimise this part of our query.

The answer is to add an index to our businesses_users table that covers our filter.

CREATE INDEX CONCURRENTLY businesses_users_user_id_index ON businesses_users (user_id);

After this, lets take another look at the performance of our query.

EXPLAIN ANALYZE
SELECT
    b0."id", b0."legal_name", b1."user_id"::bigint
FROM "businesses" AS b0
INNER JOIN "businesses_users" AS b1 ON b0."id" = b1."business_id"
WHERE (b1."user_id" = ANY(ARRAY[5])) ORDER BY b1."user_id"::bigint;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.72..16.75 rows=1 width=34) (actual time=0.021..0.023 rows=0 loops=1)
   ->  Index Scan using businesses_users_user_id_index on businesses_users b1  (cost=0.43..8.45 rows=1 width=16) (actual time=0.020..0.020 rows=0 loops=1)
         Index Cond: (user_id = ANY ('{5}'::integer[]))
   ->  Index Scan using businesses_pkey on businesses b0  (cost=0.29..8.30 rows=1 width=26) (never executed)
         Index Cond: (id = b1.business_id)
 Planning Time: 0.566 ms
 Execution Time: 0.074 ms

we moved from averaging 829ms to only 0.07ms execution time. This is multiple orders of magnitude faster than what we originally had, all because we forgot a simple index on our database.

Hopefully unsurprising, but the other query on that list was our other join query where we are going from a business to a user. It is going to suffer from the same problem, so we can just go and put an index on that.

So now we let our application run for a while with our new indexes and check performance.

Lets try running our query again to find out information about our queries.

         query       | total_time |  calls   |   mean   | percentage_overall
---------------------+------------+----------+----------+--------------------
 SELECT u0."id", u0. | 1593081.64 |       93 | 17129.91 |              49.43
 SELECT u0."id", u0. |  541387.17 |     8594 |    63.00 |              16.80
 SELECT u0."id", u0. |  535028.43 | 54672523 |     0.01 |              16.60
 SELECT b0."id", b0. |  473507.20 |     8601 |    55.05 |              14.69

It can be hard to see because of the substring, but our join queries are now numbers 2 and 4 on this list. As we can see, the mean run time has dropped significantly. But something else has managed to bubble to the top. Lets try to investigate.

EXPLAIN ANALYZE
SELECT u0."id", u0."first_name", u0."last_name"
FROM "users" AS u0
WHERE ((u0."first_name" = 'John') AND (u0."last_name" = 'Smith'));
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1283292.11 rows=69 width=21) (actual time=228.540..2310.850 rows=79 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on users u0  (cost=0.00..1282285.21 rows=29 width=21) (actual time=163.780..2306.021 rows=26 loops=3)
         Filter: ((first_name = 'John'::text) AND (last_name = 'Smith'::text))
         Rows Removed by Filter: 33333307
 Planning Time: 0.963 ms
 Execution Time: 2310.907 ms

This is a simple query that looks for someone based solely on their first and last names. It looks like some new product feature was requested and we missed creating an index for it. But this is something that happens during development, we create and modify application features. But when we do this we also need to remember to maintain our database so that we are getting proper performance from it. Lets try to break down the above query plan to better understand what is happening.

->  Parallel Seq Scan on users u0  (cost=0.00..1282285.21 rows=29 width=21) (actual time=163.780..2306.021 rows=26 loops=3)
      Filter: ((first_name = 'John'::text) AND (last_name = 'Smith'::text))
      Rows Removed by Filter: 33333307

Once again we are doing a Parallel Seq(uential) Scan on a table. It looks like 3 workers were run to perform the query and each worker remove 33,333,307. This leads to a total of 99,999,921 rows being retrieved and discarded. No wonder this query is so slow. This can again be fixed by adding an index to the query, but before doing that, I want to show you something.

CREATE INDEX CONCURRENTLY users_first_name_index ON users (first_name);
CREATE INDEX CONCURRENTLY users_last_name_index ON users (last_name);

This is something that I see frequently being done when needing to search by multiple columns. People will create one index per field they want to search by. But this may lead to sub optimal performance. Lets create these two indexes and see what happens when we try another EXPLAIN ANALYZE statement.

EXPLAIN ANALYZE
SELECT u0."id", u0."first_name", u0."last_name"
FROM "users" AS u0
WHERE ((u0."first_name" = 'John') AND (u0."last_name" = 'Smith'));
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on users u0  (cost=2653.35..2928.26 rows=69 width=21) (actual time=44.302..59.252 rows=79 loops=1)
   Recheck Cond: ((first_name = 'John'::text) AND (last_name = 'Smith'::text))
   Rows Removed by Index Recheck: 6781
   Heap Blocks: exact=6685
   ->  BitmapAnd  (cost=2653.35..2653.35 rows=69 width=0) (actual time=42.527..42.528 rows=0 loops=1)
         ->  Bitmap Index Scan on users_first_name_index  (cost=0.00..364.88 rows=33109 width=0) (actual time=9.535..9.535 rows=33339 loops=1)
               Index Cond: (first_name = 'John'::text)
         ->  Bitmap Index Scan on users_last_name_index  (cost=0.00..2288.18 rows=209548 width=0) (actual time=31.141..31.141 rows=210705 loops=1)
               Index Cond: (last_name = 'Smith'::text)
 Planning Time: 0.367 ms
 Execution Time: 59.284 ms

Creating these indexes took us from 2310ms down to only 59ms. Surely this is a great success for this query, right? Well, lets take a quick look at the plan in order to see what it is doing.

->  BitmapAnd  (cost=2653.35..2653.35 rows=69 width=0) (actual time=42.527..42.528 rows=0 loops=1)
      ->  Bitmap Index Scan on users_first_name_index  (cost=0.00..364.88 rows=33109 width=0) (actual time=9.535..9.535 rows=33339 loops=1)
            Index Cond: (first_name = 'John'::text)
      ->  Bitmap Index Scan on users_last_name_index  (cost=0.00..2288.18 rows=209548 width=0) (actual time=31.141..31.141 rows=210705 loops=1)
            Index Cond: (last_name = 'Smith'::text)

There are a few new things happening in this query. Lets go over them one at a time.

Bitmap Index Scan means that the query is using an index to find information. However, because of the way indexes work, it is only a partial search. In this case there are two of these scans going on, one for each index we created. So what is actually happening here is each index is going through its data and creating a bitmap which is essentially a list of 1s and 0s. This will look like [1,0,0,0,1,0,0] where a 1 represents a match that was found and a 0 represents a non-match. Each element in the list represents a page in the heap. Because there are two of these indexes, two of these bitmaps will be created.
BitmapAnd will take the two bitmaps created by those index scans and compare two of them using bitwise AND to find out which pages Postgres should actually be checking for values. At the end of this operation, we will have another bitmap that looks like [1,0,0,0,0,0,0] telling Postgres where to look for the data in its heap.

And finally,

Bitmap Heap Scan on users u0  (cost=2653.35..2928.26 rows=69 width=21) (actual time=44.302..59.252 rows=79 loops=1)

The Bitmap Heap Scan Will go through the heap of the users table using the final bitmap output by the BitmapAnd operation in order to find out which pages should be looked at. This is effectively a Sequential Scan on the pages returned by the two index scans. It is not as expensive as a full Sequential Scan, but we can definitely do better.

If we had created the following multicolumn index instead, we would get much better performance.

CREATE INDEX CONCURRENTLY users_last_name_first_name_index ON users (last_name, first_name);

Lets see what happens now.

EXPLAIN ANALYZE
SELECT u0."id", u0."first_name", u0."last_name"
FROM "users" AS u0
WHERE ((u0."first_name" = 'John') AND (u0."last_name" = 'Smith'));
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using users_last_name_first_name_index on users u0  (cost=0.57..281.95 rows=69 width=21) (actual time=0.015..0.053 rows=79 loops=1)
   Index Cond: ((last_name = 'Smith'::text) AND (first_name = 'John'::text))
 Planning Time: 0.055 ms
 Execution Time: 0.064 ms

We start doing an Index Scan, which again is considered some of the best performance you can get for a given query. Note that we went from 1574.829ms without an index down to 59ms when we were using multiple indexes down to 0.06ms when using a multicolumn index. This again becomes multiple orders of magnitude faster by having optimised indexes for a given query. It is much faster and more efficient to search a single index instead of having to search multiple.

We created this index in order to search by both the first_name and last_name fields. But because of the order of our index being (last_name, first_name), we are actually able to also query efficiently by JUST the last_name column. Remember that you are allowed to take advantage of the first part(s) of an index without needing to use the entire index. So if we had a 3 column index on (last_name, first_name, email), we would have 3 potential queries that this index could serve.

Searching on JUST last_name
Searching on BOTH last_name and first_name
Searching on ALL last_name, first_name and email

If you are attempting to search on something else with only that 3 column index, you will no longer be able to take advantage of it.

I should also note that there is a performance impact on all write operations for each index we have on a given table. So not only is this faster when we read, it can become faster to write if we have proper indexes on our tables as well.

Sequential Scans

I mentioned earlier that sequential scans generally give poor performance, but with two caveats.

If your table has a small amount of data in it, it will do a sequential scan. For example, if you had a table holding the Canadian provinces and territories, that would only be 13 rows. Sequential IO will be faster than looking up an index and doing random IO.
If your query is expected to grab a large percentage of the rows in your table. I am unsure of the exact percentage, but say roughly 90%+. This again will be faster to do sequential IO than it would be to do random IO. This will likely happen during development.

With that, do not be afraid of sequential scans. They are not necessarily bad. As long as you have an understanding of what you are doing, it is on purpose and you are not doing all the time, index scans are just fine.

You can use the following query to help look for sequential scans within your database.

SELECT schemaname, relname, seq_scan, seq_tup_read,
       idx_scan, seq_tup_read / seq_scan AS avg
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;

This will help find sequential scans happening within your database. It will give you the following information.

The name of the schema
The table name
The number of times a sequential scan happened on the table
The total number of rows returned by a sequential scan
The number of times we use an index for that table
The average number of rows returned by a sequential scan

an example output would be

 schemaname |     relname      | seq_scan | seq_tup_read | idx_scan |   avg    
------------+------------------+----------+--------------+----------+----------
 public     | users            |      255 |  10981217263 | 59028398 | 43063597
 public     | businesses_users |     1292 |   3046804515 |    41923 |  2358207
 public     | businesses       |        1 |        10000 |    26348 |    10000

This can be a really good way to keep an eye out on sequential scans happening in your system and potentially another way of finding missing indexes in your system on a table by table basis. You will want to keep an eye on the seq_scan column in relation to the avg column. If you are returning a large number of rows from a sequential scan frequently, you probably have a problem.

Conclusion

When we are able to properly take advantage of our database, we may be able to hold off on increasing the size of our database or needing to setup that cache because our queries will be that much faster.

I hope you have learned something about how Postgres works and are able to apply this knowledge to your own workloads!

Introduction To Postgres Query Debugging

pg_stat_statements

Explain

Sequential Scans

Conclusion