Don’t Forget The Elephant: A PostgreSQL Tutorial

There’s an elephant in the room and it’s PostgreSQL, the world’s most advanced open source database!  PostgreSQL is all too often overlooked by the world and its dog.  However, PostgreSQL has been gaining popularity over recent years and now runs some of the world’s largest businesses.

PostgreSQL

PostgreSQL is a relational database. There, I said it.  While NoSQL is the current buzzword, relational databases are not going away anytime soon, if ever.  That being said, PostgreSQL can store schemaless JSON documents in a column, giving you much of what MongoDB offers inside a relational database!
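
As a quick illustration (a minimal sketch using a hypothetical orders table), PostgreSQL 9.3’s json type and operators let you store and query documents directly from SQL:

postgres=# CREATE TABLE orders (id serial PRIMARY KEY, doc json);
postgres=# INSERT INTO orders (doc) VALUES ('{"customer": "ACME", "total": 42.50}');
postgres=# SELECT doc->>'customer' AS customer FROM orders WHERE (doc->>'total')::numeric > 10;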

PostgreSQL is comparable to Oracle, DB2, MSSQL and MySQL, except free (as in freedom).  Being liberally licensed (under the BSD-like PostgreSQL Licence) and, more importantly, not controlled by any single entity, it has an active, thriving and coherent community, with several companies providing commercial support.

Out of the box PostgreSQL comes with a raft of features, putting it on par with many commercial offerings and far ahead of competing projects such as MySQL.  If the built-in features were not enough, many extensions exist, a major example being PostGIS, a complete geographic information system (comparable to ESRI’s ArcSDE).

Castles are always better built on rock than sand, and given that the GroupWise Mobility Service uses PostgreSQL, for example, I figured it would be best to discuss how to build a rock solid PostgreSQL platform, rather than focusing on interesting application features.

We’ll cover installing and configuring PostgreSQL, a brief guide to performance tuning, and backing up your data.

Installing PostgreSQL

SUSE Linux Enterprise Server 12 comes with PostgreSQL 9.3 already packaged and available in the SLE 12 repositories.  You can install PostgreSQL as follows:

sle12:~ # zypper in postgresql93 postgresql93-server postgresql93-contrib

Once we have installed the PostgreSQL packages, start and enable PostgreSQL.  The first start will initialise the database cluster and generate the default configuration files:

sle12:~ # systemctl start postgresql
sle12:~ # systemctl enable postgresql

You’ll now be able to connect to the local PostgreSQL server from the postgres system account. The psql program is the PostgreSQL client:

sle12:~ # su postgres
postgres@sle12:~> psql
psql (9.3.9)
Type "help" for help.

postgres=#

Now set a password for the postgres superuser using the \password command, as follows:

postgres=# \password
Enter new password:  
Enter it again:

You can quit the psql client using the \q command:

postgres=# \q

Configuring PostgreSQL

To end up with a well configured PostgreSQL server, there are a few key configuration settings that we should change.  The PostgreSQL configuration file can be found at /var/lib/pgsql/data/postgresql.conf.

Firstly we probably want to configure network access to PostgreSQL by changing the following:

listen_addresses = '*'
port = 5432
max_connections = 100

We’ll also want to increase the amount of information which is stored in the WAL (Write Ahead Log).  This allows us to use replication and backup tools.  It’s recommended to set this up from the start to avoid having to reconfigure your server in the future.

wal_level = hot_standby
full_page_writes = on
max_wal_senders = 10
wal_keep_segments = 1000
hot_standby = on
hot_standby_feedback = on

It is also suggested to configure logging.  I tend to favour rather verbose logging, primarily as the information is invaluable when trying to track down problems: people cannot argue against data.  However on a busy server this will use a fair amount of disk space.

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 1GB
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = 0
log_checkpoints = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_temp_files = 0
log_lock_waits = on
log_duration = on
log_connections = on
log_disconnections = on
log_timezone = UTC
log_statement = 'all'

Once you’ve made these changes to `/var/lib/pgsql/data/postgresql.conf` make sure you restart PostgreSQL using:

sle12:~ # systemctl restart postgresql

PostgreSQL Access Controls

A separate configuration file is used to configure host based access.  The /var/lib/pgsql/data/pg_hba.conf file configures which hosts are permitted to connect to PostgreSQL and how the users must authenticate.

If we want to permit remote hosts to log in using MD5-hashed password authentication, we would add the following entry:

host        all             all             10.1.1.0/24             md5

This requires all clients from the `10.1.1.0/24` network to provide a password, which is verified against the MD5 hash stored by the server.  If `md5` were changed to `trust`, no password would be required.

When you change the `pg_hba.conf` file, you will need to get PostgreSQL to reload its configuration.  This can be done without a restart using:

sle12:~ # systemctl reload postgresql
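
With the new rule loaded, a client on the 10.1.1.0/24 network can connect remotely with psql (the user and database names here are hypothetical):

user@client:~> psql -h sle12 -U appuser -d appdb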

Tuning PostgreSQL

Performance tuning database servers is a massive topic in its own right, so stand back for the five-minute PostgreSQL tuning guide.  Tuning PostgreSQL depends upon both the server hardware and the application, so this is very much a rough guide.

Keep in mind that tuning will only give incremental performance gains.  A database is always bound by the performance of the hardware it runs on, so be realistic about the hardware.  If you’re not running on SSDs, buy some!  Also be smart about what you buy: read reviews, look at stats and certainly do not take your server vendor’s word for it!

Kernel Tuning

For database servers there are a few key kernel settings we need to tune.  We need to ensure that swap is only used as a last resort.  I’d argue that you’re better off running without swap these days: if you have more than 4GB of RAM in your server, disable swap.
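
To turn swap off immediately, run the following (you should also comment out or remove the swap entry in /etc/fstab so it stays off after a reboot):

sle12:~ # swapoff -a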

We need to ensure that the server is operating in strict no-overcommit mode.  This is often overlooked, but extremely important if you care about data integrity and stability.  By default Linux will honour all malloc (memory allocation) requests, regardless of whether there is enough memory free.  Memory is only backed when it is actually accessed, and should the system completely run out of RAM the Out Of Memory (OOM) Killer is invoked, which terminates processes based on heuristics, frequently picking the largest memory consumer: your database.

To smooth out database performance we want to configure Linux to aggressively write dirty pages to disk.  This reduces sudden IO spikes during database checkpoints.  A checkpoint is a point in time where all modified data pages are guaranteed to be on disk.  Checkpoints should be happening about once every 5 minutes, slightly quicker under load.

To make all these above changes, we need to edit sysctl.conf.  Add the following at the end of the /etc/sysctl.conf file:

vm.swappiness=0
vm.overcommit_memory=2
vm.overcommit_ratio=95
vm.dirty_ratio=2
vm.dirty_background_ratio=1
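
These settings take effect at the next boot; to apply them immediately you can reload sysctl.conf:

sle12:~ # sysctl -p /etc/sysctl.conf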

Filesystem

At the end of the day PostgreSQL just stores data in files on disk.  As such the filesystem you use can make a big difference, especially on high performance disk arrays.

You should be using XFS to hold your PostgreSQL database.  XFS has excellent performance characteristics and is a very mature and robust filesystem.  In many benchmarks PostgreSQL running on XFS outperforms other filesystems.  XFS can also be optimised for your underlying disk array and for parallel performance.
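
As a rough sketch (the device name is hypothetical; /dev/sdb1 stands in for your data volume), creating an XFS filesystem and mounting it over the PostgreSQL data area might look like this.  Add a matching entry to /etc/fstab so the filesystem is mounted at boot, before PostgreSQL starts:

sle12:~ # mkfs.xfs /dev/sdb1
sle12:~ # mount -o noatime /dev/sdb1 /var/lib/pgsql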

PostgreSQL Tuning

There are a number of settings which should be changed from the default to improve server performance.  The default PostgreSQL settings are highly conservative for modern hardware.

- Tuning Memory Usage
One key setting to configure is shared_buffers.  This is the amount of shared memory that PostgreSQL will use to cache database buffers.  As a rule of thumb this should be set to ¼ of the RAM in your server, but it should not exceed 10GiB.  PostgreSQL relies heavily upon the kernel page cache to cache database files, and setting shared_buffers too high wastes memory through double buffering and can actually reduce performance.

It’s also important to tune the amount of memory that PostgreSQL will use for per-query work.  The key setting here is work_mem, which configures the maximum amount of memory a single operation such as a sort may use.  The complexity here is that work_mem is allocated per session, and possibly more than once per session.

Setting this too high can cause your server to run out of memory should too many clients be active concurrently.  Set work_mem to ¼ of your server’s RAM divided by the number of max connections you allow.  If the result of this sum is very low (less than 4MiB) then your number of max connections is too high or you need more RAM.

PostgreSQL has some important background processes, e.g. autovacuum.  Due to PostgreSQL’s concurrency scheme (MVCC), UPDATEs and DELETEs leave dead row versions behind within a table.  A process called VACUUM is responsible for cleaning up these dead rows, and this is performed automatically by autovacuum.  If your database contains a number of large tables (over 32GiB) increasing the maintenance_work_mem setting can help speed up these operations.  By default there will be at most 3 processes consuming maintenance_work_mem, so we can be a lot more generous than with work_mem.  Setting maintenance_work_mem to 1GiB or more is not unusual.

The effective_cache_size parameter should be configured to the amount of memory in the system which can be expected to be used by the page cache.  On a dedicated server, this is: total_ram - (shared_buffers + (work_mem * max_connections) + (maintenance_work_mem * 3)).

For a dedicated PostgreSQL server with 128GiB RAM, use the following settings:

shared_buffers = 10GB
work_mem = 128MB
maintenance_work_mem = 4GB
effective_cache_size = 96GB
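
After restarting PostgreSQL to pick up these changes (shared_buffers in particular requires a restart), you can confirm the values from psql:

postgres=# SHOW shared_buffers;
postgres=# SHOW work_mem;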

- Tuning Checkpoints
Checkpoints are points in time when modified database pages are guaranteed to be stored on disk.  PostgreSQL uses a Write Ahead Log (WAL) for data integrity: all changes are first logged and flushed to disk before being applied to the data files, and this log can be replayed in the event of a server crash.  As such, checkpoints are essential for fast recovery; if too much time passes between checkpoints, the database has to replay a large amount of log, which slows down getting your server back up and running.

By default checkpoints will happen every 5 minutes in the worst case.  However the default settings are poor for servers with good disk IO which do large data loads, since by default a checkpoint is also triggered every 3 log segments (48MiB).  This causes a large number of flushes to disk and ultimately reduces the throughput of the server.  If checkpoints are happening too frequently, the following warning will be visible in the log files:

checkpoints are occurring too frequently (10 seconds apart)

For a modern server with good IO performance, it is recommended to use the following settings:

checkpoint_segments = 256
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9
checkpoint_warning = 30s

- Tuning The Planner
All database queries are planned prior to execution.  There is more than one way to execute a database query.  The Planner is responsible for selecting the most efficient and performant method to execute a query.  The Planner relies upon statistics about tables which are collected as part of the autovacuum subsystem.

The Planner computes the estimated cost of various ways to execute a query, for example a sequential scan vs an index lookup.  The costings of these fundamental operations can be altered in the PostgreSQL configuration.

When using SSDs, unlike spinning disks, the cost of random vs sequential IO is essentially the same.  The latency of seeking for data on an SSD is near zero and significantly lower than a spinning hard drive, primarily as a spinning disk needs to wait for data to spin around under the read head.  Therefore it is worth configuring the cost of sequential and random IO to be the same:

seq_page_cost = 1.0
random_page_cost = 1.0
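
You can see which plan the Planner picks for a particular query using EXPLAIN (the customers table here is hypothetical):

postgres=# EXPLAIN SELECT * FROM customers WHERE id = 42;

EXPLAIN ANALYZE goes a step further and actually executes the query, reporting real row counts and timings alongside the estimates.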

Backing Up PostgreSQL

There are three key techniques to backing up a PostgreSQL database.  The best method will largely depend upon how large your database is and your recovery window.

- Database Dumps
The simplest method to back up your database is to take a full dump of it.  This will serialise the complete database structure and data out to SQL (or a custom format).

Dumping a database has the advantage of being restorable on different hardware and different PostgreSQL versions.  In fact, to upgrade PostgreSQL without using tools such as pg_upgrade, a dump and restore needs to be performed.

Sadly, for large databases a full dump can take a very long time to both dump and restore, making it less than ideal.

You can take a complete SQL dump of the PostgreSQL server using the pg_dumpall  command.  This will dump the entire database cluster, including users and tablespaces:

sle12:/mnt/backup/pgsql # su postgres
postgres@sle12:/mnt/backup/pgsql> pg_dumpall | gzip > "pgsql_$(date +%Y-%m-%d-%H-%M).sql.gz"

You can dump only a single database using the pg_dump command.  For example to dump a database called test:

postgres@sle12:/mnt/backup/pgsql> pg_dump -d test | gzip > "pgsql_test_db_$(date +%Y-%m-%d-%H-%M).sql.gz"
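
To restore, feed a dump back into psql (a minimal sketch; the file name here stands in for whichever dump you created above, and the target database must already exist, e.g. created with createdb):

postgres@sle12:/mnt/backup/pgsql> gunzip -c pgsql_test_db_backup.sql.gz | psql -d test

A pg_dumpall dump is restored the same way, but piped into psql connected to the postgres database, since it recreates the other databases itself.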

- Base Backups
Base backups refer to backing up the data files that PostgreSQL stores on disk.  With a large database taking a base backup can be an effective method since recovery time can be relatively short.

Since PostgreSQL 9.1 the simplest way to make a base backup is by using the pg_basebackup utility.  This operates over the PostgreSQL replication protocol and only requires access to the normal PostgreSQL port. As such it can be executed from a remote machine.

Before we can use  pg_basebackup we will need to create a replication user.  Create the user with:

postgres@sle12:~> createuser --replication -E -P rep     
Enter password for new role:  
Enter it again:

Next we need to allow the replication user to authenticate with PostgreSQL.  We will use trust auth, since we want to be able to easily automate backups.  Edit the /var/lib/pgsql/data/pg_hba.conf file and add the following line:

local           replication         rep       trust

We can now use pg_basebackup to produce a compressed file containing our entire on-disk database cluster:

postgres@sle12:~> pg_basebackup -s 5 -U rep --xlog-method=fetch --checkpoint=fast -Ft -z -D - > "pgsql_base_$(date +%Y-%m-%d-%H-%M).tar.gz"
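
Restoring a base backup is essentially a file level restore: stop PostgreSQL, move the old data directory aside and unpack the backup in its place (a minimal sketch; the tarball name stands in for whichever backup you created above):

sle12:~ # systemctl stop postgresql
sle12:~ # mv /var/lib/pgsql/data /var/lib/pgsql/data.old
sle12:~ # mkdir /var/lib/pgsql/data && chown postgres:postgres /var/lib/pgsql/data && chmod 700 /var/lib/pgsql/data
sle12:~ # tar -xzf pgsql_base_backup.tar.gz -C /var/lib/pgsql/data
sle12:~ # systemctl start postgresql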

- Point In Time Recovery
Should you need to be able to recover your PostgreSQL server to a specific moment in time, use Point In Time Recovery.  For example, your developer accidentally dropped your production customer table at 16:43 on a Friday evening.

Point in time recovery uses the WAL logs to be able to recover a database to a specific moment.  Point in time recovery requires that you make a base backup and also store every WAL segment generated from after you took the base backup.

The trade-off with point in time recovery is how long it takes to restore to a given moment.  If you only took a base backup once a month, recovering to a point several weeks later would mean replaying weeks of WAL.  The more frequently you take base backups, the less WAL has to be replayed, but the more space is required (or the shorter your backwards recovery window becomes).
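
As a minimal sketch of what is involved (the archive path and target time are hypothetical; the wal_level = hot_standby we set earlier is sufficient), WAL archiving is enabled in postgresql.conf so every completed segment is copied somewhere safe:

archive_mode = on
archive_command = 'cp %p /mnt/backup/wal_archive/%f'

At recovery time, a recovery.conf created in the data directory tells PostgreSQL where to find the archived segments and how far to replay:

restore_command = 'cp /mnt/backup/wal_archive/%f %p'
recovery_target_time = '2015-09-04 16:42:00'

When PostgreSQL is started it will replay WAL from the archive up to the target time and then stop, leaving you just before the fateful DROP.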

In Conclusion

Hopefully this has been a useful short introduction to building a great PostgreSQL platform, ready to store all your organisation’s most important data.  There are many features of PostgreSQL, from both a development and an operations angle, which are extremely useful.  Replication was one topic that we didn’t have space for in this article; as it forms a large part of building an available and solid PostgreSQL platform, we’ll aim to cover it in another article.


This article was first published in OH Magazine Issue 31, 4/2015, p23-27.
