I can’t understand how to delete files in pg_wal (PostgreSQL)

Recently I started learning PostgreSQL.

I’ve created an empty database, “mydb“. At this point I have 4 databases – 3 default ones (template0, template1, postgres) and my database “mydb“.

The size of the folder pgdata is about 60 MB. After that I’ve downloaded a database for learning purposes from the internet with the size of 300 MB in dump format (“.sql”) and installed it to the “demo” database. Size of pgdata was 360 MB. After a while I decided to delete the “demo” database and typed DROP DATABASE demo.

The size of pgdata became 250 MB instead of supposed 60 MB. I didn’t understand why it happened, so I decided to delete “mydb”. I found out that all the trash was in pg_wal folder. Due to the internet – there are necessary files, that I mustn’t delete or my computer will explode. So, the problem is, why, when I deleted the database there are still trash files on my system.

I tried commands VACUUM, DELETE, INDEX and many other from the internet. The archivation is disabled and I don’t have replication slots at all. I don’t know what to do, so can you, please, clarify this for me? How can I delete all the trash files from my computer? I will appreciate your help, thanks!

I’ve tried so hard, and got so far. But in the end, it doesn’t seems to work 🙂

  • Why are you doing this? You are trying to learn something, I guess, but what? Learning how to stress out over trivialities is not a very valuable skill to have.

    – 

When WAL files are no longer needed, they can either be deleted or recycled “ahead” for future use. How many are recycled ahead depends on max_wal_size, and on how fast WAL was recently being written. During a bulk load, WAL is written fast, so many may get recycled ahead. But then once things quiet down it can take a very long time to work through those recycled-ahead files. Forever, if the database is idle thereafter.

Lowering max_wal_size would have reduced the number recycled in the first place. But there is no automatic way to remove those files once recycled, other than waiting for them to get consumed. You could check where the current WAL write pointer is (pg_controldata, amongst other ways) and manually delete files ahead of that point. But unless the database is shutdown while you do this, there is no way to guarantee a file didn’t become active between when you looked and when you did the delete, so this is a fraught activity.

@Oleg.
It sounds like you’re experiencing some confusion with PostgreSQL’s disk space usage and how dropping databases works.
Let’s break down what might be happening here:

Database Deletion
When you run DROP DATABASE demo, PostgreSQL removes the database and its associated files, but it might not immediately release all the disk space it occupied. This is because PostgreSQL uses a write-ahead log (WAL) to maintain data consistency and for crash recovery. The files in the pg_wal directory are part of this log. They are crucial for maintaining data integrity and recovery even after a database is dropped. These files will be cleaned up eventually, but PostgreSQL might take some time to do so.

Vacuuming

Running the VACUUM command doesn’t directly affect the size of the pg_wal directory. Vacuuming is primarily concerned with reclaiming space within the database’s data files. It’s not designed to clean up the write-ahead log (WAL).

File Retention

PostgreSQL keeps old WAL segments around for a certain period, especially if replication or other data protection mechanisms are in use. These segments allow you to recover from a certain point in time. You can configure the retention period using the retention parameter in the PostgreSQL configuration file (postgresql.conf).

Manual Cleanup

If you’re certain that you no longer need the dropped databases and want to immediately free up disk space, you can try the following steps:

  1. Connect to the PostgreSQL server using a superuser role.
  2. Run the pg_resetxlog command on your database cluster. This command is used to reset the write-ahead log and other control information. Note: This is a risky operation and should be done with caution. It’s recommended to back up your data before attempting this.
  3. After running pg_resetxlog, you may need to perform a full cluster backup and restore, as the cluster might be in an inconsistent state.

Time and Patience
If you don’t need to immediately free up disk space, you can simply wait for PostgreSQL’s maintenance processes. This will eventually clean up the old WAL segments, and the disk space will be reclaimed over time.
Remember that dealing with PostgreSQL internals can be complex and potentially risky and only if you know exactly what you are doing, you should do it.

If you’re not completely comfortable with these processes, it’s a good idea to seek guidance from experienced database administrators or PostgreSQL experts (there are a lot of professionals that you can seek online on “one time job” websites).

Ps. Always make sure you have backups before making any changes to your DBMS configurations / files.

The transaction log in pg_wal records redo information for all changes you perform to the database cluster. That information is for all databases. If you drop a database, that is a change, so it will add more information to pg_wal.

Eventually (by default, at least once every five minutes), PostgreSQL will perform a checkpoint, and when that is done, it will automatically delete unneeded files in pg_wal. So wait a bit, and after a while pg_wal should shrink to at most max_wal_size (by default 1GB).

Leave a Comment