Cleaning the Zabbix history_uint table efficiently

For some reason my Zabbix history_uint table has grown to 25GB and 293 million rows. The Zabbix housekeeping process isn't keeping up, which is understandable since it deletes rows one at a time. The history_uint table has only one index, on the combination of (itemid,clock), so if you try to delete on just one of those columns (for example, all rows older than a certain timestamp), you end up with a sequential scan of all 293 million rows. To get around this we delete rows in batches, using a temp table to hold each batch.
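If you want to confirm the sequential scan on your own table, a plain EXPLAIN of the naive delete will show the plan without actually running the delete; the timestamp here is just the example cutoff used below:

EXPLAIN DELETE FROM history_uint WHERE clock < 1393668000;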

Step 1: Create a temp table based on history_uint, limiting the number of records. LIMIT doesn't guarantee any particular order, but for a table like history_uint, which only ever gets inserted into for as long as it lives, it seems to give me the earlier records first. We will check the records later.

SELECT itemid,clock INTO TEMP TABLE items_to_be_deleted FROM history_uint LIMIT 50000 OFFSET 0;
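If you want to see what the batch actually grabbed before deleting anything, a quick sanity check like this shows how many rows it holds and the time range they cover:

SELECT count(*), min(clock), max(clock) FROM items_to_be_deleted;

If max(clock) is already past your cutoff, part of this batch will get thrown away in the next step.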

Step 2: Next we delete from the temp table all records that are newer than the date we are cleaning up to. This is a sequential scan, but since the temp table only holds a limited number of records, it doesn't take very long. Once the batches start to consist mostly of newer records, you can play with the offset in step 1 to grab a different set, but by the time things are mostly cleaned up the housekeeping process should be able to keep up again.

DELETE FROM items_to_be_deleted WHERE clock > 1393668000;
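It is worth counting what is left at this point; if almost nothing survives the filter, the batch came from the newer end of the table and the offset in step 1 needs bumping:

SELECT count(*) FROM items_to_be_deleted;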

Step 3: Now we use the records left in the temp table to feed the delete statement the (itemid,clock) pairs it needs in order to use the index. This can take a while, but it is far faster than trying to do the whole delete at once.

DELETE FROM history_uint WHERE (itemid,clock) IN (SELECT itemid,clock FROM items_to_be_deleted);
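If this step seems slower than it should be, you can ANALYZE the temp table so the planner has row estimates for it, then EXPLAIN the delete (EXPLAIN only plans the statement, it doesn't run it) to confirm it is hitting the (itemid,clock) index on history_uint rather than scanning the whole table:

ANALYZE items_to_be_deleted;
EXPLAIN DELETE FROM history_uint WHERE (itemid,clock) IN (SELECT itemid,clock FROM items_to_be_deleted);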

Step 4: Drop the temp table so you can start the process over again.

DROP TABLE items_to_be_deleted;

Now you can go through the process again and again to slowly clean out the table in small chunks. You can increase the LIMIT value to take bigger chunks at a time, as long as the delete step still finishes in a reasonable amount of time for your environment. If you get tired of running the four statements by hand, the loop below automates them.
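Here is a rough PL/pgSQL sketch of that loop. It assumes the same cutoff timestamp and batch size as the examples above, and it simply stops once a batch deletes nothing, which is the point where you would otherwise start adjusting the offset:

DO $$
DECLARE
  deleted integer;
BEGIN
  LOOP
    -- Step 1: grab a batch of candidate rows.
    CREATE TEMP TABLE items_to_be_deleted AS
      SELECT itemid, clock FROM history_uint LIMIT 50000;
    -- Step 2: throw away candidates newer than the cutoff.
    DELETE FROM items_to_be_deleted WHERE clock > 1393668000;
    -- Step 3: delete the remaining candidates via the (itemid,clock) index.
    DELETE FROM history_uint
      WHERE (itemid,clock) IN (SELECT itemid,clock FROM items_to_be_deleted);
    GET DIAGNOSTICS deleted = ROW_COUNT;
    -- Step 4: drop the temp table before the next pass.
    DROP TABLE items_to_be_deleted;
    EXIT WHEN deleted = 0;
  END LOOP;
END $$;

Keep in mind that a DO block runs as a single transaction, so all of the deleted rows stay dead until it commits; for a really big cleanup it may be kinder to the database to keep running the four statements by hand (or from a script) so each batch commits on its own.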

Once you are all done, run a VACUUM ANALYZE so PostgreSQL marks the dead rows as reusable space and refreshes its planner statistics for the much smaller table. (A plain VACUUM won't shrink the file on disk, but the freed space gets reused for new history rows.)
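The statement itself is just:

VACUUM ANALYZE history_uint;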
