Sunday, February 19, 2012

Efficient way of purging old data?

I have a table with no primary key.

There are 6 columns. The first four columns together identify one object. Then there is a status and a datetime (Updated).

A new record is added into the table whenever a status changes (I need to keep the old status as well).
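Roughly, the table looks like this (the column names are just placeholders for this post, and I've guessed int for the identifying columns purely to make the example concrete):

-- rough shape of the table, with placeholder names and guessed types
create table table1
(
    col1    int not null,       -- the first four columns together identify one object
    col2    int not null,
    col3    int not null,
    col4    int not null,
    Status  int not null,       -- the status value being tracked
    Updated datetime not null   -- when this status was recorded
)
-- no primary key; a new row is inserted each time the status changes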

The problem I have is how to efficiently purge old data.

I need to keep all records with a datetime within the last week.

I also need to ensure that at least one status is kept for each combination of the first four columns.

The table contains over 150,000 records. About 1,000 statuses change per day.

I want to purge the data once a week - delete about 7,000 records.

This is the query I wrote:

delete from table1 where Updated <
(select max(Updated) from table1 t1 where table1.col1 = t1.col1 and table1.col2 = t1.col2
and table1.col3 = t1.col3 and table1.col4 = t1.col4 group by col1, col2, col3, col4)
and Updated < GetDate() - 7

This seems to be fairly fast - statuses only change a couple of times a week, so the subquery only has 3 or 4 rows to aggregate for each combination. The subquery will still be evaluated for almost every record in the table, though.

Can anyone see a more efficient way of doing this?

Well, I'd use a derived table myself:

delete t1
from table1 t1
inner join
    (select col1, col2, col3, col4, max(Updated) as MaxUpdated
     from table1
     group by col1, col2, col3, col4) t2
    on t1.col1 = t2.col1 and t1.col2 = t2.col2
   and t1.col3 = t2.col3 and t1.col4 = t2.col4
-- only delete rows that are older than a week AND are not the latest for their combination
where t1.Updated < t2.MaxUpdated
  and t1.Updated < GetDate() - 7

That usually seems to perform faster than a subquery, but you'd have to try it for yourself to know for sure.
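One way to sanity-check either version before running it (just a suggestion, using the same placeholder names as above) is to turn the delete into a select and eyeball the rows it would remove:

-- run this first to see exactly which rows the delete would remove
select t1.*
from table1 t1
inner join
    (select col1, col2, col3, col4, max(Updated) as MaxUpdated
     from table1
     group by col1, col2, col3, col4) t2
    on t1.col1 = t2.col1 and t1.col2 = t2.col2
   and t1.col3 = t2.col3 and t1.col4 = t2.col4
where t1.Updated < t2.MaxUpdated
  and t1.Updated < GetDate() - 7

If either form gets slow as the table grows, an index covering (col1, col2, col3, col4, Updated) should help both, since the grouping and the join both work over those columns.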
