I have a table with no primary key.
The are 6 columns. The first four columns represent one object. There is then a status and a datetime (Updated).
A new record is added into the table whenever a status changes (I need to keep the old status as well).
The problem I have is how to efficiently purge old data.
I need to keep all records with a datetime within the last week.
I also need to ensure that I at least one status is kept for each combination of the first four columns.
The table contains over 150,000 records. About 1,000 statuses change per day.
I want to purge the data once a week - delete about 7,000 records.
This is the query I wrote
delete from table1 where Updated <
(select max(Updated) from table1 t1 where table1.col1 = t1.col1 and table1.col2 = t1.col2
and table1.col3 = t1.col3 and table1.col4 = t1.col4 group by col1, col2, col3, col4)
and Updated < GetDate() - 7
This seems to be fairly fast - records only change a couple of times a week, so the subquery only returns 3 or 4 records. The subquery will be called on almost all the records in the table though.
Can anyone see a more efficient way of doing this?
Well, I'd use a dereived table myself:delete from table1 t1
inner join
(select col1, col2, col3, col4, max(Updated) as MaxUpdated
from table1
group by col1, col2, col3, col4) t2 on t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.col4 = t2.col4 and t1.Updated = t2.MaxUpdated
where t1.Updated < GetDate() - 7
That usually seems to perform faster then a subquery, but you'd have to try it for yourself to know for sure.
No comments:
Post a Comment