Tuesday, July 29, 2008

SQL 2005 to delete lots of data in batches

If you want to delete lots of data (millions), Running one query causes the TransactionLog to grow with huge size.

You have 2 solutions for this problem:
1. If you can your data is offline:
  • Copy the rows you want to keep to a temporary table
  • Drop the original table
  • Rename the temporary table to original name
  • Reinstate any indexes

2. Second approach is to delete the rows in a loop. Delete a modest number each time round the loop. Keep looping until no more rows exist to delete.

  • You will need to either backup the TLog frequently during this process (to stop it extending to a vast size), or change the RECOVERY MODEL to SIMPLE whilst this is running, and back to FULL again after it finished.
  • If this batch works 24 hours/7 days you should also put a WAIT for 5 seconds or so inside the loop so that during each iteration other connected users get "their chance"

SQL Script to delete rows in loop:(needs local variables declaring)

SELECT @intRowsToDelete = COUNT(*) -- Number of rows to be deleted FROM dbo.MyTable WHERE ... MyDeleteCriteria ...
WHILE @intRowCount > 0 AND @intErrNo = 0 AND @intLoops > 0

BEGIN
SELECT @dtLoop = GetDate()
SELECT @strSQL =
SET ROWCOUNT @DEL_ROWCOUNT -- number of delete rows / iteration
DELETE D FROM dbo.MyTable AS D WHERE ... MyDeleteCriteria ...
SELECT @intErrNo = @@ERROR, @intRowCount = @@ROWCOUNT
SET ROWCOUNT 0 -- Reset batch size to "all"
SELECT @intRowsToDelete = @intRowsToDelete - @intRowCount,
@intLoops = @intLoops - 1
-- Debugging usage only:
PRINT 'Deleted: ' + CONVERT(varchar(20), @intRowCount)
+ ', Elapsed: ' + CONVERT(varchar(20), DATEDIFF(Second, @dtLoop, GetDate()))
+ ' seconds,' + ' remaining to delete=' + CONVERT(varchar(20), @intRowsToDelete)
+ ', Loops left=' + CONVERT(varchar(20), @intLoops)
WAITFOR DELAY '000:00:05' -- 5 seconds for other users to gain access
END

No comments: