Retrofitting Table Level Delete/Archive Strategies - Unbounded Table Growth by Design

by Bill Wunder
In this discussion we will look at some data maintenance techniques for logging type tables. This is not a discussion about the SQL Server transaction log - the log that records every transaction and the database modifications made by each transaction and that SQL Server uses for recoverability and replication. Instead, we'll be examining a particular type of application table that records events at the application level. Typically a logging table sees a relatively high volume of inserts and is designed so that rows are always added to the end of the data set. Optimally, to achieve this end the data set is ordered by the clustering key of the table. The main benefit of this design is that new rows do not suffer the overhead of b-tree page split and promote processing at insert. Logged data is sometimes referenced later by the application, but ideally it is not changed once it is inserted into the logging table. In practice it is far too common for a logging table to be used to track the state of the main "thing" or entity described in the initial logged event. Likewise it is far too common for an application design to consider only rows being added to a logging table and to omit any consideration for managing the unbounded growth of the data set. In this discussion we'll try to cover some ways to accommodate these table and application design deficiencies without the need to rewrite applications solely for the purpose of being able to maintain the data.
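To make the append-only design concrete, here is a minimal sketch of what such a logging table might look like. The table name ties in with the purge examples later in this discussion; the other column names are hypothetical:

    -- A hypothetical logging table. The ever-increasing identity value is
    -- the clustered primary key, so new rows always land at the end of the
    -- clustered index and inserts avoid page split overhead.
    create table SampleLog
        (SampleLogId    int identity(1,1) not null,
         EventCode      char(4) not null,
         EventDetail    varchar(255) null,
         CompletionDate datetime null,
         constraint pk_SampleLog primary key clustered (SampleLogId))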
Examples of events that might get logged include DNA samples entering a university's research database, shipments sent from a clearing warehouse, story requests at an e-magazine, pump bearing temperature measurements for a water treatment plant, or even per-user access to an interesting web page. I'll be talking in more detail about these examples later in the discussion. First let's consider some general software development practices and begin to understand why and how managing the size and growth of logging tables can become problematic.
I rarely read about people who have problems trying to manage logging table growth using a datetime column index, yet I keep seeing applications in my shop choking on that very activity. Admittedly, part of the problem is that we wait until we have tens of gigabytes of data in a table before someone decides to schedule a job or execute a hastily thrown together one-off purge from Query Analyzer that is supposed to keep just the last six months, or two weeks, or only those rows with a Boolean value still set to the incomplete state, or whatever comfort zone of history makes sense for a given logging table. Interestingly, the reasons for keeping n days or weeks or months or years of data, or only rows in a particular state, aren't always well founded. Still, elusive business rules and questionable reasons notwithstanding, the impact on the system tends to be fairly predictable. The delete/archive processing is slower than expected, and the contention inflicted on the application as it attempts to slip additional records into the logging table during the longer than expected delete/archive operation is much worse than anticipated. In the worst cases, particularly when logging volume is high and the number of rows to be deleted is large, primary application queries will time out waiting for the delete operation to complete, or a hastily concocted ad hoc purge will irreversibly remove the wrong rows, or some other surprising calamity will befall the application as a direct result of trying to remove the presumed to be unneeded rows.
For a date-based delete, for example, we might typically see a first iteration approach such as:
				
    declare @Cutoff datetime

    set @Cutoff = getdate() - 180

    delete SampleLog where CompletionDate < @Cutoff
And then, after the discovery that the delete operation is still executing hours later, holding a table lock along with many more granular locks and thereby preventing all inserts, the developer may kill the job - or complain to a DBA that there is "something wrong with the SQL Server" and the DBA kills the job. Then the developer goes back to the drawing board and returns with something like:
				
    declare @Cutoff datetime

    set @Cutoff = getdate() - 180

    set rowcount 1000

    delete SampleLog where CompletionDate < @Cutoff

    while @@rowcount = 1000
        delete SampleLog where CompletionDate < @Cutoff

    set rowcount 0
				 
Usually the second attempt also fails: the system is brought to its knees by severe disk queuing, or the delete simply cannot remove rows as fast as they are inserted, or the transaction log fills a disk under the Full Recovery model, or another of the myriad "gotchas" gets us, and we have to come up with yet another alternative. In most cases the situation is somewhat critical in the moment - it may even be the middle of the night - so a truncate table issued against the log table can bring some temporary relief if the available disk space has been exhausted, a server restart can get a server that has stopped responding to respond again, or maybe even a combination of a table truncation and a service restart is needed. Sooner or later, though, it becomes necessary to come up with a solution that is elegant, preserves the desired log data, and gets the aged rows off the server.
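For reference, the emergency relief mentioned above is nothing more exotic than a truncation of the logging table. It is fast and minimally logged, but it throws away every row and will not run against a table that is referenced by a foreign key or published for replication, so consider this a last-resort sketch rather than a strategy:

    -- Emergency relief only: removes ALL rows from the logging table.
    -- Minimally logged and quick to release the allocated pages, but it
    -- cannot be used if the table is referenced by a foreign key or
    -- published for replication.
    truncate table SampleLog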
One class of tactic that has proven flexible and effective for me in retrofitting some discipline into applications with unbounded logging table growth built into the design is some variation of locally partitioning the logging table. There doesn't seem to be a single one-size-fits-all partitioning strategy, because logging table data usage differs so much from application to application and even from logging table to logging table. The range of partitioning possibilities runs from the classic fully partitioned view down to a legitimate, regularly scheduled table truncation. A good understanding of the possibilities and a complete consideration of a logging table's usage are key to unlocking the right solution.
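To give a flavor of the fully partitioned end of that range, here is a minimal sketch of a local partitioned view over monthly member tables. The member table, constraint, and view names are hypothetical, and the details of when and how to use this and the other variations are the subject of the next installment. The idea is that the check constraint on the partitioning column lets the optimizer touch only the relevant member tables, and aging out a month of log data becomes a quick drop or truncate of the oldest member table instead of a massive delete:

    -- Hypothetical monthly member tables for a local partitioned view.
    create table SampleLog_200401
        (SampleLogId    int not null,
         EventCode      char(4) not null,
         CompletionDate datetime not null
            constraint ck_SampleLog_200401
            check (CompletionDate >= '20040101' and CompletionDate < '20040201'),
         constraint pk_SampleLog_200401
            primary key clustered (SampleLogId, CompletionDate))

    create table SampleLog_200402
        (SampleLogId    int not null,
         EventCode      char(4) not null,
         CompletionDate datetime not null
            constraint ck_SampleLog_200402
            check (CompletionDate >= '20040201' and CompletionDate < '20040301'),
         constraint pk_SampleLog_200402
            primary key clustered (SampleLogId, CompletionDate))
    go

    -- The view presents the member tables to the application as one table.
    create view SampleLogMonths
    as
        select SampleLogId, EventCode, CompletionDate from SampleLog_200401
        union all
        select SampleLogId, EventCode, CompletionDate from SampleLog_200402
    go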
				 
My first preference for planning delete/archive strategies is to do so while the application or application subsystem is still in design. Of course I frequently hear the farfetched - even if somewhat founded in reality - argument that the system already holding gigabytes of log data is still in design. Reluctant as I am to intrude on other people's fantasies, I won't delve too deeply into that position. Suffice it to say that when I say planning delete/archive in the design phase I mean empirically deciding how much log data to keep, exploiting the features of the architecture to support the delete/archive strategy, and actually testing the delete/archive processing in the integrated system test environment with a specific test plan element to measure design compatibility and contention issues between the delete/archive process and the application. Designing for unbounded growth of data that does not need to be kept indefinitely is akin to planning a trip to Mars and holding off on how to achieve the return trip until you get to the red planet. The truth is I have yet to work in a shop that considers delete/archive a legitimate aspect of application development. (That is to say, I know many developers that are now permanent Martian residents.) I have, however, over and over again, lived through the production crises and failures that come from such denial in the software lifecycle. It's an interesting, if frustrating, problem. My hope in sharing some solutions here is to alleviate some of the frustration for you if you find yourself having to fight such problems of inadequate planning and design, and also to provide some apologetics for proper design fundamentals. The latter because shoehorning a delete/archive strategy onto an application, as we must do in these cases, is somewhat complicated, potentially fragile, and always a more difficult path than thinking about application needs and realities during the early design phases of a project. Or, to complete my metaphor, we are much more likely to have the resources to get home from Mars if we plan the whole trip before the launch date.
Since I'm describing only a retrofit to an existing application in this discussion, I want to be clear that one of the objectives of such an initiative needs to be to limit the changes to application logic as much as possible. I also want to be clear that I am making no judgment as to the legitimacy of the logged data in this discussion. At the point of disruption to production it is incumbent upon the DBA first to do what is necessary to stabilize the production environment and then - if necessary and appropriate - to invite consideration of a redesign of the application or subsystem. If we were designing holistically and a priori the possibilities would not be so limited, but in this discussion, where the application context is a pre-existing, in situ given (i.e. already in production), it is of the utmost importance that we not introduce monumental application changes in order to provide an effective delete/archive process. It almost seems a little silly at one level - don't you agree? - to have to take this narrow approach, but in the context of regression testing and iterative development there is a clear imperative not to create a new application solely so the application can rid itself of unneeded data.
With that said, I want to state next that it may not always be possible to completely avoid a rewrite when delete/archive has been ignored at the fore of the development process. It is even more likely that a retrofit delete/archive can and will degrade other application behaviors. Adding delete/archive capabilities by definition involves touching the fundamental underpinnings of a data model, so compromises are unavoidable. Knowing when to step back and alter an application design and when to forge ahead with a delete/archive retrofit can be treacherous, and the decision deserves the same care and consideration as any other software development iteration. There should always be checkpoints and collaboration with any software revision - just as they are crucial in the initial design - to keep from ending up with an unworkable solution.
Building from this groundwork, in my next installment I'll move into a discussion that attempts to classify logging tables, and then I'll propose some variations of data partitioning and suggest some tools and tricks that might be used to effectively support a delete/archive approach for each of those general classifications. And since we're all stuck at least 35 million allegorical miles from home until we can get this worked out, I'm hoping you find the time to check it out.
				Bill     