Retrofitting Table Level Delete/Archive Strategies - Updatable Partitioned Storage Models

by Bill Wunder
				
Hopefully something useful for you has emerged from this series of articles on the topic of rescuing a successful (i.e. busy and growing) application from a data model that features unbounded growth in crucial tables.
				 
				  
				
Several weeks ago the first article laid the groundwork for identifying Unbounded Table Growth by Design before we moved into a descriptive essay concerning the basic Classifications of Logging Tables. In that classifying article we considered that the etiology of unbounded table growth implementations ranged from application requirements that really had only a brief interest in the data - even though the data was saved ad infinitum - to application interest that actually never ended and in fact really did need to support DML operations against even the oldest of rows. Only at this latter extreme - if at all - are we truly able to fully justify unbounded table growth. As you may recall, I suggested that the frequency of real world circumstances tended to approximate a bell curve between those two extremes, with most circumstances falling somewhere in between. Then we continued into an exploration of local partitioning strategy templates useful for controlling the growth of heretofore endlessly growing tables in ways that would not be cataclysmic for the application, with minimal but varying necessity to modify the application to accommodate the new data design. We also saw that a consistent conversion and maintenance strategy would prove much easier to support than an onslaught of one-off triage efforts (in common developer language, that's the same thing as logging on to Query Analyzer late at night and deleting as many rows as possible before something or someone starts complaining). We considered some Short Term Storage Models or Saturday Night Live designs that basically just alternated two tables in a view and allowed us to quite painlessly persist an appropriate time frame of data. Then we recognized that - toward the apogee of that bell curve - a deeper analysis is necessary to continue to use variations of that alternating view DDL design whenever possible as Moderately Complex Storage Models or Middle of the Road designs. However, at some point beyond the curve's apogee the one-table-per-view approach is no longer workable, and it becomes necessary to contemplate a formal partitioning strategy and combine the tables under a single view that can be efficiently updated. In keeping with my tacky naming convention, such problems fall into our current design category: Life of the Party or Updatable Partitioned Storage Models.
				  
				
An updateable partitioned view requires a check constraint to differentiate the data in each table, and the column in the check constraint must be made part of the primary key for each table. Once you get into the territory that necessitates a local partitioned view, things can get complex in a hurry.
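
A minimal sketch may help make the requirement concrete. The table, column, and constraint names below are hypothetical (they are not the names used in the demo script), and the date boundaries are arbitrary; the essential points are that each member table carries a CHECK constraint on the partitioning column and that the partitioning column participates in each primary key.

-- Illustrative only: two member tables partitioned on a datetime column.
-- The CHECK constraint ranges must not overlap, and the partitioning
-- column must be part of each table's primary key.
CREATE TABLE dbo.EventLog_A
    (EventId   int          NOT NULL,
     EventTime datetime     NOT NULL
        CONSTRAINT ck_EventLog_A
        CHECK (EventTime >= '20040101' AND EventTime < '20040201'),
     EventText varchar(255) NULL,
     CONSTRAINT pk_EventLog_A PRIMARY KEY (EventId, EventTime))

CREATE TABLE dbo.EventLog_B
    (EventId   int          NOT NULL,
     EventTime datetime     NOT NULL
        CONSTRAINT ck_EventLog_B
        CHECK (EventTime >= '20040201' AND EventTime < '20040301'),
     EventText varchar(255) NULL,
     CONSTRAINT pk_EventLog_B PRIMARY KEY (EventId, EventTime))
GO

-- The partitioned view simply unions the member tables.
CREATE VIEW dbo.EventLog
AS
SELECT EventId, EventTime, EventText FROM dbo.EventLog_A
UNION ALL
SELECT EventId, EventTime, EventText FROM dbo.EventLog_B
GO

With the constraints trusted and mutually exclusive, an insert through dbo.EventLog is routed by SQL Server to whichever member table's constraint accepts the EventTime value.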
				  
				
For starters, in order for the union view to be updatable, you cannot create the partitioning constraint WITH NOCHECK. If you just need to get better performance out of the union query, that is more appropriately considered a Middle of the Road implementation (e.g. you will never insert, update, or delete through that view but you need to execute range queries across both tables in the view frequently); there you can get away with check constraints created WITH NOCHECK and you will possibly improve query performance. It's important to test to make sure your query plans will benefit from the check constraint, otherwise adding the check constraint could be nothing more than an invitation for your application to fail at rollover. So, assuming you need to support modification of data across both tables in a unioned view, plan for enough time for the check constraint to validate the data at each rollover and do all you can to make sure the tables are free from contention during that validation. If you keep throwing application queries at the data during the rollover processing, the lock management overhead can have a serious effect on the time needed to complete the validation. Take the time to set up a test environment to examine the behaviors of the rollover processing under a well simulated application load.
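
To make the distinction concrete, here is a minimal sketch of the two alternative ways the partitioning constraint might be added to an existing, populated member table. The object names and date boundaries are hypothetical, carried over from the sketch above; use one form or the other, not both.

-- Read-only (Middle of the Road) scenario: WITH NOCHECK skips scanning the
-- existing rows, so the constraint is added almost instantly, but the
-- resulting union view cannot be used for inserts, updates, or deletes.
ALTER TABLE dbo.EventLog_A WITH NOCHECK
    ADD CONSTRAINT ck_EventLog_A
        CHECK (EventTime >= '20040101' AND EventTime < '20040201')

-- Updatable partitioned view scenario: WITH CHECK (the default) validates
-- every existing row, so budget time for that scan at each rollover and
-- keep the table free of contention while it runs.
ALTER TABLE dbo.EventLog_A WITH CHECK
    ADD CONSTRAINT ck_EventLog_A
        CHECK (EventTime >= '20040101' AND EventTime < '20040201')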
				  
				
The next challenge will likely be the need to accommodate a significantly larger temporal data domain. With the less complex models we've examined, the temporal domain is probably going to be limited to days or weeks of data in each table. It's likely that an application that needs updatability across the entire data set will also maintain an interest in that data set for a longer period of time. This will compound the time requirements for rollover as well as index maintenance, result in a larger footprint on disk for the data, and increase the scan costs for range queries where all search arguments (SARGs) are not well supported by the underlying table indices. Careful analysis and testing will be critical to the ultimate success of the local partitioning approach, and the benefits will vary greatly depending upon the application. Simply put, this design model can be a tremendous benefit but don't rush into it!
				  
				
I can't emphasize enough the importance of analysis and testing. Chances are pretty good that an application languishing in an unbounded table growth problem got there because there wasn't adequate knowledge and brainpower invested in the original design - not that it didn't exist, it just didn't get spent here. It would be doubly wasteful to repeat that mistake in your effort to repair the problem. It's impractical to attempt to anticipate all the issues you might encounter in this article. Instead my aim is simply to remind you that now - as opposed to later - is the time to identify and address those issues for your application.
				  
				
Determining whether to move rows into the table that is about to become current or to take the more expensive route of adding a check constraint and creating a full-fledged partitioned view is the most important decision point in any local partition retrofit. If there is an efficient way to avoid the check constraint it must be fully considered, for the simple reason that creating that check constraint carries risk. Please review the Middle of the Road article for a more complete description of the risks as well as why and how to avoid them.
				  
				
In keeping with the practices established in the Saturday Night Live and the Middle of the Road models, we will stay with the common methodology for this partitioning retrofit. Whether working on the simplest model or the most stringent, we want a conversion script that will morph the single table model to the desired partitioned model, a rollback script capable of returning the converted data model back into a single table model, and a rollover script that can flip flop the partitions at our will as the table grows and the data ages. The conversion and rollback scripts can be stored procedures or simple run-once scripts. I strongly suggest the rollover script be made a well documented stored procedure so that the dependencies involved in the partitioned table design remain visible to all developers. In these examples I follow the convention of creating a stored procedure for each operation, if for no other reason than to make it easier for you to deploy the examples into a test environment as you follow the discussions. In addition, each script will create the common database used for all examples if it is not found on the SQL Server so that each test case can be run independently.
				  
				
Select this link now to open the Local Partition Demo - Life of the Party script in a separate browser window so you can toggle between the script and this article for the rest of this discussion.
				  
				
For all scripts we will make use of the INFORMATION_SCHEMA views. The INFORMATION_SCHEMA views are your friends, and Microsoft has made a commitment toward supporting these views in future releases. If you have additional questions concerning these views, be sure to consult the Books Online documentation.
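
As a small illustration of the kind of metadata these views expose, a query along the following lines (the member table name is hypothetical) lists the check constraints on one table together with the constrained column and the constraint text - exactly the sort of information the rollover logic later depends on.

-- List the check constraints on one member table along with the
-- constrained column and the constraint text (table name illustrative).
SELECT cc.CONSTRAINT_NAME,
       ccu.TABLE_NAME,
       ccu.COLUMN_NAME,
       cc.CHECK_CLAUSE
  FROM INFORMATION_SCHEMA.CHECK_CONSTRAINTS cc
  JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
    ON cc.CONSTRAINT_CATALOG = ccu.CONSTRAINT_CATALOG
   AND cc.CONSTRAINT_SCHEMA  = ccu.CONSTRAINT_SCHEMA
   AND cc.CONSTRAINT_NAME    = ccu.CONSTRAINT_NAME
 WHERE ccu.TABLE_NAME = 'EventLog_A'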
				  
				
For this Life of the Party example we will use the same table structure as was used in the Saturday Night Live and Middle of the Road examples. After creating the database if necessary, the script creates a new copy of that table with a different name in the LocalPartitionDemo database where the simpler models were implemented. Again, we want to have adequate columns to demonstrate the necessary techniques but probably not as many columns as you may have to deal with in a real world retrofit of an unbounded table growth design. I think it will be immeasurably useful if you can find a test or development SQL Server 2000 to run the scripts as we move along, though I will also present test cycle result sets so that you can see what is happening if for some reason you cannot run the demonstration.
				  
				
The conversion script here is very much the same as in the other examples. The additions here are the establishment of the data value to use in the check constraint and the creation of the check constraint during the atomic code section. The conversion script makes the existing single table become the "A" version of the view shrouded table and then creates a clone of that table for the "B" version. Once both tables exist, the script creates the partitioned view as well as any other supporting view that might be desirable. Note the use of exec() to create the views since the create statement must be in its own batch. (You could also use sp_executesql.)
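
The pattern is roughly the sketch below, again with hypothetical object names rather than those of the demo script: the CREATE VIEW text is built into a variable and handed to exec() so that it runs as its own batch inside the larger conversion script.

-- CREATE VIEW must be the only statement in its batch, so within a larger
-- conversion script the view definition is built as a string and executed
-- dynamically (sp_executesql works just as well for shorter strings).
DECLARE @DDL varchar(8000)
SET @DDL = 'CREATE VIEW dbo.EventLog
AS
SELECT EventId, EventTime, EventText FROM dbo.EventLog_A
UNION ALL
SELECT EventId, EventTime, EventText FROM dbo.EventLog_B'
EXEC (@DDL)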
				  
				
Like the conversion script, the rollback script is very similar to the one used in the simpler models, though here I'm using something of a brute force method to move all the data into a single table and restore that table to the name of the pre-existing table. Certainly much could be done to optimize this script in the context of a specific situation. Hopefully, by using this approach here rather than the INFORMATION_SCHEMA based tactics shown in the simpler examples, I have conveyed the notion that even with the recommended retrofit methodology there remains a high degree of flexibility to make the processing most appropriate for a given retrofit effort, and also that the rollback script has the lowest demand for elegance as long as it works when you need it.
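
In its simplest form, such a brute force rollback might look something like the following sketch (object names are hypothetical and no attempt is made at optimization): copy whatever lives in the secondary table into the surviving table, drop the view and the emptied table, and restore the original single-table name.

-- Drop the partitioning constraint so rows from the other table can land
-- here, then consolidate, remove the view and the emptied member table,
-- and restore the original single-table name.
ALTER TABLE dbo.EventLog_A DROP CONSTRAINT ck_EventLog_A

INSERT dbo.EventLog_A (EventId, EventTime, EventText)
SELECT EventId, EventTime, EventText
  FROM dbo.EventLog_B

DROP VIEW dbo.EventLog
DROP TABLE dbo.EventLog_B

EXEC sp_rename 'dbo.EventLog_A', 'EventLog'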
The rollover script is a stored procedure that will determine the name of the current table and the other table, then atomically truncate the other table, create the check constraints, and alter the view or views appropriately. The rollover script is where the Life of the Party model deviates most from the other models. Notice that we make an effort to use sp_executesql rather than exec() since this procedure will be used regularly and will likely be run by an automated task - while I chose not to type the extra few lines in the conversion script and the rollback script because those scripts will likely only be needed one time and I'll be there to see the Query Analyzer output - really just a little laziness on my part for sure. It's possible that table definitions will be larger than the allowable nvarchar(4000) for sp_executesql, in which case exec() becomes the more attractive option. Like the other rollover script examples, this script executes the dynamic command string in a transaction so that the "holdlock,tablock" hint has the effect of stopping other writes but allowing reads while the alter view happens. In some cases you may prefer "holdlock,tablockx" to prevent all other access as an aid to creating the check constraints. Unlike either of the other examples, here the INFORMATION_SCHEMA query to identify the "current" table includes a join to the constraint_column_usage and check_constraints views, resulting in a rather lengthy join clause. Also unlike the other examples, because this script has a number of distinct operations that can be accurately defined only after some information is collected - which in some cases can require a few additional seconds - this rollover script employs a technique of building all the statements to be executed and storing them in a table variable before starting the transaction that will enforce the "holdlock,tablock" locking hint for the duration of the rollover processing. One final significant difference in this script is that it defines and enforces a business rule establishing the minimum amount of data we must keep online at all times (@CycleIntervalPeriod).
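
The rollover procedure in the demo script is the place to see the real details; the sketch below only illustrates the general pattern of queuing the statements in a table variable and then executing them under held table locks, using hypothetical object names and boundaries.

-- Sketch only: statements are prepared before the transaction, then run
-- while HOLDLOCK, TABLOCK on the member tables keeps new writers out
-- (readers can continue) until the swap is complete.
DECLARE @Statements TABLE (Seq int IDENTITY(1,1), Cmd nvarchar(4000))
DECLARE @Seq int, @Cmd nvarchar(4000), @Dummy int

-- Statements built from metadata collected earlier (boundaries hypothetical).
INSERT @Statements (Cmd) VALUES (N'ALTER TABLE dbo.EventLog_B DROP CONSTRAINT ck_EventLog_B')
INSERT @Statements (Cmd) VALUES (N'TRUNCATE TABLE dbo.EventLog_B')
INSERT @Statements (Cmd)
VALUES (N'ALTER TABLE dbo.EventLog_B WITH CHECK ADD CONSTRAINT ck_EventLog_B
          CHECK (EventTime >= ''20040301'' AND EventTime < ''20040401'')')
-- ...plus the ALTER VIEW statement(s) rebuilt for the new boundaries...

SET @Seq = 1
BEGIN TRANSACTION
    -- Take and hold shared table locks on both member tables.
    SELECT TOP 1 @Dummy = EventId FROM dbo.EventLog_A WITH (HOLDLOCK, TABLOCK)
    SELECT TOP 1 @Dummy = EventId FROM dbo.EventLog_B WITH (HOLDLOCK, TABLOCK)

    -- Execute the queued statements in order under the held locks.
    WHILE EXISTS (SELECT * FROM @Statements WHERE Seq = @Seq)
    BEGIN
        SELECT @Cmd = Cmd FROM @Statements WHERE Seq = @Seq
        EXEC sp_executesql @Cmd
        SET @Seq = @Seq + 1
    END
COMMIT TRANSACTION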
That provides all the components we need to create and manage the updateable partitioning method. You may wish to use the SQL Agent to initiate the nightly rollover or you may want the application to be involved. This is going to be driven by how much impact the rollover blocking has on the application. Only good testing in your environment can provide the ultimate answer to that question. The size of the table being aged to the background, the quality of the index used to identify rows that need to be rolled over to the table becoming current, the number of rows that will be rolled over, and the amount of contention between the check constraint validation and the application will all have an impact on the duration of this process.
The script concludes with a test script to verify the conversion, rollover, and rollback processing. Note that in these test cycles we'll also verify the desired behavior that the rollover is not allowed if the result would leave us with LESS than the desired duration of available data.
If you follow the Life of the Party script through the output you can see the structure and names of the views and tables as well as the behavior of the data in this partitioned design. Once you get a real conversion to the point of successfully executing a similar test plan in the unit testing environment, you'll be ready to apply the conversion and a few rollovers in the integrated testing/development environment to verify that the conversion and rollover processing plays well with your application(s).
For those tables where you are removing a large chunk of rows on a regular basis and experiencing unbearable contention between the application and the row removal, local partitioning strategies can be very useful to tame the rough edges of an otherwise mainstream application. It is also effective in many cases to use the partitioning strategies we have considered in this series of articles for a hierarchy of tables. As you might expect, the analysis and testing is all the more intricate and necessary, but the results are equally spectacular.
I welcome your questions or suggestions on how you might approach a partitioning strategy with a hierarchical data collection or on anything else we've covered in these articles on Retrofitting Table Delete/Archive Strategies.

Bill