Retrofitting Table Level Delete/Archive Strategies - Updatable Partitioned Storage Models
by Bill Wunder
Hopefully something useful for you
has emerged from this series of articles on the topic of
rescuing a successful (i.e. busy and growing) application from a
data model that features unbounded growth in crucial tables.
Several weeks ago the first article
laid a groundwork for identifying
Unbounded Table Growth by Design before we moved into a
descriptive essay concerning the basic
Classifications of
Logging Tables. In that classifying article we considered
that the etiology of unbounded table growth implementations
ranged from application requirements that really had only a
brief interest in the data - even though the data was saved
ad infinitum - to application interest that actually never
ended and in fact really did need to support DML operations
against even the oldest of rows. Only at this latter extreme -
if at all - are we truly able to fully justify unbounded table
growth. As you may recall, I suggested that the frequency of real
world circumstances tends to approximate a bell curve between
those two extremes, with most situations falling somewhere in
between. Then we continued into an exploration of local
partitioning strategy templates useful to control the growth of
heretofore endlessly growing tables in ways that would not be
cataclysmic for the application, with minimal but varying need
to modify the application to accommodate the new data design. We
also saw that a consistent conversion and maintenance strategy
is much easier to support than an onslaught of one-off triage
efforts (in common developer language, that's the same thing as
logging on to Query Analyzer late at night and deleting as many
rows as possible before something or someone starts
complaining). We considered some
Short
Term Storage Models or Saturday Night Live designs
that basically just alternated two tables in a view and allowed
us to quite painlessly persist an appropriate time frame of
data. Then we recognized that - towards the apogee of that bell
curve - a deeper analysis is necessary to continue to use
variations of that alternating view DDL design whenever
possible as
Moderately Complex Storage Models or Middle of the Road
designs. However, at some point beyond the curve's apogee the
one table per view approach is no longer workable and it is
necessary to contemplate a formal partitioning strategy and
combine the tables under a single view that can be efficiently
updated. In keeping with my tacky naming convention, such
problems fall into our current design category: Life of the
Party or Updatable Partitioned Storage Models.
An updatable partitioned view
requires a check constraint to differentiate the data in each
member table, and the column in the check constraint must be
part of the primary key of each table. Once you get into the
territory that necessitates a local partitioned view, things can
get complex in a hurry.
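To make that requirement concrete, here is roughly the shape such a pair of member tables and the covering view take. The table, column, and constraint names are stand-ins of my own, not the demo script's objects, and the date boundary is arbitrary:

CREATE TABLE dbo.PartyLog_A
    (PartyLogId int NOT NULL,
     CreatedDateTime datetime NOT NULL,
     MessageText varchar(500) NOT NULL,
     -- the partitioning column participates in the primary key...
     CONSTRAINT PK_PartyLog_A PRIMARY KEY (PartyLogId, CreatedDateTime),
     -- ...and carries the check constraint that differentiates the members
     CONSTRAINT CK_PartyLog_A_CreatedDateTime CHECK (CreatedDateTime < '20040101'))

CREATE TABLE dbo.PartyLog_B
    (PartyLogId int NOT NULL,
     CreatedDateTime datetime NOT NULL,
     MessageText varchar(500) NOT NULL,
     CONSTRAINT PK_PartyLog_B PRIMARY KEY (PartyLogId, CreatedDateTime),
     CONSTRAINT CK_PartyLog_B_CreatedDateTime CHECK (CreatedDateTime >= '20040101'))
GO

-- the partitioned view is nothing more than a UNION ALL of the members
CREATE VIEW dbo.PartyLog
AS
SELECT PartyLogId, CreatedDateTime, MessageText FROM dbo.PartyLog_A
UNION ALL
SELECT PartyLogId, CreatedDateTime, MessageText FROM dbo.PartyLog_B
GO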
For starters, in order for the union
view to be updatable, you cannot create the partitioning
constraint WITH NOCHECK. If you just need to get better
performance out of the union query that is more appropriately
considered a Middle of the Road implementation (e.g. you
will never insert, update, or delete through that view but you
need to execute range queries across both tables in the view
frequently) you can get away with check constraints created
WITH NOCHECK and you will possibly improve query
performance. It's important to test to make sure your query
plans will benefit from the check constraint; otherwise, adding
the check constraint could be nothing more than an invitation
for your application to fail at rollover. So, assuming you need
to support modification of data across both tables in a unioned
view, plan for enough time for the check constraint to validate
the data at each rollover and do all you can to make sure the
tables are free from contention during that validation. If you
keep throwing application queries at the data during the
rollover processing, the lock management overhead can have a
serious effect on the time needed to complete the validation.
Take the time to set up a test environment to examine the
behaviors of the rollover processing under a well simulated
application load.
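As a quick syntax illustration (hypothetical names again, and the two statements are alternatives rather than a sequence), the difference between the validated and unvalidated forms is just the WITH clause:

-- validates every existing row; required if the view must remain updatable
ALTER TABLE dbo.PartyLog_A WITH CHECK
    ADD CONSTRAINT CK_PartyLog_A_CreatedDateTime CHECK (CreatedDateTime < '20040101')

-- skips validation of existing rows; faster to create, but only suitable
-- for the read-only Middle of the Road usage described above
ALTER TABLE dbo.PartyLog_A WITH NOCHECK
    ADD CONSTRAINT CK_PartyLog_A_CreatedDateTime CHECK (CreatedDateTime < '20040101')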
The next challenge will likely be
the need to accommodate a significantly larger temporal data
domain. With the less complex models we've examined, the temporal
domain is probably going to be limited to days or weeks of data
in each table. It's more likely that an application that needs
updatability across the entire data set will also maintain an
interest in that data set for a longer period of time. This will
compound the time requirements for rollover as well as index
maintenance, result in a larger footprint on disk for the data,
and increase the scan costs for range queries where all search
arguments (SARGS) are not well supported by the underlying table
indices. Careful analysis and testing will be critical to the
ultimate success of the local partitioning approach and the
benefits will vary greatly depending upon the application.
Simply put, this design model can be a tremendous benefit but
don't rush into it!
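For example, whether a range query can take advantage of the check constraints and indices comes down to keeping the partitioning column bare in the WHERE clause. A contrived pair of queries against the hypothetical view sketched earlier:

-- the bare column comparison is SARGable and lets the constraint and any
-- supporting index limit the work
SELECT PartyLogId, MessageText
FROM dbo.PartyLog
WHERE CreatedDateTime >= '20040201' AND CreatedDateTime < '20040208'

-- wrapping the column in an expression defeats both, leaving a scan of
-- each member table
SELECT PartyLogId, MessageText
FROM dbo.PartyLog
WHERE DATEDIFF(dd, CreatedDateTime, GETDATE()) < 7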
I can't emphasize enough the
importance of analysis and testing. Chances are pretty good that
an application languishing in an unbounded table growth problem
got there because there wasn't adequate knowledge and brainpower
invested - not that it didn't exist, it just didn't get spent
here - in the original design. It would be doubly wasteful
to repeat that mistake in your effort to repair the
problem. It's impractical to attempt to anticipate all the
issues you might encounter in this article. Instead my aim is
simply to remind you that now - as opposed to later - is the
time to identify and address those issues for your application.
The issue of determining whether to
move rows into the table that is about to become current or to
take the more expensive route of adding a check constraint and
creating a full-fledged partitioned view is the most important
decision point in any local partition retrofit. If there is an
efficient way to avoid the check constraint it must be fully
considered for the simple reason that creating that check
constraint carries risk. Please review the
Middle of the Road article for a more complete
description of the risks as well as why and how to avoid
them.
In keeping with the practices
established in the Saturday Night Live and the Middle
of the Road models, we will stay with the common methodology
for this partitioning retrofit. Whether working on the simplest
model or the most stringent, we want a conversion script
that will morph the single table model to the desired
partitioned model, a rollback script capable of returning
the converted data model back into a single table model, and a
rollover script that can flip flop the partitions at our
will as the table grows and the data ages. The conversion and
rollback scripts can be stored procedures or simple run once
scripts. I strongly suggest the rollover script be made a well
documented stored procedure so that the dependencies involved
in the partitioned table design remain visible to all
developers. In these examples I follow the convention of
creating a stored procedure for each operation if for no other
reason than to make it easier for you to deploy the examples
into a test environment as you follow the discussions. In
addition, each script will create the common database used for
all examples if it is not found on the SQL Server so that each
test case can be run independently.
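For example, the guard at the top of each script can be as simple as the following (the database name matches the demo; the pattern itself is nothing special):

IF NOT EXISTS (SELECT name FROM master.dbo.sysdatabases
               WHERE name = N'LocalPartitionDemo')
    CREATE DATABASE LocalPartitionDemo
GO
USE LocalPartitionDemo
GO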
Select this link now to open the
Local Partition Demo - Life of the Party script in a separate browser window so you
can toggle between the script and this article for the rest of
this discussion.
For all scripts we will make use of
the INFORMATION_SCHEMA views. The INFORMATION_SCHEMA views are
your friends and Microsoft has made a commitment toward
supporting these views in future releases. If you have
additional questions concerning these views, be sure to consult
the Books Online documentation.
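For instance, a quick query such as this one (object names are my hypothetical stand-ins) will list the member tables and views a conversion leaves behind:

SELECT TABLE_NAME, TABLE_TYPE
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE 'PartyLog%'
ORDER BY TABLE_TYPE, TABLE_NAME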
For this Life of the Party
example we will use the same table structure as was used in the
Saturday Night Live and Middle of the Road
examples. After creating the database if necessary, the script
creates a new copy of that table with a different name in the
LocalPartitionDemo database where the simpler models were
implemented. Again, we want to have adequate columns to
demonstrate the necessary techniques, but probably not as many
columns as you may have to deal with in a real world retrofit of
an unbounded table growth design. I think it will be
immeasurably useful if you can find a test or development SQL
Server 2000 to run the scripts as we move along, though I will
also present test cycle result sets so that you can see what is
happening if for some reason you cannot run the demonstration.
The conversion script is
very much the same as in the other examples. The addition here is
the establishment of the data value to use in the check
constraint and the creation of the check constraint during the
atomic code section. The conversion script makes the existing
single table become the "A" version of the view shrouded table
and then creates a clone of that table for the "B" version. Once
both tables exist, the script creates the partitioned view as
well as the other supporting view that might be desirable. Note
the use of the exec() function to create the views, since each
CREATE VIEW statement must be in its own batch. (You could also
use sp_executesql.)
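Compressed to its essentials, and with my hypothetical names standing in for the script's own objects (the starting point being a standalone dbo.PartyLog table), the conversion flow looks something like this:

-- the existing table becomes the "A" member
EXEC sp_rename 'dbo.PartyLog', 'PartyLog_A'

-- an empty clone becomes the "B" member; the real script also clones the
-- keys and indexes and adds the check constraints shown earlier
SELECT * INTO dbo.PartyLog_B FROM dbo.PartyLog_A WHERE 1 = 0
ALTER TABLE dbo.PartyLog_B
    ADD CONSTRAINT PK_PartyLog_B PRIMARY KEY (PartyLogId, CreatedDateTime)

-- CREATE VIEW must be the only statement in its batch, hence exec()
EXEC ('CREATE VIEW dbo.PartyLog AS
       SELECT PartyLogId, CreatedDateTime, MessageText FROM dbo.PartyLog_A
       UNION ALL
       SELECT PartyLogId, CreatedDateTime, MessageText FROM dbo.PartyLog_B')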
Like the conversion script,
the rollback script is very similar to the one used in
the simpler models. Though here, I'm using something of a brute
force method to move all the data into a single table and
restore that table to the name of the pre-existing table.
Certainly much could be done to optimize this script in the
context of a specific situation. Hopefully by using this
approach here rather than the INFORMATION_SCHEMA based tactics
shown in the simpler examples, I have conveyed the notion that
even with the recommended retrofit methodology, there remains a
high degree of flexibility to make the processing most
appropriate for a given retrofit effort and also that the
rollback script has the lowest demand for elegance as long as it
works when you need it.
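In that brute force spirit, a rollback can be as blunt as this sketch (hypothetical names once more; any other keys and indexes the original table carried would be rebuilt at the end):

-- pull everything back through the view into one consolidated table
SELECT PartyLogId, CreatedDateTime, MessageText
INTO dbo.PartyLog_Restore
FROM dbo.PartyLog

-- remove the partitioned objects
DROP VIEW dbo.PartyLog
DROP TABLE dbo.PartyLog_A
DROP TABLE dbo.PartyLog_B

-- give the consolidated table back the original name and a primary key
EXEC sp_rename 'dbo.PartyLog_Restore', 'PartyLog'
ALTER TABLE dbo.PartyLog
    ADD CONSTRAINT PK_PartyLog PRIMARY KEY (PartyLogId, CreatedDateTime)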
The rollover script is a
stored procedure that will determine the name of the current
table and the other table, then atomically truncate the other
table, create the check constraints and alter the view or views
appropriately. The rollover script is where the Life
of the Party model deviates most from the other models.
Notice that we make an effort to use the sp_executesql
command rather than exec() since this procedure will be used
regularly and will likely be run by an automated task. I
chose not to type the extra few lines in the conversion
script and the rollback script because those scripts
will likely be needed only one time and I'll be there to see the
Query Analyzer output - really just a little laziness on my
part. It's possible that table definitions will be larger
than the allowable nvarchar(4000) for sp_executesql, in
which case exec() becomes the more attractive option.
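When the command does fit, the pattern is simply to build the Unicode string and hand it to sp_executesql. A contrived fragment of the sort of statement the rollover builds (hypothetical names, arbitrary boundary date):

DECLARE @SQLStr nvarchar(4000)
SET @SQLStr = N'ALTER TABLE dbo.PartyLog_B ADD CONSTRAINT CK_PartyLog_B_CreatedDateTime'
            + N' CHECK (CreatedDateTime >= ''' + CONVERT(nvarchar(8), GETDATE(), 112) + N''')'
EXEC sp_executesql @SQLStr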
Like the other rollover script examples, this script
executes the dynamic command string in a transaction so that
the "holdlock,tablock" has the effect of stopping other writes
but allowing reads while the alter view happens. In
some cases you may prefer to use "holdlock,tablockx" to prevent all
other access as an aid to creation of the check constraints.
Unlike either of the other examples, here the INFORMATION_SCHEMA
query to identify the "current" table includes a join to the
constraint_column_usage and check_constraints views, resulting
in a rather lengthy join clause.
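One hypothetical shape for that lookup, assuming a datetime partitioning column and taking the "current" member to be the one whose check clause accepts new rows:

SELECT ccu.TABLE_NAME, cc.CHECK_CLAUSE
FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
JOIN INFORMATION_SCHEMA.CHECK_CONSTRAINTS cc
    ON cc.CONSTRAINT_CATALOG = ccu.CONSTRAINT_CATALOG
    AND cc.CONSTRAINT_SCHEMA = ccu.CONSTRAINT_SCHEMA
    AND cc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME
WHERE ccu.TABLE_NAME IN ('PartyLog_A', 'PartyLog_B')
    AND ccu.COLUMN_NAME = 'CreatedDateTime'
    AND cc.CHECK_CLAUSE LIKE '%>=%'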
Also unlike the other examples,
because this script has a number of distinct operations that can
be accurately defined only after some information is collected
(which in some cases can require a few additional seconds), this
rollover script employs a technique of building all the
statements to be executed and storing them in a table variable
before the start of the transaction that will enforce the
"holdlock,tablock" locking hint for the duration of the rollover
processing. One final significant difference in this script is
that it defines and enforces a business rule establishing the
minimum amount of data we must keep on line at all times
(@CycleIntervalPeriod).
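A sketch of those two ideas together, with hypothetical names and only a couple of representative statements queued (the real procedure also queues the constraint drops, the row moves carrying the holdlock,tablock hints, and the ALTER VIEW):

DECLARE @Statement TABLE (StatementId int IDENTITY(1,1), SQLStr nvarchar(4000))
DECLARE @CycleIntervalPeriod int, @StatementId int, @SQLStr nvarchar(4000), @RolloverDate nvarchar(8)
SET @CycleIntervalPeriod = 30   -- business rule: keep at least 30 days on line
SET @RolloverDate = CONVERT(nvarchar(8), GETDATE(), 112)

-- refuse to roll over if truncating the other member would leave less than
-- the minimum retention period on line (PartyLog_A assumed current here)
IF (SELECT DATEDIFF(dd, MIN(CreatedDateTime), GETDATE()) FROM dbo.PartyLog_A) < @CycleIntervalPeriod
    RAISERROR ('Rollover would leave less than the required retention on line.', 16, 1)
ELSE
BEGIN
    -- build the statements first, outside the transaction
    INSERT @Statement (SQLStr) VALUES (N'TRUNCATE TABLE dbo.PartyLog_B')
    INSERT @Statement (SQLStr)
    VALUES (N'ALTER TABLE dbo.PartyLog_B ADD CONSTRAINT CK_PartyLog_B_CreatedDateTime'
          + N' CHECK (CreatedDateTime >= ''' + @RolloverDate + N''')')

    -- then execute them in one transaction so any locks taken by the
    -- queued statements are held until the commit
    BEGIN TRAN
    SELECT @StatementId = MIN(StatementId) FROM @Statement
    WHILE @StatementId IS NOT NULL
    BEGIN
        SELECT @SQLStr = SQLStr FROM @Statement WHERE StatementId = @StatementId
        EXEC sp_executesql @SQLStr
        SELECT @StatementId = MIN(StatementId) FROM @Statement WHERE StatementId > @StatementId
    END
    COMMIT TRAN
END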
That provides all the components
we need to create and manage the updateable partitioning method.
You may wish to use the SQL Agent to initiate the nightly
rollover or you may want the application to be involved. This is
going to be driven by how much impact the rollover blocking has
on the application. Only good testing in your environment can
provide the ultimate answer to the question. The size of the
table being aged to the background, the quality of the index
used to identify rows that need to be rolled over to the table
becoming current, the number of rows that will be rolled over, and
the amount of contention between the check constraint validation
and the application will all have an impact on the duration of
this process.
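If the SQL Agent route wins out, the job setup can be scripted along these lines; the job, schedule, and rollover procedure names here are hypothetical:

USE msdb
GO
EXEC sp_add_job @job_name = 'LocalPartitionDemo nightly rollover'
EXEC sp_add_jobstep @job_name = 'LocalPartitionDemo nightly rollover',
    @step_name = 'execute rollover procedure',
    @subsystem = 'TSQL',
    @database_name = 'LocalPartitionDemo',
    @command = 'EXEC dbo.PartyLogRollover'
EXEC sp_add_jobschedule @job_name = 'LocalPartitionDemo nightly rollover',
    @name = 'nightly rollover schedule',
    @freq_type = 4,              -- daily
    @freq_interval = 1,
    @active_start_time = 020000  -- 2:00 AM
EXEC sp_add_jobserver @job_name = 'LocalPartitionDemo nightly rollover',
    @server_name = '(local)'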
The script concludes with a test
script to verify the conversion, rollover and rollback
processing. Note that in these test cycles we'll also verify the
desired behavior that the rollover is not allowed if the result
would leave us with LESS than the desired duration of available
data.
If you follow the
Life of the Party script through the output you can
see the structure and names of the views and tables as well as
the behavior of the data in this partitioned design. Once you
get a real conversion to the point of successfully executing a
similar test plan in the unit testing environment, you'll be
ready to apply the conversion and a few rollovers in the
integrated testing/development environment to verify that the
conversion and rollover processing plays well with your
application(s).
For those tables where you are
removing a large chunk of rows on a regular basis and
experiencing unbearable contention between the application and
the row removal, local partitioning strategies can be very useful
to tame the rough edges of an otherwise mainstream application.
It is also effective in many cases to use the partitioning
strategies we have considered in this series of articles for a
hierarchy of tables. As you might expect, the analysis and
testing are all the more intricate and necessary, but the results
can be equally spectacular.
I welcome your questions or
suggestions on how you might approach a partitioning strategy
with a hierarchical data collection or on anything else we've
covered in these articles on Retrofitting Table Level
Delete/Archive Strategies.
Bill