Retrofitting Table Level
Delete/Archive Strategies - Short Term Storage Models
by Bill Wunder
When I began this series of articles
I had expected to deliver a one-two punch and then move on to
another aspect of SQL Server. It quickly became obvious that
this topic would require more discussion than I had anticipated.
In the first segment,
Unbounded Table Growth by Design, we explored the all too
common shortcut to data modeling that leaves an application with
unbounded growth patterns in rapidly growing tables, and we
examined how this practice becomes more debilitating the more
successful an application is, as efforts to remove old and
no longer needed rows from a table interfere with, and perhaps
even break, the application's ability to use that table. Then in
the second segment,
Classifications of Logging Tables, we attempted to define
some loose categories that are useful as starting points for the
rescue of good applications from such unbounded growth
limitations. In that discussion we considered that there are a
few applications that actually need to support unlimited table
growth (recall we labeled this requirement the History of
the World) and others that are completely miscast in the
role of an ever growing table when in fact they are merely
temporary data containers that can be easily and regularly
decapitated (with some irreverence we called these tables
Wholly Unnecessary). That was supposed to leave the
majority of ever growing tables for this current segment, along
with some working demonstrations of retrofit partitioning
strategies to help return the otherwise successful application
to a well behaved and easily maintainable state.
Well, now that I've had some time to ponder the real breadth of
the remaining topic, I believe I'm going to have to cut this last
piece of the pie into yet smaller slices in order to give the
remaining things to be said fair play.
Stick with me and I'll present
working examples, based on solutions I have used, along
the requirements (and complexity) continuum: from the simple
partitioning strategy needed for data that is slightly more
than Wholly Unnecessary but has only limited value after
a relatively short period of time (remember we dubbed this the
Saturday Night Live container), to the data table that is
rarely updated but might be read for a considerable period of
time (our Middle of the Road data), and onward into the realm
of the table whose rows are regularly read and updated for an
extended period of time yet must be held to some relatively
fixed size for performance and/or storage reasons
(or, as we named it, a Life of The Party table).
As a general practice it is useful
to approach any partitioning retrofit with a common methodology.
Whether working on a simple model or on the most stringent, we
want a conversion script that will morph the single table
model to the desired partitioned model, a rollback script
capable of returning the converted data model to a single
table model, and a rollover script that can flip-flop the
partitions as the table grows and the data ages. The conversion
scripts can be stored procedures or simple run-once scripts. The
rollover script I strongly suggest be made a well documented
stored procedure so that the dependencies involved in the
partitioned table design remain visible to all developers. In
these examples I will follow the convention of creating a stored
procedure for each operation, if for no other reason than to make
it easier for you to deploy the examples into a test environment
as you follow the discussion. In addition, each script will
create the common database used for all examples if it is not
found on the SQL Server.
For all scripts we will make good
use of the INFORMATION_SCHEMA views. If you have additional
questions concerning these views be sure to consult Books
Online.
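As a quick hedged illustration of the kind of checks these views make possible - the table name LogEvent here is just a placeholder, not a name from the article's script - you can test for an object and inspect its columns without touching the system tables directly:

-- LogEvent is a placeholder name used only for this illustration
if exists (select 1
             from INFORMATION_SCHEMA.TABLES
            where TABLE_NAME = 'LogEvent'
              and TABLE_TYPE = 'BASE TABLE')
   print 'LogEvent exists as a base table'

-- list the columns in the order they appear in the table definition
select COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
  from INFORMATION_SCHEMA.COLUMNS
 where TABLE_NAME = 'LogEvent'
 order by ORDINAL_POSITION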
For all of the examples we will use
the same relatively simple table structure, with only the name
changed to keep it obvious which tables belong with which
example. We want enough columns to demonstrate the
necessary techniques, but probably not as many columns as you may
have to deal with in a real world retrofit of an unbounded table
growth design. I think it will be immeasurably useful if you can
find a test or development SQL Server 2000 on which to run the scripts as
we move along, though I will also present enough result sets
along the way so that you can see what is happening if for some
reason you cannot run the demonstration.
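If you want a feel for the shape of that table before opening the script, something along these lines is representative; treat it as a sketch, since the script's actual column names, sizes, and index placement may differ:

-- a sketch only; it assumes the DATA and INDEXES filegroups discussed next
create table dbo.SaturdayNightLiveLog
   (LogId int identity(1,1) not null,
    LogTime datetime not null constraint dfSaturdayNightLiveLogTime default (getdate()),
    LogType varchar(30) not null,
    LogText varchar(1000) null,
    constraint pkSaturdayNightLiveLog primary key nonclustered (LogId) on [INDEXES])
on [DATA]

The primary key is placed on an INDEXES filegroup and the table itself on a DATA filegroup, anticipating the placement detail the demonstration database described next is meant to preserve.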
Please open the
Saturday Night Live
script and refer to the script as we walk
through it in the rest of this article.
The first order of business is to
create a database using a database definition that would make it
possible to place the system catalog, the data, the indexes, and
the transaction log file each on a separate device, though for the
test I'll be placing all three filegroups and the log in the
same location. When you set out to build a partitioning retrofit
you may be working with a production system where the devices
are on different physical hardware, but in early development (aka
unit testing) you may find yourself working on the SQL Server
2000 Developer Edition installed on the single un-partitioned
drive of your workstation. I strongly suggest making everything
look the same from inside SQL Server in terms of where indexes,
data, and primary keys must be placed, to assure adequate
conversion scripting detail.
Looking at the create database command for a moment, notice the
use of the full path to the [PRIMARY] file group as stored in
sysdatabases to derive the filename parameter for the other
files necessary. Feel free to modify the script as needed, just
make sure you have a DATA filegroup and an INDEXES filegroup
in addition to the PRIMARY filegroup or the examples will fail.
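In case you cannot open the script right away, the overall shape of that database creation is roughly the following sketch; the file names and the folder derivation are simplified here and the script itself is more thorough about sizes and error checking:

-- sketch only; the article's script handles file sizes and errors more carefully
if not exists (select 1 from master.dbo.sysdatabases where name = 'LocalPartitionDemo')
   create database LocalPartitionDemo
go

declare @Folder nvarchar(260)
-- folder holding the PRIMARY (.mdf) file, pulled from sysdatabases as described above
select @Folder = reverse(substring(reverse(rtrim(filename)),
                                   charindex('\', reverse(rtrim(filename))), 260))
  from master.dbo.sysdatabases
 where name = 'LocalPartitionDemo'

if not exists (select 1 from LocalPartitionDemo.dbo.sysfilegroups where groupname = 'DATA')
begin
   alter database LocalPartitionDemo add filegroup [DATA]
   exec ('alter database LocalPartitionDemo add file
          (name = LocalPartitionDemo_Data,
           filename = ''' + @Folder + 'LocalPartitionDemo_Data.ndf'')
          to filegroup [DATA]')
end

if not exists (select 1 from LocalPartitionDemo.dbo.sysfilegroups where groupname = 'INDEXES')
begin
   alter database LocalPartitionDemo add filegroup [INDEXES]
   exec ('alter database LocalPartitionDemo add file
          (name = LocalPartitionDemo_Indexes,
           filename = ''' + @Folder + 'LocalPartitionDemo_Indexes.ndf'')
          to filegroup [INDEXES]')
end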
Next, two roles are added in the
newly created LocalPartitionDemo database. One role will
serve to document permissions granted to the application, the
other those granted to a data analyst. As the complexity of the scheme
increases we'll notice that the interest of - and therefore the
permissions granted to - the application will include a wider
range of the partitioned dataset. Conversely, the data analyst
role will help us to justify keeping data "for internal use
only" for which the application has no direct usage requirement. A
well partitioned data set - one with a check constraint on each
table to help the optimizer with DML actions - is not necessary
if our only need for the older part of the partitioned set is
one-off select queries. Later, in the Life of the Party
case, a well partitioned data set is necessary because the application
must update all rows to a final state before they can be aged
out of the kept set. Users are not added to the roles in the example
because we do not need to manipulate users in the example
scripts; in the real world there are users in each role.
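As a sketch of the idea - the role names here are illustrative rather than the ones in the script - the role setup amounts to little more than:

use LocalPartitionDemo
go
-- role names are illustrative placeholders
exec sp_addrole 'SnlApplicationRole'
exec sp_addrole 'SnlAnalystRole'
go
-- once the conversion is in place the application reads and writes only the view
-- that wears the original table name (the placeholder SaturdayNightLiveLog from above)
grant select, insert on dbo.SaturdayNightLiveLog to SnlApplicationRole
-- the analyst role would additionally be granted select on the underlying
-- A and B tables once the conversion creates them
grant select on dbo.SaturdayNightLiveLog to SnlAnalystRole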
At this point an exact copy of the existing table is created in order to begin
developing and testing the conversion and rollover processes of
the Saturday Night Live test
case. Recall that the objective in this model is to keep only
a short, recent frame of the data: perhaps an hour, a day, or a
week, as driven by the application requirements.
The reason we want this frame of data is so that we can provide
row level granularity to the application for that prescribed
window, and then complete an aggregation or archival operation on
the aged data before it becomes permanently uninteresting for
any purpose at the full detail granularity, just after it is
rolled over to the unused table of the pair. Such a table might
be used to log orders or requests or calls for the day or week,
and then at the end of the day or week the events are moved into
a historical table, possibly at some level of aggregation. For
the rest of the current discussion let's make the interval a day
for ease of discussion. The historical table services all
queries concerning data older than that day.

In the Saturday Night Live model there may be a daily time slot when the
application is least busy, ideally suited to archiving and purging
the in-use table. A typical delete operation can cause potentially
extended unavailability or unresponsiveness of the application
due to blocking contention on the logging table. To help this
situation, we create two copies of the log table: copy A and
copy B. Then, in that same time slot of low usage, a quick swap
of those tables is executed. The application is protected
from having to know which table is currently in use by replacing
the original table with a view having the same name as that
original table and using A and B suffixes on the two underlying
tables in the new design.

It is an important aspect of data that
fits the Saturday Night Live model that the
data will never need to be modified (insert, update, or delete)
by the application once it moves outside of the established
"current" interval. Important because this eliminates the need
to create an updatable partitioned view to support such
operations. As we will see later in discussions of the
Middle of the Road and Life of The Party
models, maintaining an updatable view can require considerably
more time and complexity during each switch between the A and B
table as the one to receive new rows. For this short term model
all we need to do is verify that yesterday's table - the one not
currently in use - was successfully archived, truncate it, and
alter the view to make that table the one underlying the
application's currently in-use view. The impact to the
application should last only a second or two and the rollover
script is nice and easy.
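To make that concrete, after conversion the objects look roughly like this, sticking with the placeholder name SaturdayNightLiveLog used in the earlier sketches:

-- dbo.SaturdayNightLiveLog_A   the renamed original table, currently receiving rows
-- dbo.SaturdayNightLiveLog_B   an empty structural twin awaiting the next rollover
-- dbo.SaturdayNightLiveLog     a view with the original name, currently selecting from _A
--
-- the application's code does not change at all; it still reads and writes
-- what it believes is the original table
insert dbo.SaturdayNightLiveLog (LogType, LogText) values ('call', 'routine event')

select LogTime, LogType, LogText
  from dbo.SaturdayNightLiveLog
 where LogTime >= convert(char(8), getdate(), 112)   -- today's rows only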
The conversion
script in the Saturday Night Live script, as it will
in all the examples, makes the existing table become the "A"
version of the view shrouded table and then creates a clone of
that table for the "B" version. Notice the use of the exec
statement to create the views, since the create statement must be
in its own batch. You could also use sp_executesql.
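A hedged sketch of those conversion moves, using the same placeholder names (the real script drives the column list and the index rebuild from INFORMATION_SCHEMA rather than hard coding anything):

-- the existing table becomes the "A" version
exec sp_rename 'dbo.SaturdayNightLiveLog', 'SaturdayNightLiveLog_A'

-- an empty structural clone becomes the "B" version; the real conversion also
-- recreates the primary key and indexes on the clone
select *
  into dbo.SaturdayNightLiveLog_B
  from dbo.SaturdayNightLiveLog_A
 where 1 = 0

-- create view must be the only statement in its batch, hence the exec() wrapper
-- when it runs inside the conversion procedure
exec ('create view dbo.SaturdayNightLiveLog
       as
       select LogId, LogTime, LogType, LogText
         from dbo.SaturdayNightLiveLog_A')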
The rollback script needs to
figure out which of the twin tables is currently in use and
move it out from underneath the view. Notice how easily this is
accomplished by a quick check in the INFORMATION_SCHEMA.
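For example, something as small as this (placeholder names as in the sketches above) tells the rollback - and the rollover described next - which twin the view is currently reading:

declare @CurrentTable sysname, @OtherTable sysname

-- which base table sits beneath the view right now?
select @CurrentTable = TABLE_NAME
  from INFORMATION_SCHEMA.VIEW_TABLE_USAGE
 where VIEW_NAME = 'SaturdayNightLiveLog'

select @OtherTable = case @CurrentTable
                        when 'SaturdayNightLiveLog_A' then 'SaturdayNightLiveLog_B'
                        else 'SaturdayNightLiveLog_A'
                     end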
The rollover script determines the name
of the current table and of the other - not currently in use -
table, then truncates the other table and alters the view or
views appropriately as an atomic operation. This procedure could
also perform or initiate the aggregation or archival activity
against the newly designated other table. If you are able to take care
of the remaining business with this data in this procedure, there
is little reason to define any of the views save the one the
application will use that carries the original table's name.
Notice that here we do use the sp_executesql command rather
than an exec() function, since this procedure will be used
regularly and will likely be run by an automated task. I chose
not to type the extra few lines in the conversion script
and rollback script because those scripts will likely
be needed only one time and I'll be there to see the Query
Analyzer output - really just a little laziness on my part for
sure. It's also interesting that the dynamic command string runs
in a transaction so that the "holdlock, tablock" hint has the
effect of stopping other writes but allowing reads while the
alter view happens.
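Pulling those pieces together, a rollover procedure for this model might look something like the following sketch; the names are still the placeholders from above, the archival verification is reduced to a comment, and a production version should add the error handling this outline omits:

-- sketch with placeholder names; not the article's actual procedure
create procedure dbo.RolloverSaturdayNightLiveLog
as
declare @CurrentTable sysname, @OtherTable sysname, @Sql nvarchar(4000)

-- which twin is the view reading right now?
select @CurrentTable = TABLE_NAME
  from INFORMATION_SCHEMA.VIEW_TABLE_USAGE
 where VIEW_NAME = 'SaturdayNightLiveLog'

select @OtherTable = case @CurrentTable
                        when 'SaturdayNightLiveLog_A' then 'SaturdayNightLiveLog_B'
                        else 'SaturdayNightLiveLog_A'
                     end

-- the idle twin was archived after the previous rollover (verify that here), so empty it
set @Sql = N'truncate table dbo.' + @OtherTable
exec sp_executesql @Sql

begin transaction

-- hold a shared table lock on the active twin: writers wait, readers continue
set @Sql = N'select top 1 1 from dbo.' + @CurrentTable + N' with (holdlock, tablock)'
exec sp_executesql @Sql

-- repoint the application's view at the freshly emptied twin
set @Sql = N'alter view dbo.SaturdayNightLiveLog
             as
             select LogId, LogTime, LogType, LogText
               from dbo.' + @OtherTable
exec sp_executesql @Sql

commit transaction
go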
That provides all the components
we need to create and manage this partitioning method. You may
wish to use the SQL Agent to initiate the nightly rollover or
you may want the application to be involved. This is going to be
driven by how much impact the rollover blocking has on the
application. Most likely in the Saturday Night Live scenario it
won't make much difference one way or the other because the
blocking period of the rollover should last no more than a
second or two in most cases.
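If you do hand the nightly rollover to SQL Agent, the job amounts to a few calls to the msdb procedures; this sketch assumes the hypothetical rollover procedure name used above and a 2:00 AM quiet period, so adjust both to your environment:

-- sketch only; RolloverSaturdayNightLiveLog is the hypothetical procedure from above
exec msdb.dbo.sp_add_job
     @job_name = 'LocalPartitionDemo nightly rollover'
exec msdb.dbo.sp_add_jobstep
     @job_name = 'LocalPartitionDemo nightly rollover',
     @step_name = 'run rollover',
     @subsystem = 'TSQL',
     @database_name = 'LocalPartitionDemo',
     @command = 'exec dbo.RolloverSaturdayNightLiveLog'
exec msdb.dbo.sp_add_jobschedule
     @job_name = 'LocalPartitionDemo nightly rollover',
     @name = 'nightly',
     @freq_type = 4,                -- daily
     @freq_interval = 1,
     @active_start_time = 20000     -- 02:00:00
exec msdb.dbo.sp_add_jobserver
     @job_name = 'LocalPartitionDemo nightly rollover',
     @server_name = '(local)'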
Finally, there is a unit test
script to verify all the scripts and do a rollover as well as a
rollback. The results of the test script are included at the end
of the Saturday Night Live script.
If you follow the test script
through the output you can see the structure and names of the
views and tables as well as the behavior of the data in this
partitioned design. Once you get a real conversion to the point
of successfully executing a similar test plan in the unit
testing environment, you'll be ready to apply the conversion and
a few rollovers in the integrated testing/development
environment to verify that the conversion and rollover
processing plays well with your application(s).
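A stripped down version of that kind of test pass - with the hypothetical procedure names used in the sketches above standing in for the script's own - looks something like this:

exec dbo.ConvertSaturdayNightLiveLog     -- hypothetical conversion procedure

insert dbo.SaturdayNightLiveLog (LogType, LogText) values ('order', 'row before rollover')

exec dbo.RolloverSaturdayNightLiveLog    -- hypothetical rollover procedure from above

insert dbo.SaturdayNightLiveLog (LogType, LogText) values ('order', 'row after rollover')

-- the view should now return only the post-rollover row
select LogTime, LogType, LogText from dbo.SaturdayNightLiveLog

exec dbo.RollbackSaturdayNightLiveLog    -- hypothetical rollback procedure

-- after rollback a single base table should once again carry the original name
select TABLE_NAME, TABLE_TYPE
  from INFORMATION_SCHEMA.TABLES
 where TABLE_NAME like 'SaturdayNightLiveLog%'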
For those tables where you are
removing a large chunk of rows on a regular basis and
experiencing unbearable contention between the application and
the row removal, this process can become a dear friend.
That brings us to the next level
of partitioning. But first, let's take a nice long break and give
everyone a chance to play with this example.
Hope to see you in a few days and
we'll look over a somewhat more complex approach to locally
partitioned retrofits.
Bill