SQL Darwinism: On SQL
Server Baselines, Metrics Collection and Trend Analysis
By Bill Wunder
Ernst Mayr will be 100 this year. Professor of
Zoology Emeritus at Harvard University, Mayr is sometimes
referred to as the Dean of Evolution and has long been a
recognized expert in evolutionary biological science. Mayr
completed an interesting dissection of Darwin's theories from
The Origin of Species in the 1991 book, One Long
Argument. In this book the Professor masterfully describes
components of Darwin's evolution. With all due respect to
Professor Mayr's scientific knowledge, I'm no biologist so I
can't really speak to the nuances of his scientific divergence
with Darwin - though in my understanding the differences are
significant to the highly trained mind. I do find it
particularly insightful to borrow the good Professor's analysis
of Darwin's basic premise as a way to understand and explain the
change inherent in database applications. With my apologies for
any misrepresentations or obfuscations I have taken the liberty
of doing a little word substitution into Mayr's itemized account
of Darwin's evolutionary theory as it might describe a typical
database application on this, the threshold of Mayr's second
century. I have also added a few software technology keywords to
try to help you conjure up an image of how the aspects of Mayr's
critique of Darwinism offer plausible analytical value in the
world of database applications:
- Evolution as
such. This is the theory that the data application is
not constant or recently created nor perpetually cycling,
but rather is steadily changing, and that data and structure
are transformed in time. keywords: software lifecycle,
iterative development, service packs, patches, bugs, scope
creep
- Common descent.
This is the theory that every set descended from a common
object, and that all groups of objects, including methods,
constants, and even relational tables, ultimately go back to
a single binary origin. keywords: the SQL Server 2000 model
database, data modeling, reusability, OOP, the wheel
(obscure), hierarchical, relational, star schema
- Multiplication
of species. This theory explains the origin of the
enormous data diversity. It postulates that data multiply,
either by splitting into daughter sets or by "budding", that
is, by the establishment of application specific isolated
founder populations that evolve into new sets. keywords:
project management, normalization, refactoring,
backup/restore, CREATE DATABASE, EDI, XML
- Gradualism.
According to this theory, evolutionary change takes place
through the gradual change of set populations and not by the
sudden (saltational) production of new individuals that
represent a new type. keywords: insert, update, delete,
legacy components, backward compatibility, change control,
versioning
- Natural selection.
According to this theory, evolutionary change comes about
through the abundant production of attribute variation in
every generation. The relatively few individuals who
survive, owing to a particularly well-adapted combination of
inheritable characters, give rise to the next generation.
keywords: delete/archive strategy, outsourcing, Internet
bust, dBase, MVS, Microsoft, Oracle, MySQL
I hope you found the ease with which
biological evolution and database technology can be similarly
described palatable and at least a little amusing. At another
level I believe it is just as interesting to explore the
advantages Mayr has at his disposal in critiquing Darwin's
postulations. From the time Charles Darwin sailed around South
America to the time Ernst Mayr spent his scholarly career
exploring the validity of Darwin's pioneering work, a tremendous
amount of data had been accumulated. Observations and essays to
compare and contrast to the original ideas are now plentiful and
represent a huge spectrum of empirical thought. Sure, Darwin had
some guidance and others before him had offered bits and pieces
of information that he could build upon, but Mayr had much more.
He had a clearly defined starting point for the definition of
biological evolution and he had a rather extended trend of
scientific thought upon which to build a clearer, more accurate
description of biological evolution. Simply put, Mayr had a
baseline, a collection of data points from which to clearly
state how well the original theory performed in the moment and
even to predict how the theory might change in the future, and
most importantly he followed a sound methodology to reach sound
conclusions.
Likewise, in order to gauge the performance of
original specifications for a data application - and we all know
how theoretical those original specifications can be - one needs
first to establish baselines. In too many cases the baseline is
left anecdotal, with the result that proactive, predictive
analysis is preempted by crisis and the path forward is left
open to poorly thought out adjustments to the original
theory. Just consider how far down a narrow path Darwinism
progressed before it received the necessary analysis and
adjustments to extract the facts from the fictions - to wit
Herbert Spencer's
Social Darwinism
hogwash. And not until late in the 20th century were adequate
corrections advanced for those cogent errors of the late 19th
century. So it is with database components of an application
that a baseline coupled with continual analysis and adjustment
is key to assuring continual and accurate suitability of the
hardware and software components to the application.
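To make that concrete, here is a minimal sketch of what a
baseline repository might look like. The Admin database and
MetricBaseline table names are hypothetical placeholders, and you
would tailor the columns to the metrics you care about:

-- A hypothetical repository for metric snapshots: one row per
-- counter per sample. All object names here are placeholders.
CREATE TABLE Admin.dbo.MetricBaseline
    (SampleTime   datetime      NOT NULL DEFAULT (GETDATE()),
     ObjectName   nvarchar(128) NOT NULL,
     CounterName  nvarchar(128) NOT NULL,
     InstanceName nvarchar(128) NULL,
     CounterValue bigint        NOT NULL)

-- A clustered index on sample time keeps trend queries cheap.
CREATE CLUSTERED INDEX cl_MetricBaseline_SampleTime
    ON Admin.dbo.MetricBaseline (SampleTime)

With even a simple structure like that populated on a schedule,
"how did this counter look last month?" becomes a query instead
of a guess.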
To put that a different way, there are
really two general approaches to determining when to upgrade
or re-architect a SQL Server. I'm not saying that there are only
two ways to upgrade. Rather the point I want to make - and
expound upon at some length in my next article by the way - is
that what causes you to decide to upgrade or fundamentally alter
an application database server is either a measured response in
a systems evolution or a reaction to a crisis or assumption at a
critical point. A SQL Server critical point occurs when hardware
or design limitations result in unsatisfactory performance or
unacceptable availability. Perhaps total CPU hovers between
80 and 100% independent of any single runaway process, or disk
queuing on your storage system produces general customer
dissatisfaction with application response time, or you run out of
tricks to keep your log device from filling and regularly
interrupting production processing, or you flat run out of disk
space for your data set or processor bandwidth for the
application load. Conversely, the evolution of a database
application is generally gradual and to a large extent
predictable. It is common to see web hits trend up or down, or
the count of new customers increase by n a month. The challenge
- from the database capacity and
performance planning perspective - is how to equate such high
level application growth rates with viable database server load
expectations. When that is done well, a SQL Server critical
point can be avoided while the factual data is leveraged into an
efficient and effective upgrade path rather than a potential for
SQL Darwinism.
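As a rough illustration of how a couple of those symptoms can be
quantified from inside SQL Server itself, the global statistics
and DBCC SQLPERF give quick numbers. Keep in mind this is only a
sketch: @@CPU_BUSY measures the SQL Server process, not the whole
machine, and a since-startup ratio will mask short spikes.

-- Rough CPU picture for the SQL Server process since startup.
-- @@CPU_BUSY and @@IDLE are both in ticks, so the units cancel;
-- this is a cumulative average and will hide short spikes.
SELECT CAST(@@CPU_BUSY * 100.0 / (@@CPU_BUSY + @@IDLE)
            AS decimal(5,2)) AS [sql cpu busy pct]

-- Log size and percent of log space used for every database,
-- handy for watching a log device creep toward full.
DBCC SQLPERF(LOGSPACE)

Snapshot numbers like those into a table on a schedule and the
anecdotal baseline becomes a measured one.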
Allow me to facetiously illustrate where such
SQL Darwinism might take an application. First, to keep the
example consistent with Spencer's imperative that it is our
moral duty to allow the weak to fail undeterred, let's say that
SQL Darwinism holds that an application be slated for a new
platform "when it outgrows SQL Server" (bet you never heard that
one before). Then it would follow that only ANSI SQL be used in
that application in spite of the well-known truth that ANSI SQL
does not assure an easy upgrade path to or from any existing
database product. And it could even be that a follower of this
SQL Darwinism would be resistant to remedial changes when a
growing application demonstrates barely adequate performance on
an earlier and/or aging version of a database product, OS, or
hardware platform - since implicit in this Darwinism is the
expectation that the application will outgrow the platform
regardless. So finally the day comes when a poor indexing scheme
causes the SQL Server to no longer keep up with the application
needs. Fortunately, and predictably in the grandest design of
SQL Darwinism, there is almost always an unscrupulous entity -
let's call ours "N Ron Consultants" - waiting in the wings to
move the struggling application to a new platform for a hefty
fee and, if the Darwinist is at all lucky, to quietly slip in
the necessary indexing corrections as they rewrite the ANSI SQL
for the new platform.
Maybe that little story is a bit far-fetched,
maybe not. What is most definitely not far-fetched is that it's
not that difficult to establish baselines and watch the trends
on a database server as part of a methodical approach to keeping
a successful application successful. In my next article I'm
going to explain how you can make use of the SQL Server 2000
sysperfinfo system table and the global statistics available
within SQL Server to easily and automatically establish
baselines on your servers and to continuously and automatically
track changes from those baselines to help you avoid the
pitfalls of SQL Darwinism - even if your application is
successful.
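As a small preview - and strictly a sketch that assumes the
hypothetical MetricBaseline table above - a scheduled job step
along these lines is all it takes to start accumulating data
points:

-- Snapshot a few sysperfinfo counters into the hypothetical
-- baseline table sketched earlier. Note that "per second"
-- counters are cumulative in sysperfinfo and ratio counters
-- need their companion base counter, so trending means
-- differencing successive samples rather than reading raw values.
INSERT Admin.dbo.MetricBaseline
    (ObjectName, CounterName, InstanceName, CounterValue)
SELECT RTRIM(object_name), RTRIM(counter_name),
       RTRIM(instance_name), cntr_value
FROM master.dbo.sysperfinfo
WHERE RTRIM(counter_name) IN
      ('Page life expectancy',
       'User Connections',
       'Batch Requests/sec')

Run from a SQL Server Agent job every few minutes, that one
statement quietly builds the baseline; the next article fills in
the details.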
Bill