Getting More From the SQL Agent
by Bill Wunder
The SQL Agent provides some great
job automation capabilities. Out of the box with the SQL Agent
you get a nice amount of scheduling flexibility provided the
repeating nature of a schedule is simple and static. Likewise
there are intrinsic interfaces to ActiveX, the command line, and
T-SQL subsystems. For those circumstances where a dependency on
a desktop client application such as Outlook does not compromise
the stability of the SQL Server installation the SQL Agent
provides notification via SQLMail. It is even possible to
centralize job control through the multi-server administration
configuration and tool set. In sum, I find the SQL Agent to be
a solid 80% solution for my needs. Really that says a lot of
good things about the SQL Agent. What I'd like to do in this
article is share some of the tactics I use for the other 20% of
my needs, when I have to fiddle a bit to get the desired
behavior from a SQL Agent scheduled job.
SQLMail-a-way
When something happens at the
Exchange server or when the Outlook client loses its mind I
can't afford to have my job scheduling system - or worse, my SQL
Server - go down. For this reason alone I put a significant
amount of effort into making the xp_smtp_sendmail extended
stored procedure available for free download at
www.SQLDev.net work
asynchronously in all my environments for the SQL Agent and the
SQL Server services. I no longer even install a MAPI client on
my SQL Servers and I'm happy to tell you my server stability has
improved dramatically. I no longer have the occasional and
always inopportune occurrences of a hung SQL Server or SQL Agent
due to MAPI problems or the phenomenon of an important job being
stuck forever in a "performing completion tasks" state or the
inconvenience of needing to schedule a SQL Server outage while
an upgrade, service pack, or patch is applied to the Outlook
client and the machine is rebooted. If you'd like to see what
I've done to completely eliminate MAPI from my SQL Servers be
sure to check out my previous series of articles on the topic:
Take My SQLMail,
Please! - for a philosophical overview
From xp_sendmail to
safe_sendmail - to establish an asynchronous email delivery
environment
From xp_sendmail to
xp_smtp_sendmail - to make the switch from MAPI to the SMTP
extended stored procedure within SQL Server
From SQLMail to SMTP in
the SQLAgent - to make the switch from MAPI to the SMTP
extended stored procedure within SQL Agent
You'll find all the code I use to
make this work for me in the
sswug.org
Script
Library. Most of it is in the scripts
Asynchronous SMTP alternative to SQLMail,
Run a job from all existing SQL Agent Alerts, and
meaningful Agent job notification,
but still check out the articles for the full scoop and plenty of
additional code. To be sure, using an alternative email delivery
system requires an additional job step for each scheduled job
for either failure notification or completion notification,
however, the configuration I propose in the articles is very
standard, so in the end it's about the same as going to the
Notifications tab to configure the built-in MAPI notifications.
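To make the idea concrete, here is a minimal sketch of what such a standardized final job step might look like. It assumes xp_smtp_sendmail has been installed in master per the articles above; the relay name, addresses, and subject text are illustrative placeholders, not the configuration from my scripts:

```sql
-- Hypothetical final job step: notify on failure via SMTP instead of MAPI.
-- 'smtp.mydomain.com' and the addresses below are assumptions for this sketch.
declare @rc int
exec @rc = master.dbo.xp_smtp_sendmail
    @FROM    = N'sqlagent@mydomain.com',
    @TO      = N'dba@mydomain.com',
    @server  = N'smtp.mydomain.com',
    @subject = N'SQL Agent job failure',
    @message = N'A scheduled job reported failure. Check the job history for details.'
-- the extended procedure returns 0 on success; raise an error so the
-- Agent's flow control can treat a failed send as a failed step
if @rc <> 0
    raiserror('xp_smtp_sendmail returned error code %d', 16, 1, @rc)
```

Because the step is identical in every job, it can be pasted in (or generated) without per-job customization.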
Clear Cutting or Selective
Logging?
Whether you use MAPI or SMTP to
deliver SQL Agent failure messages it can be useful to archive
error messages someplace other than msdb.dbo.sysjobhistory.
With the built-in job history processing it's pretty easy to
lose sight of why a job failed unless you raise the total
number of msdb.dbo.sysjobhistory rows to keep and the
number of msdb.dbo.sysjobhistory rows per job to keep to
very high values (configurable from the SQL Agent properties
dialog in Enterprise Manager). That means to keep track of a
failure on Friday night until you get in on Monday morning for a
job that runs once a minute on a SQL Agent subsystem with a few
hundred jobs where many other jobs run frequently you've got to
save thousands and thousands of "The job completed
successfully" msdb.dbo.sysjobhistory rows just to be able
to go back to that one failure message - often only to find that
the actual error was truncated from the message stream because
of the 255 character size of the column in
msdb.dbo.sysjobhistory. Even if you don't want to go all the
way and implement SMTP for your agent jobs you can still take
advantage of this alternative that saves only failure messages
to an archive. Refer to the script
admin db - meaningful Agent job notification
for some ideas on how to get started. Basically if
you change the reference to the safe_sendmail procedure to be a
reference to xp_sendmail this procedure will work for you in a
MAPI mail delivery environment.
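The selective-logging idea reduces to copying only the failure rows out of msdb before the history sweep discards them. Here is a minimal sketch under assumptions of my own: the admin database, the archive table name, and its shape are all illustrative, not the schema from the script above:

```sql
-- Hypothetical failure-only archive with a message column wide enough
-- to dodge the truncation suffered in sysjobhistory.
if object_id('admin.dbo.JobFailureArchive') is null
    create table admin.dbo.JobFailureArchive
        (job_name sysname not null,
         step_id  int not null,
         run_date int not null,
         run_time int not null,
         message  nvarchar(4000) null,
         archived datetime not null default getdate())

-- copy failure rows (run_status = 0) not already archived
insert admin.dbo.JobFailureArchive (job_name, step_id, run_date, run_time, message)
select j.name, h.step_id, h.run_date, h.run_time, h.message
from msdb.dbo.sysjobhistory h
join msdb.dbo.sysjobs j on j.job_id = h.job_id
where h.run_status = 0
  and not exists (select 1 from admin.dbo.JobFailureArchive a
                  where a.job_name = j.name
                    and a.step_id  = h.step_id
                    and a.run_date = h.run_date
                    and a.run_time = h.run_time)
```

Scheduled every few minutes, a step like this lets you keep the sysjobhistory retention values small while the Friday-night failure survives until Monday.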
Expanding this notion to another
level, you can also easily implement an escalating notification
paradigm. Suppose for the first hour that a job is failing you
want to be notified just so you can keep track of things. If the
condition that is generating the notifications persists beyond
an hour you want to notify the 24x7 operations staff and make
sure some corrective action is initiated. To achieve this
effect, it's pretty straightforward to modify the
admin db - meaningful Agent job notification
script to behave like the admin.dbo.NotifySLSLogShippingFailure
stored procedure included in the
admin db - Heavy Duty LiteSpeed Log Shipping
script.
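The escalation logic itself is little more than remembering when a job first started failing and widening the notification list once the failures persist past the threshold. A minimal sketch, with the table, database, job name, and addresses all stated as assumptions:

```sql
-- Hypothetical escalation state: one row per currently failing job.
if object_id('admin.dbo.FailureEscalation') is null
    create table admin.dbo.FailureEscalation
        (job_name sysname primary key, first_failure datetime not null)

declare @job_name sysname, @to nvarchar(255)
set @job_name = N'nightly log backup'   -- illustrative failing job

-- record the start of the current run of failures
if not exists (select 1 from admin.dbo.FailureEscalation
               where job_name = @job_name)
    insert admin.dbo.FailureEscalation values (@job_name, getdate())

if (select datediff(minute, first_failure, getdate())
    from admin.dbo.FailureEscalation
    where job_name = @job_name) <= 60
    set @to = N'dba@mydomain.com'       -- first hour: just keep the DBA posted
else
    set @to = N'ops24x7@mydomain.com'   -- past an hour: wake the 24x7 staff

-- send @to the notification with whatever delivery procedure is in use
-- (safe_sendmail, xp_sendmail, or xp_smtp_sendmail), and delete the
-- FailureEscalation row again on the job's next successful run
```

The delete-on-success step is what resets the clock so the next incident starts the escalation over from the quiet first hour.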
The State of the Onion
There are many layers of concern
in so many of the processes that may need to be automated via
the SQL Agent. One good example in my shop is the need to
carefully watch log growth during index maintenance on databases
running in the "Full" recovery model. There is nothing in the
Agent that is going to let me monitor the used size of a log
device and make job control decisions based on that value. As
you know DBCC INDEXDEFRAG can consume lots of log disk space in
a relatively short period of time. If I don't do something to
manage this I can easily fill my log device and bring the server
to a grinding halt. What I've done to monitor and manage the
growth issue is to spawn a job that does the DBCC INDEXDEFRAG
from the main indexing job and then put the original job in a
busy wait that repeats a loop to check the log space used
counter in master.dbo.sysperfinfo regularly and often for
a threshold size I've predefined. When the log usage hits the
threshold the controlling job stops the dynamically generated
defrag job and resumes the busy wait watching for the log to
fall below the usage threshold that will occur after the next
regularly scheduled log backup. Once the index defrag operation
has completed the spawned job is removed from the SQL Agent and
the control job can go on about its index maintenance business.
You can review the details of how I accomplish this in the
script
admin db - Index (fragmentation) Maintenance for
Log Shipping.
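The busy-wait portion of that control job can be sketched in a few lines. The spawned job name, the database name, and the 70 percent threshold below are illustrative assumptions; the real script also handles resuming the defrag after the log backup, which is omitted here:

```sql
-- Hypothetical busy wait: poll the log-space counter while the spawned
-- defrag job runs, stopping it when the predefined threshold is hit.
declare @pct_used int
while 1 = 1
begin
    waitfor delay '00:00:10'   -- check regularly and often
    select @pct_used = cntr_value
    from master.dbo.sysperfinfo
    where object_name like '%:Databases%'    -- default or named instance
      and counter_name = 'Percent Log Used'
      and instance_name = 'MyBigDatabase'
    if @pct_used >= 70
    begin
        exec msdb.dbo.sp_stop_job @job_name = 'spawned defrag job'
        break
    end
    -- the control job deletes the spawned job when the defrag completes,
    -- so its disappearance is the signal to leave the wait loop
    if not exists (select 1 from msdb.dbo.sysjobs
                   where name = 'spawned defrag job')
        break
end
```

Polling sysperfinfo keeps the check cheap enough to run every few seconds without adding meaningful load of its own.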
Stepping on Yourself
Sometimes it's useful to start a
job from another process. Most often I see this capability used
when there is a relatively long running process that can be
completed asynchronously from a controlling process when
something first must happen - or more usually must complete in
the controlling process before the other task can begin. An
example might be a restore on the query box that cannot start
until the backup is done on the production box. All the
production backup really needs to do is tell the restore job on
the query server to start running. The rub comes if for any of
dozens of reasons the restore is already active on the query
machine. Suppose for example that the restore needs to take care
of that index maintenance we were just speaking of and for some
reason the log backup job has been stopped. If in this case we
try to start the job on the query box from the production backup
process we are likely to get a message that the production
backup failed. If that happens at 2AM the 24x7 operations staff
is probably going to get somebody out of bed to see why the
backup failed. If it's someone else they call then no problem
(excuse my wicked laugh), but if it's going to be you wouldn't
you rather just check to see if the restore is already running
before you try to start it and do your best to eke out the full
5 available hours of sleep tonight?
I have a couple of different ways
to tell if a job is in a state that is safe to start. One way
uses a
table
function to get the SQL Agent job status
using the OLE automation procedures. The other is a hack from
sp_help_job that uses the undocumented
xp_sqlagent_enum_jobs to
find out if an SQL Agent job can be started now.
It would be pretty easy to interchange the parts of the latter
to change the function to use xp_sqlagent_enum_jobs or
vice versa.
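The xp_sqlagent_enum_jobs flavor of the check looks roughly like the sketch below. The column layout of the result set matches what sp_help_job expects on SQL Server 2000, but the procedure is undocumented, so treat the table definition as an assumption and verify it against sp_help_job on your build; the job name is a placeholder:

```sql
-- Hypothetical safe-to-start check before sp_start_job.
create table #enum_jobs
    (job_id uniqueidentifier not null,
     last_run_date int, last_run_time int,
     next_run_date int, next_run_time int, next_run_schedule_id int,
     requested_to_run int, request_source int, request_source_id sysname null,
     running int,               -- 1 = job is currently executing
     current_step int, current_retry_attempt int, job_state int)

insert #enum_jobs
    exec master.dbo.xp_sqlagent_enum_jobs 1, 'sa'   -- 1 = caller is sysadmin

if exists (select 1
           from #enum_jobs e
           join msdb.dbo.sysjobs j on j.job_id = e.job_id
           where j.name = 'query box restore' and e.running = 1)
    print 'restore already running - skipping sp_start_job'
else
    exec msdb.dbo.sp_start_job @job_name = 'query box restore'

drop table #enum_jobs
```

Wrapped in a stored procedure, the same check serves any controlling process that needs to start a job without risking the "already running" failure at 2AM.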
The Morning After
Consider the rather peculiar
circumstance I find myself in once a month. I reboot my SQL
Servers once a month to recover leaked memory, rather than
waiting until the server chokes to force the reboot.
It's just that much easier to get management and the SLAs
(Service Level Agreements) to go along with a scheduled reboot
than it is a Heimlich maneuver every now and again in the middle
of the production day. We have evolved this scheduled reboot to
happen on the first Saturday of the month at midnight. This time
the rub comes with the reality that the midnight reboot on the
first Saturday of the month can affect other jobs that run in
the minutes and hours after midnight on Sunday morning.
Depending on where the first Saturday of the month falls, the
next day can be the first or second Sunday of the month. That
means I want those jobs that run after midnight to run when
yesterday was the first Saturday and not run when yesterday was
not the first Saturday of the month. Maybe I'm missing
something, but I don't see that option in the SQL Agent's job
scheduler.
Actually our maintenance window
runs from midnight to 4AM so I added a job step to the beginning
of the jobs that I don't want to execute during the maintenance
window to help me make the right choice:
declare @dt datetime
set @dt = getdate()
-- job ran at regularly scheduled time today so don't run again
if not (datepart(weekday, @dt - 1) = 7    -- yesterday was Saturday
        and datepart(day, @dt - 1) < 8)   -- yesterday was in first week of month
   and datepart(hour, @dt) = 4            -- now 4AM
    raiserror('Request for delayed (4AM) Sunday start when not Sunday morning after monthly window aborted.', 16, 1)
-- job has to wait for the maintenance window to expire before it can run today
if datepart(weekday, @dt - 1) = 7         -- yesterday was Saturday
   and datepart(day, @dt - 1) < 8         -- yesterday was in first week of month
   and datepart(hour, @dt) < 4            -- not yet 4AM
    raiserror('Request to run during monthly maintenance window aborted.', 16, 1)
Using the raiserror allows me
to bail on the process using the SQL Agent flow control
functionality. I can alternately send a failure notification
email or just not run the job if a raiserror is triggered
from this code.
Spy vs. Spy
There must be a ton of other
scheduling gymnastics that people have dealt with when the SQL
Agent scheduler won't quite do it for you. Certainly I have a
few others that I haven't included mostly because a particular
requirement is so obscure or because a particular workaround is
less than elegant. Hopefully what I've conveyed in this article
is not the gamut of possibilities but rather the diversity of
possibilities. If you found something you can use in your
environment that's cool. If I helped you stretch your
imagination enough to see a possibility that will work for you
but is not discussed in this article or available in the off the
shelf SQL Agent even better!
Bill