Heavy Duty LiteSpeed Log Shipping
Part 5
Application Failover Strategies
By Bill Wunder
In most cases, the reward for a nicely running log shipping configuration is a "warm spare": a SQL Server that is ready to take over for the primary node at a moment's notice. Too bad this isn't as easy to do as it is to say. Log shipping moves the database from one SQL Server location to another, but it is just as important that the server level configuration of the standby SQL Server "mirrors" the primary node, and it is crucial that the application is well considered long before the moment of truth arrives.
Before the Failover
At the SQL Server configuration level you'll want to be certain that the hardware of the standby can support the application load, that the software versions and patch levels are maintained in lock step with the primary database server, and that the logins, linked servers, and scheduled jobs on the standby represent the most current state of the primary instance at all times. And of course you'll want to regularly test the failover process as well as proper operation of the standby server in primary mode if you want everything to actually work when you need it.
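A quick way to confirm that versions and patch levels really are in lock step is to compare the servers directly. Here's a minimal sketch you could run on both the primary and the standby and compare the results (SERVERPROPERTY is available in SQL Server 2000 and later):

    select SERVERPROPERTY('ProductVersion') as product_version,
           SERVERPROPERTY('ProductLevel')   as service_pack,
           SERVERPROPERTY('Edition')        as edition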
I recommend that the starting point for hardware be an exact copy of the primary node. Before you settle for anything other than a clone in this situation, make sure you understand and can live with any differences.
SQL Logins you can copy with a DTS wizard, but beware: the SID of a SQL Login will not properly map on the standby when DTS is used to transfer SQL Logins. If you use SQL Authentication I recommend instead that you use queries that build the sp_addlogin statements, capturing the SIDs from the primary node as you move the logins. Another possibility would be to use a SQLDMO script generator that captures the SID, such as my DDL Archive Utility, to generate the scripts necessary to transfer the logins in a way that they will be usable in the event of a failover.
SQL Jobs you can also copy with a DTS wizard. Make sure you have the Agent shut off on the standby server if you use this method, else you could have a little unwanted system activity depending on what your SQL Jobs do for you. Naturally, I would suggest my Archive Utility as another alternative. My preferred method is a custom ActiveX DTS package that uses SQLDMO and that I run from the Agent on the primary server. The main advantages of this package are that you can disable the jobs when you copy them and easily customize the copy rules to suit your needs. I copy jobs nightly, but only if the job on the standby is not active. This allows me to run such things as backup jobs and maintenance jobs that are different on the two servers yet remain certain that all application jobs are going to be there on the standby when I need them. You can check out the ActiveX script for this job.
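If you do go the DTS wizard route for jobs, one simple safeguard is to generate disable statements on the standby for everything that arrived enabled. A minimal sketch; run the statements it produces against the standby's msdb:

    select 'exec msdb.dbo.sp_update_job @job_name = N''' + name + ''', @enabled = 0'
      from msdb.dbo.sysjobs
     where enabled = 1
    -- job names that contain a single quote would need to be escaped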
Linked Servers are a whole different problem, and here you don't have as many choices. There is no DTS wizard that will copy linked servers, and DMO does not have a script method for linked servers. You can extract most of the data you need to recreate the primary server's linked server configuration from the system tables. My preferred method is again Bill Wunder's DDL Archive Utility, which generates a script that exactly reproduces the primary server's linked servers on the standby.
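If you do work from the system tables, a generator query something like this one (a sketch only, using the SQL Server 2000 sysservers columns) will get you most of the way. Keep in mind that the passwords for any mapped remote logins cannot be extracted, so those mappings must be re-entered on the standby by hand:

    select 'exec sp_addlinkedserver @server = N''' + srvname + ''''
         + ', @srvproduct = N''' + isnull(srvproduct, '') + ''''
         + ', @provider = N''' + isnull(providername, '') + ''''
         + ', @datasrc = N''' + isnull(datasource, '') + ''''
      from master.dbo.sysservers
     where srvid > 0             -- skip the local server entry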
At the Failover
There are two basic choices for how to fail over from the source SQL Server to the destination SQL Server when the time comes. You can take the source SQL Server off line and rename the destination SQL Server to assume the source SQL Server name, or you can re-point all applications and services to the destination server. In my opinion re-pointing is by far the better choice. However, if you don't have access to all applications and services that use the data, renaming the SQL Server may be your only option. I've had some good luck with automating the steps necessary to re-point the applications and services. I've never even tried to automate a server rename. I suppose it could be done, but I wouldn't expect it to be reliable because the times I've manually gone through the process there was always a new gotcha. Changes are necessary in the domain, in DNS, and in SQL Server. For those that haven't yet made the switch to SMTP, Outlook doesn't like it when you rename a server. And don't think you'll be able to rename from a default instance to a named instance.
On the other hand, an application re-point provides much more flexibility. Depending on the complexity of a re-point procedure, you may choose to use a manual procedure or you may prefer to automate the steps. It seems to me that when failover is fairly straightforward, automating the steps should be simple, so why would you want to rely on a manual process at 3AM? And as the procedure gets more complex, the need to automate becomes more compelling. In our shop we use an automated process that is driven from an intranet page. Even the 24x7 staff can manage the failover. There is no need for a DBA and a domain admin to be on site like there is for a SQL Server rename operation.
To re-point, you'll need to update any registry entries, application ini files, ODBC DSNs, OLEDB UDLs, hard coded connection strings, database tables, linked servers, and other references that tell an application or service the server name where a database is located, changing them from the source SQL Server name to the destination SQL Server name. Depending on how you have written your applications and services, you may also need to restart them after the references have been changed. Consider enhancing your applications and services so that the SQL Server name can be changed without a restart. This will speed and simplify failover tremendously. Obviously, the more numerous and varied the types of references used to make connections to the SQL Server, the greater the challenge.
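Most of those references are application specific, but one piece you can handle generically is any linked server on a surviving SQL Server that still points at the failed primary. sp_setnetname lets you re-point the existing definition without dropping and recreating it; the names below are placeholders:

    -- run on each surviving server that has a linked server aimed at the old primary
    exec sp_setnetname @server = N'LinkedServerName', @netname = N'StandbyServerName'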
I'll leave you with my check list for a manual standby server rename. If this doesn't convince you that the application re-point is a better way, nothing will. You'll notice I had to create several special SQL Agent Jobs to "simplify" the process. I won't provide the details of those jobs, but the names should give you a good indication of what they do. I should also admit that this procedure is written for native log shipping rather than SQL LiteSpeed log shipping, so a few things are changed. (On the off chance you are curious, just drop me an email and I'll tell you more about any of the special jobs or changes you may want to consider to use this check list with HeavyDuty SQL LiteSpeed Log Shipping.)
Failover Procedure to rename a log shipping destination (standby) SQL Server machine to assume the identity of the failed source (primary) SQL Server machine.
(1) If Possible, script all Alerts, Operators, and Jobs on the source SQL Server.
(2) If Possible, backup all databases on the source SQL Server.
(3) If (2) was done, restore all user databases to the destination SQL Server except the admin database. Do not restore the system databases: master, model, msdb, and tempdb.
(4) Restore the last log backup from the source SQL Server that will be moved to the standby SQL Server into each log shipped database on the destination SQL Server:
Manually run the "Transaction Log Backup Job for DB Maintenance Plan 'Transaction Log Backup'" on the source SQL Server.
Disable the scheduled job "Transaction Log Backup Job for DB Maintenance Plan 'Transaction Log Backup'" on the source SQL Server.
Manually run the "Copy Job For Log Shipping" on the destination SQL Server.
Verify that all files in the log shipping staging folder G:\tlog_backups\ on the source SQL Server are also in G:\tlog_backups on the destination SQL Server. (Skip this step if you are using HeavyDuty SQL LiteSpeed Log Shipping. It does not create the redundant backup file like native dbmaint log shipping does.)
--- IF the source SQL Server FAILS AND CANNOT BE ACCESSED BEGIN HERE ---
(4a)
Disable the job "Copy Job For Log Shipping" on the destination SQL Server.
Manually execute the "Load Job For Log Shipping" on the destination SQL Server.
Disable the job "Load Job For Log Shipping" on the destination SQL Server.
(5) If Possible, Shut down the
failed source SQL Server. Disconnect this server from the
network by unplugging the network interface cables or by
disabling all network interface cards (and restart the server to
prevent the hard drives from seizing).
(6) Rename the destination SQL Server to the source SQL Server and reboot this server.
- The person doing this rename must be a domain admin since this procedure will most likely require adding the computer account to the domain.
- It is easiest to shut down the source SQL Server and use its IP address on the destination SQL Server when failing over. This way, DNS does not have to be updated. The IP address for the source SQL Server can be quickly determined by pinging it prior to the rename.
- The steps for renaming the destination SQL Server to the source SQL Server:
- Log on to the destination SQL Server using a domain admin account
- Set the MSSQLSERVER service to start manually
- Shut down and power off the source SQL Server --- should have been done at step (5)
- Open up Server Manager (you must be logged into a machine using a domain admin account), confirm you are in the correct domain
- Find the domain account for the source SQL Server and delete it
- Highlight the computer running as Primary Domain Controller
- Synchronize the entire domain (you're now done with Server Manager)
- Change the IP address of the destination SQL Server to whatever the source had been
- Remove the standby SQL Server from the domain (specify WORKGROUP for the workgroup name)
- Rename the destination SQL Server to the source SQL Server
- Reboot
- Add the new source SQL Server to the domain
- Reboot
- Set the MSSQLSERVER service to start automatically
- Start the MSSQLSERVER service
- Start the SQLSERVERAGENT service
- If you still use Outlook, from the RUN command prompt run "OUTLOOK.EXE /checkclient"
- Reboot
(7) When the server comes up as the source SQL Server, verify that SQL Server is running (green light in the tool tray icon) and execute the following commands in Query Analyzer while logged in under an account with SQL Server System Administrator level permissions (sa, any Domain Admin account, or any other account that is in the local Administrators group on the machine or in the System Administrators role on the SQL Server):
exec sp_dropserver 'the destination SQL Server'
exec sp_addserver 'the source SQL Server', 'local'
(8) Stop and start the SQL Server service using Enterprise Manager (rather than from the Services control console if possible). Two services will stop when the SQL Server service (MSSQLSERVER) is stopped. The SQL Server Agent service (SQLSERVERAGENT) must be manually restarted after the SQL Server service is up and running. Verify that "select @@servername" executed in Query Analyzer returns the source SQL Server name.
(9) If you were not able to recover the destination SQL Server databases when the last log was shipped, then after the SQL Server has successfully restarted complete the following action using Query Analyzer while logged in under an account with SQL Server System Administrator level permissions. In most cases this can be sa or any domain account that is in the local Administrators group on the machine or in the System Administrators role on the SQL Server; it really depends on how you've configured the security on the SQL Server box. For each log shipped database:

RESTORE DATABASE [databasename] WITH RECOVERY
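If you are not sure which databases still need that final recovery, a query along these lines should list any that remain in standby or restoring state (DATABASEPROPERTYEX is available in SQL Server 2000 and later):

    select name
      from master.dbo.sysdatabases
     where databasepropertyex(name, 'IsInStandBy') = 1
        or databasepropertyex(name, 'Status') = 'RESTORING'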
(10) If there are any other special
procedures necessary when the source SQL Server is restarted,
complete those steps now. Examples might be restarting machines,
applications and/or services that wig out when you take the SQL
Server out from under them.
(11) If you were not able to copy
the SQL Agent Jobs as enabled, run the job: "w enable jobs that
were active in production"
(12) If (1) was done, stop the SQL Server Agent using Enterprise Manager, execute the scripts generated in (1), and restart the SQL Server Agent from Enterprise Manager.
That's it. Trust me, it's a blast. Especially at DBA prime time (i.e. 3AM).
If you haven't already, you can download the Heavy Duty LiteSpeed Log Shipping stored procedures and save them all to a location that you have permission to access from the network, using the file names indicated in each procedure's download script. Then follow the steps outlined in Heavy Duty LiteSpeed Log Shipping Part 2 for deploying the log shipping job on a test server.
Discover the usage and processing actions of a deployed log shipping installation in the article Heavy Duty LiteSpeed Log Shipping Part 3. Then, if you have surging log growth conditions, you can use the information presented in Heavy Duty Log Shipping Part 4 to help you manage log growth.
So there you have it, the 5 week
abbreviated discussion of Heavy Duty SQL LiteSpeed Log Shipping.
Hope you found it all interesting and useful. Perhaps you’re
ready to put it to the real test. If you do, let me know how it
goes.
Bill