Heavy Duty LiteSpeed Log Shipping
Part 5
Application Failover Strategies
By Bill Wunder
In most cases, the reward for a nicely running log shipping configuration is a "warm spare": a SQL Server that is ready to take over for the primary node at a moment's notice. Too bad this isn't as easy to do as it is to say. Log shipping moves the database from one SQL Server location to another, but it is just as important that the server level configuration of the standby SQL Server "mirrors" the primary node, and it is crucial that the application is well considered long before the moment of truth arrives.
Before the Failover
At the SQL Server configuration level you'll want to be certain that the hardware of the standby can support the application load, that the software versions and patch levels are maintained in lock step with the primary database server, and that the logins, linked servers, and scheduled jobs on the standby represent the most current state of the primary instance at all times. And of course you'll want to regularly test the failover process as well as proper operation of the standby server in primary mode if you want everything to actually work when you need it.
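A quick way to confirm that versions and patch levels really are in lock step is to compare the servers directly. Here's a minimal sketch you could run on both the primary and the standby and compare the results (SERVERPROPERTY is available in SQL Server 2000 and later):

    select SERVERPROPERTY('ProductVersion') as product_version,
           SERVERPROPERTY('ProductLevel')   as service_pack,
           SERVERPROPERTY('Edition')        as edition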
I recommend that the starting point for hardware be an exact copy of the primary node. Before you settle for anything other than a clone in this situation, make sure you understand and can live with any differences.
SQL Logins you can copy with a DTS wizard, but beware: the SID of a SQL Login will not properly map on the standby when DTS is used to transfer SQL Logins. If you use SQL Authentication I recommend instead that you use queries that build the sp_addlogin statements, capturing the SIDs from the primary node as you move the logins. Another possibility would be to use a SQLDMO script generator that captures the SID, such as my DDL Archive Utility, to generate the scripts necessary to transfer the logins in a way that they will be usable in the event of a failover.
SQL Jobs you can also copy with a DTS wizard. Make sure you have the Agent shut off on the standby server if you use this method, else you could have a little unwanted system activity depending on what your SQL Jobs do for you. Naturally, I would suggest my Archive Utility as another alternative. My preferred method is a custom ActiveX DTS package that uses SQLDMO and that I run from the Agent on the primary server. The main advantages of this package are that you can disable the jobs when you copy them and easily customize the copy rules to suit your needs. I copy jobs nightly, but only if the job on the standby is not active. This allows me to run such things as backup jobs and maintenance jobs that are different on the two servers yet remain certain that all application jobs are going to be there on the standby when I need them. You can check out the ActiveX script for this job.
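If you do go the DTS wizard route for jobs, one simple safeguard is to generate disable statements on the standby for everything that arrived enabled. A minimal sketch; run the statements it produces against the standby's msdb:

    select 'exec msdb.dbo.sp_update_job @job_name = N''' + name + ''', @enabled = 0'
      from msdb.dbo.sysjobs
     where enabled = 1
    -- job names that contain a single quote would need to be escaped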
Linked Servers are a whole different problem, and here you don't have as many choices. There is no DTS wizard that will copy linked servers, and DMO does not have a script method for linked servers. You can extract most of the data you need to recreate the primary server's linked server configuration from the system tables. My preferred method is again Bill Wunder's DDL Archive Utility, which generates a script that exactly reproduces the primary server's linked servers on the standby.
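If you do work from the system tables, a generator query something like this one (a sketch only, using the SQL Server 2000 sysservers columns) will get you most of the way. Keep in mind that the passwords for any mapped remote logins cannot be extracted, so those mappings must be re-entered on the standby by hand:

    select 'exec sp_addlinkedserver @server = N''' + srvname + ''''
         + ', @srvproduct = N''' + isnull(srvproduct, '') + ''''
         + ', @provider = N''' + isnull(providername, '') + ''''
         + ', @datasrc = N''' + isnull(datasource, '') + ''''
      from master.dbo.sysservers
     where srvid > 0             -- skip the local server entry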
At the Failover
There are two basic choices for how to fail over from the source SQL Server to the destination SQL Server when the time comes. You can take the source SQL Server off line and rename the destination SQL Server to assume the source SQL Server name, or you can re-point all applications and services to the destination server. In my opinion re-pointing is by far the better choice. However, if you don't have access to all applications and services that use the data, renaming the SQL Server may be your only option. I've had some good luck with automating the steps necessary to re-point the applications and services. I've never even tried to automate a server rename. I suppose it could be done, but I wouldn't expect it to be reliable because the times I've manually gone through the process there was always a new gotcha. Changes are necessary in the domain, in DNS, and in SQL Server. For those that haven't yet made the switch to SMTP, Outlook doesn't like it when you rename a server. And don't think you'll be able to rename from a default instance to a named instance.
On the other hand, an application re-point provides much more flexibility. Depending on the complexity of a re-point procedure, you may choose to use a manual procedure or you may prefer to automate the steps. It seems to me that when failover is fairly straightforward, automating the steps should be simple, so why would you want to rely on a manual process at 3AM? And as the procedure gets more complex, the need to automate becomes more compelling. In our shop we use an automated process that is driven from an intranet page. Even the 24x7 staff can manage the failover. There is no need for a DBA and a domain admin to be on site like there is for a SQL Server rename operation.
To re-point, you'll need to update any registry entries, application ini files, ODBC DSNs, OLEDB UDLs, hard coded connection strings, database tables, linked servers, and other references that tell an application or service the server name where a database is located, changing them from the source SQL Server name to the destination SQL Server name. Depending on how you have written your applications and services, you may also need to restart them after the references have been changed. Consider enhancing your applications and services so that the SQL Server name can be changed without a restart. This will speed and simplify failover tremendously. Obviously, the more numerous and varied the types of references used to make connections to the SQL Server, the greater the challenge.
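Most of those references are application specific, but one piece you can handle generically is any linked server on a surviving SQL Server that still points at the failed primary. sp_setnetname lets you re-point the existing definition without dropping and recreating it; the names below are placeholders:

    -- run on each surviving server that has a linked server aimed at the old primary
    exec sp_setnetname @server = N'LinkedServerName', @netname = N'StandbyServerName'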
I'll leave you with my check list for a manual standby server rename. If this doesn't convince you that the application re-point is a better way, nothing will. You'll notice I had to create several special SQL Agent Jobs to "simplify" the process. I won't provide the details of those jobs, but the names should give you a good indication of what they do. I should also admit that this procedure is written for native log shipping rather than SQL LiteSpeed log shipping, so a few things are changed. (On the off chance you are curious, just drop me an email and I'll tell you more about any of the special jobs or changes you may want to consider to use this check list with HeavyDuty SQL LiteSpeed Log Shipping.)
Failover Procedure to rename a log shipping destination (standby) SQL Server machine to assume the identity of the failed source (primary) SQL Server machine.
(1) If Possible, script all Alerts, Operators, and Jobs on the source SQL Server.
(2) If Possible, backup all databases on the source SQL Server.
(3) If (2) was done, restore all user databases to the destination SQL Server except the admin database. Do not restore the system databases: master, model, msdb, and tempdb.
(4) Restore the last log backup from the source SQL Server that will be moved to the standby SQL Server into each log shipped database on the destination SQL Server:
Manually run the "Transaction Log Backup Job for DB Maintenance Plan 'Transaction Log Backup'" on the source SQL Server.
Disable the scheduled job "Transaction Log Backup Job for DB Maintenance Plan 'Transaction Log Backup'" on the source SQL Server.
Manually run the "Copy Job For Log Shipping" on the destination SQL Server.
Verify that all files in the log shipping staging folder G:\tlog_backups\ on the source SQL Server are also in G:\tlog_backups on the destination SQL Server. (Skip this step if you are using HeavyDuty SQL LiteSpeed Log Shipping. It does not create the redundant backup file like native dbmaint log shipping does.)
--- IF the source SQL Server FAILS AND CANNOT BE ACCESSED BEGIN HERE ---
(4a)
Disable the job "Copy Job For Log Shipping" on the destination SQL Server.
Manually execute the "Load Job For Log Shipping" on the destination SQL Server.
Disable the job "Load Job For Log Shipping" on the destination SQL Server.
(5) If Possible, Shut down the
failed source SQL Server. Disconnect this server from the
network by unplugging the network interface cables or by
disabling all network interface cards (and restart the server to
prevent the hard drives from seizing).
(6) Rename the destination SQL Server to the source SQL Server and reboot this server.
- The person doing this rename must be a domain admin since this procedure will most likely require adding the computer account to the domain.
- It is easiest to shut down the source SQL Server and use its IP address on the destination SQL Server when failing over. This way, DNS does not have to be updated. The IP address for the source SQL Server can be quickly determined by pinging it prior to the rename.
- The steps for renaming the destination SQL Server to the source SQL Server:
- Log on to the destination SQL Server using a domain admin account
- Set the MSSQLSERVER service to start manually
- Shut down and power off the source SQL Server --- should have been done at step (5)
- Open up Server Manager (you must be logged into a machine using a domain admin account), confirm you are in the correct domain
- Find the domain account for the source SQL Server and delete it
- Highlight the computer running as Primary Domain Controller
- Synchronize the entire domain (you're now done with Server Manager)
- Change the IP address of the destination SQL Server to whatever the source had been
- Remove the standby SQL Server from the domain (specify WORKGROUP for the workgroup name)
- Rename the destination SQL Server to the source SQL Server
- Reboot
- Add the new source SQL Server to the domain
- Reboot
- Set the MSSQLSERVER service to start automatically
- Start the MSSQLSERVER service
- Start the SQLSERVERAGENT service
- If you still use Outlook, from the RUN command prompt run "OUTLOOK.EXE /checkclient"
- Reboot
(7) When the server comes up as the source SQL Server, verify that SQL Server is running (green light in the tool tray icon) and execute the following commands in Query Analyzer while logged in under an account with SQL Server System Administrator level permissions (sa, any Domain Admin account, or any other account that is in the local Administrators group on the machine or in the System Administrators role on the SQL Server):
exec sp_dropserver 'the destination SQL Server'
exec sp_addserver 'the source SQL Server', 'local'
(8) Stop and start the SQL Server service using Enterprise Manager (rather than from the Services control console if possible). Two services will stop when the SQL Server service (MSSQLSERVER) is stopped. The SQL Server Agent service (SQLSERVERAGENT) must be manually restarted after the SQL Server service is up and running. Verify that "select @@servername" executed in Query Analyzer returns the source SQL Server name.
(9) If you were not able to recover the destination SQL Server databases when the last log was shipped, then after the SQL Server has successfully restarted complete the following action using Query Analyzer while logged in under an account with SQL Server System Administrator level permissions. In most cases this can be sa or any domain account that is in the local Administrators group on the machine or in the System Administrators role on the SQL Server; it really depends on how you've configured the security on the SQL Server box. For each log shipped database:

RESTORE DATABASE [databasename] WITH RECOVERY
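If you are not sure which databases still need that final recovery, a query along these lines should list any that remain in standby or restoring state (DATABASEPROPERTYEX is available in SQL Server 2000 and later):

    select name
      from master.dbo.sysdatabases
     where databasepropertyex(name, 'IsInStandBy') = 1
        or databasepropertyex(name, 'Status') = 'RESTORING'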
(10) If there are any other special
procedures necessary when the source SQL Server is restarted,
complete those steps now. Examples might be restarting machines,
applications and/or services that wig out when you take the SQL
Server out from under them.
(11) If you were not able to copy
the SQL Agent Jobs as enabled, run the job: "w enable jobs that
were active in production"
(12) If (1) was done, stop the SQL Server Agent using Enterprise Manager, execute the scripts generated in (1), and restart the SQL Server Agent from Enterprise Manager.
That's it. Trust me, it's a blast. Especially at DBA prime time (i.e. 3AM).
If you haven't already, you can download the Heavy Duty LiteSpeed Log Shipping stored procedures and save them all to a location that you have permission to access from the network, using the file names indicated in each procedure's download script. Then follow the steps outlined in Heavy Duty LiteSpeed Log Shipping Part 2 for deploying the log shipping job on a test server.
Discover the usage and processing actions of a deployed log shipping installation in the article Heavy Duty LiteSpeed Log Shipping Part 3. Then, if you have surging log growth conditions, you can use the information presented in Heavy Duty Log Shipping Part 4 to help you manage log growth.
So there you have it, the 5 week
abbreviated discussion of Heavy Duty SQL LiteSpeed Log Shipping.
Hope you found it all interesting and useful. Perhaps you’re
ready to put it to the real test. If you do, let me know how it
goes.
Bill