Viscosity's Blog

Create a Fast-Start Failover Observer Service on Linux

Written by Sean Scott | Feb 2, 2023 2:50:31 PM

Oracle Data Guard protects critical database environments with exact, physical standbys, or copies, of production databases. Business continuity and disaster recovery are a matter of switching or failing over to the standby database when the primary system goes offline.

 

Fast-Start Failover (FSFO) automates these activities. FSFO monitors participants in Data Guard configurations and, when it detects that the primary is unavailable, performs the switch automatically. It’s more responsive and durable than manual intervention. The time it takes for a DBA to acknowledge a page, log in to the environment, assess the situation, and make the decision to switch to the standby represents lost revenue and productivity. As an automated solution, FSFO is more efficient for dealing with multiple databases, too.

 

The Observer is the critical component in an FSFO solution. It’s responsible for monitoring the environment, detecting events, and triggering a switch. The Observer is really just a Data Guard Broker client session. It connects to the Data Guard topology and reads the status. Ideally, the Observer (or, better still, multiple Observers) is on a dedicated machine, and not located on the primary or standby database host.

 

A typical Observer setup involves running a script that starts a Data Guard Broker client session, connects to the primary database, and starts the observer. The process runs in the background (and in older versions never returned control). But the Observer is monitoring mission-critical databases, and there should be some intelligence built around it. A lot could go wrong with the Observer. How do you know if it’s still running properly? If it stops, what’s the mechanism for restarting it?

So, we often see scripts with some diagnostic capabilities, called by a cron job, checking the health and activity of the Observer.

 

This functionality is already built into systemd on Linux systems. The cron daemon itself runs as a systemd service, and systemd’s service architecture has restart features built in, so why not cut out the middleman and just run the Observer process as a service?

 

Configure an Observer Host

 

I’ll demonstrate this setup on Oracle Cloud Infrastructure, using an Always-Free eligible compute instance running Oracle Linux 8. The Observer will run from an Oracle 19c Database Client home but might just as easily use a full Oracle Database installation. And, while I’m using a 19c client, the examples here will also work with older versions (looking at you, 11g) built before the “fire-and-forget” “start observer in background” command was added!

 

The first step after provisioning the VM is preparing the environment and installing the software. I used the preinstall RPM for this, just as I would for a database installation:

yum -y install oracle-database-preinstall-19c

[root@observer ~]# yum -y install oracle-database-preinstall-19c
Ksplice for Oracle Linux 8 (x86_64) 6.5 MB/s | 1.7 MB 00:00
MySQL 8.0 for Oracle Linux 8 (x86_64) 11 MB/s | 2.8 MB 00:00
MySQL 8.0 Tools Community for Oracle Linux 8 (x86_64) 2.2 MB/s | 426 kB 00:00
MySQL 8.0 Connectors Community for Oracle Linux 8 (x86_64) 238 kB/s | 28 kB 00:00
Oracle Software for OCI users on Oracle Linux 8 (x86_64) 35 MB/s | 63 MB 00:01
Oracle Linux 8 BaseOS Latest (x86_64) 18 MB/s | 54 MB 00:03
Oracle Linux 8 Application Stream (x86_64) 24 MB/s | 43 MB 00:01
Oracle Linux 8 Addons (x86_64) 4.7 MB/s | 5.6 MB 00:01
Latest Unbreakable Enterprise Kernel Release 6 for Oracle Linux 8 (x86_64) 9.5 MB/s | 61 MB 00:06
Dependencies resolved.
...
Installed:
ksh-20120801-257.0.1.el8.x86_64 libICE-1.0.9-15.el8.x86_64 libSM-1.2.3-1.el8.x86_64 libX11-xcb-1.6.8-5.el8.x86_64
libXcomposite-0.4.4-14.el8.x86_64 libXi-1.7.10-1.el8.x86_64 libXinerama-1.1.4-1.el8.x86_64 libXmu-1.1.3-1.el8.x86_64
libXrandr-1.5.2-1.el8.x86_64 libXt-1.1.5-12.el8.x86_64 libXtst-1.2.3-7.el8.x86_64 libXv-1.0.11-7.el8.x86_64
libXxf86dga-1.1.5-1.el8.x86_64 libXxf86misc-1.0.4-1.el8.x86_64 libXxf86vm-1.1.4-9.el8.x86_64 libaio-devel-0.3.112-1.el8.x86_64
libdmx-1.1.4-3.el8.x86_64 libnsl-2.28-189.5.0.1.el8_6.x86_64 libstdc++-devel-8.5.0-10.1.0.1.el8_6.x86_64 oracle-database-preinstall-19c-1.0-2.el8.x86_64
xorg-x11-utils-7.5-28.el8.x86_64 xorg-x11-xauth-1:1.0.9-12.el8.x86_64

Complete!
[root@observer ~]#

Next, I created directories for the Oracle software:

mkdir -p /opt/
chown -R oracle:oinstall /opt/

Client Software Installation

After switching to the oracle user, I prepared the environment and expanded the installation archive to the ORACLE_HOME:

export ORACLE_BASE=/opt/oracle
export ORACLE_INVENTORY=/opt/oraInventory
export ORACLE_HOME=$ORACLE_BASE/product/19c/client_1
unzip -oq -d $ORACLE_HOME ~/LINUX.X64_193000_client_home.zip

Installing 19c client software on Oracle Linux 8 (and particularly on OCI) usually produces some errors:

[WARNING] [INS-08101] Unexpected error while executing the action at state: 'clientSupportedOSCheck'
CAUSE: No additional information available.
ACTION: Contact Oracle Support Services or refer to the software manual.
SUMMARY:
- java.lang.NullPointerException

The following workaround—overriding the distribution ID and temporary directories—bypasses the problem:

export CV_ASSUME_DISTID=OL7
mkdir -p /opt/oracle/tmp
export TMP=/opt/oracle/tmp
export TEMP=/opt/oracle/tmp
export TMPDIR=/opt/oracle/tmp

I also created a response file so I could run the installer package in the background:

cat << EOF > ~/observer.rsp
oracle.install.responseFileVersion=/oracle/install/rspfmt_clientinstall_response_schema_v19.0.0
UNIX_GROUP_NAME=oinstall
INVENTORY_LOCATION=$ORACLE_INVENTORY
ORACLE_BASE=$ORACLE_BASE
EOF

Next, I ran the installer package, using the response file:

[oracle@observer ~]$ $ORACLE_HOME/runInstaller -silent -force -waitforcompletion -responsefile ~/observer.rsp -ignorePrereqFailure
Launching Oracle Database Client Setup Wizard...

[WARNING] [INS-13014] Target environment does not meet some optional requirements.
CAUSE: Some of the optional prerequisites are not met. See logs for details. installActions2023-01-30_05-57-00PM.log
ACTION: Identify the list of failed prerequisite checks from the log: installActions2023-01-30_05-57-00PM.log. Then either from the log file or from installation manual find the appropriate configuration to meet the prerequisites and fix it manually.
The response file for this session can be found at:
/opt/oracle/product/19c/client_1/install/response/client_2023-01-30_05-57-00PM.rsp

You can find the log of this install session at:
/opt/oracle/tmp/InstallActions2023-01-30_05-57-00PM/installActions2023-01-30_05-57-00PM.log

As a root user, execute the following script(s):
1. /opt/oraInventory/orainstRoot.sh

Execute /opt/oraInventory/orainstRoot.sh on the following nodes:
[observer]
Successfully Setup Software with warning(s).
Moved the install session logs to:
/opt/oraInventory/logs/InstallActions2023-01-30_05-57-00PM
[oracle@observer ~]$

 

Configure and Test Networking


I added environment values to the oracle user’s .bashrc file:

cat << EOF >> ~/.bashrc
export ORACLE_BASE=/opt/oracle
export ORACLE_HOME=\$ORACLE_BASE/product/19c/client_1
export PATH=\$ORACLE_HOME/bin:\$PATH
EOF

After starting a new session, I checked that I could reach the database hosts from the dgmgrl command line using EZConnect:

[oracle@observer ~]$ dgmgrl
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Jan 30 18:29:20 2023
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/oracle@DG1:1521/DG1
Connected to "DG1"
Connected as SYSDBA.
DGMGRL>

Success!

However, I don’t want to log into the database with a password, particularly not if the connection runs as a Linux service. I created a wallet directory and configured a wallet with the database credentials:

mkstore -wrl $WALLET_DIR -create
mkstore -wrl $WALLET_DIR -createEntry oracle.security.client.default_username SYS
mkstore -wrl $WALLET_DIR -createEntry oracle.security.client.default_password oracle

[oracle@observer ~]$ mkstore -wrl $WALLET_DIR -list

Oracle Secret Store Tool Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Enter wallet password:
Oracle Secret Store entries:
oracle.security.client.default_password
oracle.security.client.default_username
[oracle@observer ~]$
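The mkstore commands above assume that WALLET_DIR already points at an existing directory, which the listing does not show. A minimal sketch of that preparation step follows. The helper name prepare_wallet_dir and the example path are assumptions for illustration, not values from the original setup.

```shell
#!/bin/bash
# prepare_wallet_dir DIR: create DIR if needed and restrict it to its owner.
# The wallet will hold SYS credentials, so nobody else should be able to read it.
prepare_wallet_dir() {
    local dir="$1"
    mkdir -p "$dir" && chmod 700 "$dir"
}

# Example usage (the path is hypothetical; choose your own location):
#   export WALLET_DIR=/opt/oracle/wallet
#   prepare_wallet_dir "$WALLET_DIR"
```

Exporting WALLET_DIR before running mkstore and before writing sqlnet.ora keeps every later step pointed at the same directory.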

I created an entry in my tnsnames.ora file, and added the wallet information to the sqlnet.ora file:

cat << EOF >> $ORACLE_HOME/network/admin/sqlnet.ora

WALLET_LOCATION =
(SOURCE =
(METHOD = FILE)
(METHOD_DATA =
(DIRECTORY = $WALLET_DIR)
)
)

SQLNET.WALLET_OVERRIDE = TRUE
EOF
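The tnsnames.ora entry mentioned above is not shown. A sketch of what it might look like follows, reusing the host, port, and service name from the earlier EZConnect test (DG1:1521/DG1) and the dg1 alias used in the wallet connection. The helper name write_dg1_alias is hypothetical, and the real entry may differ.

```shell
#!/bin/bash
# write_dg1_alias FILE: append a dg1 alias to the given tnsnames.ora file.
# Host, port, and service name are taken from the EZConnect string DG1:1521/DG1.
write_dg1_alias() {
    local tns_file="$1"
    cat << 'EOF' >> "$tns_file"
dg1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = DG1)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = DG1))
  )
EOF
}

# Example usage:
#   write_dg1_alias "$ORACLE_HOME/network/admin/tnsnames.ora"
```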

With the networking components in place, I tested a connection through dgmgrl using the wallet and TNS alias:

[oracle@observer ~]$ dgmgrl /@dg1
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Jan 30 18:39:55 2023
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "ECO22"
Connected as SYSDBA.

It works!

 

Script Observer Startup

There are plenty of ways to script out startup of the Observer, depending on the database versions and Data Guard topology. Exit codes are important for a service-based process, too. Unexpected exits from the Data Guard Broker client (dgmgrl) should return a non-zero value so that systemd handles notifications, logging, and restarts correctly. The service configuration itself will dictate some of this, too.
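One way to make exit codes predictable is to wrap the long-running command. The sketch below is an illustration under assumptions, not part of the original setup: it treats any return from a foreground command as a failure, because a foreground Observer is not expected to return at all. The function name run_foreground is hypothetical.

```shell
#!/bin/bash
# run_foreground CMD...: run a long-lived command and report how it ended.
# A clean exit (status 0) is mapped to 1, since the command was expected
# to keep running; any other status is passed through unchanged.
run_foreground() {
    local rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "command returned unexpectedly with status 0" >&2
        return 1
    fi
    echo "command failed with status $rc" >&2
    return "$rc"
}

# Example usage inside a startup script:
#   run_foreground "$ORACLE_HOME/bin/dgmgrl" "/@DG1" "start observer"
#   exit $?
```

A wrapper like this only triggers a restart when the service's Restart setting reacts to non-zero exit codes, for example on-failure.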

 

The following script is extremely basic. For one thing, it doesn’t query the participants to determine the primary database in the configuration. It’s also using the 11g-friendly “start observer” rather than the newer “start observer in background” command. You’ll want something better than this for “real” implementations, but it’s adequate to demonstrate the core objectives and isn’t cluttered with extras:

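cat << EOS > /home/oracle/scripts/start_observer.sh
#!/bin/bash
# File passed to the start observer command, and the TNS alias to connect to
log_file=/home/oracle/scripts/observer.log
service=DG1
# Connect through the wallet and start the Observer in the foreground
$ORACLE_HOME/bin/dgmgrl /@\${service} "start observer file=\"\${log_file}\""
EOS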

chmod 750 /home/oracle/scripts/start_observer.sh
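The basic script above hard-codes the service name. One possible way to look up the primary instead is to parse the broker's show configuration output. The sketch below assumes that output marks the primary with a line of the form "name - Primary database". The function name primary_from_config is hypothetical.

```shell
#!/bin/bash
# primary_from_config: read "show configuration" output on stdin and print
# the first member flagged as the primary database.
primary_from_config() {
    awk '/ - Primary database/ { print $1; exit }'
}

# Example usage (connects through the wallet alias used earlier):
#   primary=$(dgmgrl -silent /@dg1 "show configuration" | primary_from_config)
```

The value it prints is a database name from the broker configuration, so it still has to map to a TNS alias that the wallet credentials can use.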

For testing, I suggest adding the -debug flag to see additional output that’s helpful for troubleshooting the process:

$ORACLE_HOME/bin/dgmgrl -debug ...

The verbose messaging of -debug shows behind-the-scenes activity:

[oracle@observer scripts]$ ./start_observer.sh
Created directory /opt/oracle/product/19c/client_1/dataguard
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Jan 30 18:59:42 2023
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.

Welcome to DGMGRL, type "help" for information.
[W000 2023-01-30T18:59:42.800+00:00] Connecting to database using DG1.
[W000 2023-01-30T18:59:42.800+00:00] Attempt logon as SYSDG
[W000 2023-01-30T18:59:44.003+00:00] Attempt logon as SYSDBA
[W000 2023-01-30T18:59:44.103+00:00] Successfully logged on as SYSDBA
[W000 2023-01-30T18:59:44.103+00:00] Executing query [select sys_context('USERENV','CON_ID') from dual].
[W000 2023-01-30T18:59:44.104+00:00] Query result is '1'
[W000 2023-01-30T18:59:44.104+00:00] Executing query [select value from v$parameter where name = 'db_unique_name'].
[W000 2023-01-30T18:59:44.109+00:00] Query result is 'DG1'
Connected to "ECO22"
[W000 2023-01-30T18:59:44.109+00:00] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 2023-01-30T18:59:44.110+00:00] Oracle database version is '21.8.0.0.0'
Connected as SYSDBA.

Notice that I’m running a version 19.3 client, but the database is version 21.8. Mismatched client and database versions are perfectly acceptable, but commands and features are limited to what’s available in the lower version. I can’t run “start observer in background” from recent client libraries against an 11g database. Likewise, an older client won’t include commands for newer databases.

 

Configure a Service


NOTE: The following steps for configuring and creating a service to run the Observer must run with sudo or root access.

 

Adding a service is a matter of creating a service file under /etc/systemd/system. The file name is the service name, and it must end with the .service suffix. There are plenty of ways to write service configurations: they can be fully self-contained or reference separate, modular components in other files. If you’re working in an environment with multiple Data Guard systems, you’ll probably set up a separate service for each, and it makes sense to use a common configuration file.

 

Remember, root is managing this process, invoking the oracle user to call the startup script, bypassing the user’s normal login process. The service needs to know all about the environment—especially the information normally set by oraenv or the oracle user’s shell login scripts. Configurations typically exist under /etc/sysconfig, and mine looks like this:

cat << EOF > /etc/sysconfig/oracle
ORACLE_BASE=/opt/oracle
ORACLE_HOME=/opt/oracle/product/19c/client_1
LD_LIBRARY_PATH=/opt/oracle/product/19c/client_1/lib
TNS_ADMIN=/opt/oracle/product/19c/client_1/network/admin
EOF

I could pass additional values here, like connection strings, and that would work if there’s only one Data Guard configuration to monitor. If there are more, limiting entries in the configuration file to common, reusable values is better.

Next, I created the service definition:

cat << EOF > /etc/systemd/system/oracle-fsfo-observer.service
[Unit]
Description=Service for Oracle Fast Start Failover Observer startup
After=syslog.target network.target

[Service]
LimitNOFILE=16384
LimitMEMLOCK=infinity

Type=simple

User=oracle
Group=oinstall
EnvironmentFile=/etc/sysconfig/oracle
WorkingDirectory=/home/oracle/scripts

RemainAfterExit=False
Restart=on-abort
RestartSec=5

ExecStart=/bin/bash -c '/home/oracle/scripts/start_observer.sh'

[Install]
WantedBy=multi-user.target
EOF

Some things to notice:

  • The After entry in the [Unit] section tells systemd that this service should start after the network target is reached. Attempting Observer connections before the network starts will fail.

  • The [Service] section configures the user and group ownership for the service, and references the environment file created earlier. It also defines failure behaviors.

  • RemainAfterExit=False ends the service if it exits. Running the start observer command as a foreground process keeps the service “alive.” If the Observer’s dgmgrl process ends, we know it failed.

  • Restart manages service restarts. The on-abort option only restarts the service if it is terminated by an “unclean signal.” Alternatives, like on-failure or on-abnormal, handle different conditions. Consult with your systems administrators or read the documentation to determine what’s best for your situation.

  • ExecStart runs the script. Here it is wrapped in a bash -c call, although systemd can also run an executable script with a valid shebang line directly by its absolute path.
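For environments with several Data Guard configurations, a systemd template unit is one way to share a single definition. The sketch below is an assumption-laden illustration rather than the setup used in this post: the helper name write_observer_template is hypothetical, the unit uses Type=simple and Restart=on-failure, and it passes the instance name (%i) to the startup script as an argument that the demonstration script does not currently read.

```shell
#!/bin/bash
# write_observer_template DIR: write a template unit into DIR.
# Each instance (the part after "@") becomes %i inside the unit.
write_observer_template() {
    local unit_dir="$1"
    cat << 'EOF' > "$unit_dir/oracle-fsfo-observer@.service"
[Unit]
Description=Oracle Fast-Start Failover Observer for %i
After=network.target

[Service]
Type=simple
User=oracle
Group=oinstall
EnvironmentFile=/etc/sysconfig/oracle
ExecStart=/home/oracle/scripts/start_observer.sh %i
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# Example usage (as root):
#   write_observer_template /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now oracle-fsfo-observer@DG1.service
```

For this to work, the startup script would need to take the alias from its first argument, for example with a line such as service="$1".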

For this demonstration, I placed my script under the oracle user’s home directory and gave the oracle user rwx permissions. It’s also using a wallet to connect as the SYS user.

 

Don’t do that.

 

Remember this script is launched by systemd, unattended, as the oracle user, and anyone with write access to the script could add nasty, destructive content. The same wallet that allows SYS connections to the Data Guard Broker will work with SQL*Plus, and, provided it doesn’t exit unexpectedly, the service will run whatever’s in the script.

Protect the script in production environments by removing write permissions and locating it in a secure directory!
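As a starting point, removing write permission can be scripted. The sketch below is a minimal illustration, and the helper name lock_down is hypothetical. Moving the script to a root-owned directory and changing its owner would be additional steps that depend on local policy.

```shell
#!/bin/bash
# lock_down FILE: remove write permission for everyone and all access for
# "other" users, leaving read and execute for the owner and group.
lock_down() {
    chmod a-w,o-rwx "$1"
}

# Example usage:
#   lock_down /home/oracle/scripts/start_observer.sh
```

Applied to the demonstration script, which was created with mode 750, this leaves it at mode 550.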

 

Start and Test the Service

Now that the service definition and configuration are created, it’s time to enable and start the service. The command sequence may be familiar to anyone who’s worked with services in the past (with the exception of the custom service name):

systemctl daemon-reload
systemctl enable oracle-fsfo-observer.service
systemctl start oracle-fsfo-observer.service
systemctl status -l oracle-fsfo-observer.service

Even though we added the service files, systemd doesn’t know about them until it’s refreshed with the daemon-reload command. Only then can you enable and start the newly added service!

 

After starting the service, I viewed its status. In the output, I see the Observer started message, telling me it’s running:

[root@observer ~]# systemctl start oracle-fsfo-observer.service
[root@observer ~]# systemctl status -l oracle-fsfo-observer.service
● oracle-fsfo-observer.service - Service for Oracle Fast Start Failover Observer startup
Loaded: loaded (/etc/systemd/system/oracle-fsfo-observer.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2023-01-30 19:10:54 GMT; 10s ago
Process: 65027 ExecStart=/bin/bash -c /home/oracle/scripts/start_observer.sh (code=exited, status=0/SUCCESS)

Jan 30 19:10:51 observer bash[65028]: Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Jan 30 19:10:51 observer bash[65028]: Welcome to DGMGRL, type "help" for information.
Jan 30 19:10:53 observer bash[65028]: Connected to "DG1"
Jan 30 19:10:53 observer bash[65028]: Connected as SYSDBA.
Jan 30 19:10:53 observer bash[65028]: Observer started
Jan 30 19:10:54 observer systemd[1]: oracle-fsfo-observer.service: Succeeded.
Jan 30 19:10:54 observer systemd[1]: Started Service for Oracle Fast Start Failover Observer startup.
[root@observer ~]#

The Observer is running as a daemon, and should it fail, systemd handles restarts automatically (depending on the configuration in the [Service] section). There’s no need for PID files to record the process ID, no check for an existing Observer before startup, and no dependency on cron to keep things going!
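The state systemd reports for the unit and the broker's own view of the Observer are separate things, so it can be worth cross-checking from the broker side. The sketch below wraps the dgmgrl "show fast_start failover" command. The function name observer_status and the DGMGRL override variable are assumptions for illustration.

```shell
#!/bin/bash
# Path to the broker client; override DGMGRL to point at another binary.
DGMGRL="${DGMGRL:-dgmgrl}"

# observer_status ALIAS: connect through the wallet using the given TNS alias
# and print the broker's fast-start failover status.
observer_status() {
    "$DGMGRL" -silent "/@$1" "show fast_start failover"
}

# Example usage:
#   observer_status dg1
```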

 

The final test for the configuration is whether it starts automatically when the system reboots:

[root@observer ~]# shutdown -r now
Connection to observer closed by remote host.
Connection to observer closed.
...
[opc@observer ~]$ sudo su -
Last login: Mon Jan 30 17:42:28 GMT 2023 on pts/0
[root@observer ~]# systemctl status -l oracle-fsfo-observer.service
● oracle-fsfo-observer.service - Service for Oracle Fast Start Failover Observer startup
Loaded: loaded (/etc/systemd/system/oracle-fsfo-observer.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2023-01-30 19:14:02 GMT; 13s ago
Process: 1519 ExecStart=/bin/bash -c /home/oracle/scripts/start_observer.sh (code=exited, status=0/SUCCESS)

Jan 30 19:14:00 observer bash[1530]: Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Jan 30 19:14:00 observer bash[1530]: Welcome to DGMGRL, type "help" for information.
Jan 30 19:14:02 observer bash[1530]: Connected to "DG1"
Jan 30 19:14:02 observer bash[1530]: Connected as SYSDBA.
Jan 30 19:14:02 observer bash[1530]: Observer started
Jan 30 19:14:02 observer systemd[1]: oracle-fsfo-observer.service: Succeeded.
Jan 30 19:14:02 observer systemd[1]: Started Service for Oracle Fast Start Failover Observer startup.
[root@observer ~]#

During bootup, the service started and called the script to begin observing the Data Guard target!

 

Taking advantage of native Linux system controls to manage the Observer in Fast-Start Failover configurations is a more durable and reliable method—and, ultimately, simpler—than reproducing similar functionality in cron-monitored scripts!

 

Observe on!