A client’s Data Guard environment fell out of sync when archive logs went missing from the standby. A number of factors played a part:
- Undersized redo log files. The environment switched logs as often as five times per minute.
- Frequent RMAN archive log backups. The backup schedule performed archive log backups every five minutes.
- Aggressive archive log retention and deletion settings. Several years ago, the DBA team grew hypersensitive to archive logs filling the FRA. They responded with an aggressive deletion policy of "backed up 1 times to tape" but omitted "applied on standby."
- Alert fatigue. The monitoring system generated multiple false positives for the production systems. After months of false alarms, the on-call DBAs developed a habit of quieting or ignoring alerts.
Sprinkle in a series of network hiccups between the primary and standby locations, and the scene is set:
- Multiple standby databases were behind their primaries by several days (and hundreds of logs).
- Random gaps of one to a dozen or so logs prevented the redo apply process from advancing.
I know it's easy to judge the DBAs responsible for this environment, but you may be surprised at how common elements of this situation are. Systems evolve and outgrow their initial configurations. Undersized redo log files are more common than you think. Archive logs filling up the FRA? Been there, done that, and it's understandable that DBAs want to avoid it. And anyone who's been on call knows the pain of getting paged in the dead of night for a non-problem!
Getting the standby environment back in sync meant identifying and restoring the missing archive logs from the primary’s tape backups, then copying them from ASM to a shared NFS filesystem where the standby database could consume them.
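For the copy step, asmcmd can pull a restored log straight out of the FRA disk group onto the NFS mount, and RMAN on the standby can then catalog the copies. The disk group, paths, and file names below are made up for illustration:
# copy one restored log out of ASM (hypothetical +RECO path) to the shared NFS mount
asmcmd cp +RECO/MYDB/ARCHIVELOG/2024_05_01/thread_1_seq_1234.456.1167135143 /nfs/standby_arch/
# on the standby, catalog the copied files so redo apply can find them
rman target / <<< "catalog start with '/nfs/standby_arch/' noprompt;"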
After scrolling through the directory contents to look for missing sequences, it became apparent I wasn’t dealing with a few missing files. I needed to automate the process of generating RMAN commands to restore the gaps.
Demonstration Case
I created a series of dummy files with the sequence number in the second position:
cd /tmp; touch x_1_x.dbf x_2_x.dbf x_4_x.dbf x_6_x.dbf x_9_x.dbf x_11_x.dbf x_12_x.dbf x_17_x.dbf x_20_x.dbf
I need to restore the missing sequences: 3, 5, 7 to 8, 10, 13 to 16, and 18 to 19.
awk Saves the Day
The Linux awk utility performs pattern matching and field processing, which is just what finding missing sequences calls for. First, I need to extract the sequence numbers from the second position of the archive log file names. For that, I'll use find and cut:
find /tmp -name x_\*.dbf | cut -d_ -f2
The backslash in the find command escapes the wildcard. In cut, -d_ sets the underscore as the field separator, and -f2 tells cut to print the second field. The result isn't exactly what I had in mind, though:
> find /tmp -name x_\*.dbf | cut -d_ -f2
17
11
9
12
1
20
2
4
6
The output isn't sorted; it's simply returned in whatever order find provided. I need to append a sort to the end of this command to produce a sequential result.
Since the sequences aren't zero-padded, a simple sort doesn't work. It must be a numerical sort: sort -n.
The following command provides the correct input for awk to do its work:
find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n
Note: I could get away without this and do everything in awk, if the sequences were all the same length.
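To illustrate one reading of that note (a sketch only, not the pipeline I used): awk's -F_ option can take over the field splitting from cut, and with zero-padded sequence numbers a plain lexical sort would already be numeric order. With this demo's mix of one- and two-digit sequences it would mis-order them, which is exactly why cut and sort -n stay in the pipeline.
find /tmp -name x_\*.dbf | sort | awk -F_ '{ print $2 }'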
Process the Sequence
awk works with fields, and the result piped into awk consists of a single value, the sequence number. That makes it field number 1, and I can print it with:
find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '{print $1}'
This gives me a list of the sequences I have. But what I want are the sequences I don't have! For that, I'll introduce a condition to my awk command that only prints a line when the condition evaluates as true. I'm also adding a variable, x, that tracks the last sequence value:
find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print $1 } { x = $1 }'
In plain English, it says: "If the incoming sequence isn't equal to the last sequence plus one, print the sequence. When done, set the value of x to the current sequence." Now, comparing the current sequence with the last one, I get a more meaningful result:
> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print $1 } { x = $1 }'
4
6
9
11
17
20
That’s close to correct, but instead of printing the missing sequences, it’s showing the value that follows. I can fix that by subtracting one from the printed value:
> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print $1 - 1 } { x = $1 }'
3
5
8
10
16
19
Better, but not yet what's needed. It's showing the individual gaps (3, 5, and 10), but not those that span a series (7 to 8, 13 to 16, and 18 to 19). But I already have that information in the variable, x. I can print that, too:
> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print x + 1, $1 - 1} { x = $1 }'
3 3
5 5
7 8
10 10
13 16
18 19
Brilliant!
Now that I have the gaps, the next step is printing out the appropriate restore archivelog commands for RMAN. But there are two possibilities to consider: the restore command for an individual log vs. the command for restoring logs between two values. I already have the from and until values in the output, and I could send them into an ordinary if/else statement. But where's the fun in that?
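For the record, that ordinary route would look something like this: pipe the gap pairs into a shell loop and branch there. A rough sketch of the road not taken, with made-up variable names:
find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | \
awk '$1 != x + 1 { print x + 1, $1 - 1 } { x = $1 }' | \
while read from_seq until_seq
do
  if [ "$from_seq" -eq "$until_seq" ]; then
    echo "restore archivelog sequence $from_seq;"
  else
    echo "restore archivelog from sequence $from_seq until sequence $until_seq;"
  fi
done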
Conditional Processing in awk
Instead, let's add the if/else processing within the existing awk statement. The $1 != x + 1 condition already limits the output to the sequence gaps. All that's left is differentiating the single-file and multiple-file sequences.
In awk, if/else logic can span multiple lines, and each condition's action ends with a semicolon. To identify a single missing sequence, I'll compare the first missing value, x + 1, with the last missing value, $1 - 1. If they match, the gap is a single value; otherwise it spans two or more files.
> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | \
awk '$1 != x + 1 {
  if (x + 1 == $1 - 1)
    print x + 1;
  else print x + 1, $1 - 1;
} { x = $1 }'
3
5
7 8
10
13 16
18 19
That's more like it! All that's left is filling in restore commands around the sequences in the print statements, adding parameters, and generating a complete RMAN script.
The Scripts
For RAC Systems
The following script captures information about the local system in several variables:
- archivelog_directory: The archive log directory on the target system.
- thread_count: How many instances (threads) are in the cluster.
- log_prefix: Any text that precedes the log name. This simply limits the find output.
- log_suffix: The file suffix. Again, this limits the find output to just your archive logs.
- sequence_position: Where the sequence number appears in the archive log file name.
- script_name: The destination file for the output.
The only places beyond these variables where you might need to make changes are the opening RMAN commands and the find pattern, where the thread number is concatenated to the log_prefix variable. The thread is usually the first value in log_archive_format, but if not, adjust the find command to look for the thread number in the correct position. The script needs this to properly construct the restore commands.
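If you're not sure of the layout, the parameter takes a second to check from SQL*Plus. In the common %t_%s_%r style of format, the thread (%t) comes first and the sequence (%s) second, which is what the script below assumes.
SQL> show parameter log_archive_format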
The -v t=$thread added to the opening of the awk command sets an awk variable, t, to the value of the shell variable $thread.
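As a standalone illustration of the -v mechanics (a throwaway example, not part of the script):
> echo 42 | awk -v t=1 '{ print "sequence " $1 " thread " t }'
sequence 42 thread 1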
export archivelog_directory=/tmp
export thread_count=2
export log_prefix=
export log_suffix=.dbf
export sequence_position=2
export sid=$ORACLE_SID
export script_name=restore_archivelogs."${sid}".sh

# Open the generated script: set the environment and start the RMAN command block.
cat << EOT > "${script_name}"
. oraenv <<< "${sid}"
rman target / << EOF
run {
configure device type 'SBT_TAPE' backup type to backupset;
allocate channel ch01 device type 'SBT_TAPE';
EOT

# Generate restore commands for each thread's gaps. The thread number is
# concatenated to log_prefix in the find pattern; adjust if the thread isn't
# the first element of log_archive_format.
for (( thread=1; thread<=$thread_count; thread++ ))
do
find "${archivelog_directory}" -name "${log_prefix}${thread}_*${log_suffix}" | cut -d_ -f"${sequence_position}" | sort -n | \
awk -v t=$thread '$1 != x + 1 {
if (x + 1 == $1 - 1)
print "restore archivelog sequence " x + 1 " thread " t ";";
else print "restore archivelog from sequence " x + 1 " until sequence " $1 - 1 " thread " t ";";
} { x = $1 }' >> "${script_name}"
done

# Close out the RMAN block and the heredoc.
cat << EOT >> "${script_name}"
}
EOF
EOT

chmod u+x "${script_name}"
For Non-RAC Systems
Things are simpler for non-RAC environments: there's no thread to account for, which eliminates the need to loop over the directory contents.
export archivelog_directory=/tmp
export log_prefix=
export log_suffix=.dbf
export sequence_position=2
export sid=$ORACLE_SID
export script_name=restore_archivelogs."${sid}".sh

# Open the generated script: set the environment and start the RMAN command block.
cat << EOT > "${script_name}"
. oraenv <<< "${sid}"
rman target / << EOF
run {
configure device type 'SBT_TAPE' backup type to backupset;
allocate channel ch01 device type 'SBT_TAPE';
EOT

# Generate the restore commands for the sequence gaps (no thread needed here).
find "${archivelog_directory}" -name "${log_prefix}*${log_suffix}" | cut -d_ -f"${sequence_position}" | sort -n | \
awk '$1 != x + 1 {
if (x + 1 == $1 - 1)
print "restore archivelog sequence " x + 1 ";";
else print "restore archivelog from sequence " x + 1 " until sequence " $1 - 1 ";";
} { x = $1 }' >> "${script_name}"

# Close out the RMAN block and the heredoc.
cat << EOT >> "${script_name}"
}
EOF
EOT

chmod u+x "${script_name}"
Example Output
The finished script will look like this:
. oraenv <<< mydb
rman target / << EOF
run {
configure device type 'SBT_TAPE' backup type to backupset;
allocate channel ch01 device type 'SBT_TAPE';
restore archivelog sequence 3;
restore archivelog sequence 5;
restore archivelog from sequence 7 until sequence 8;
restore archivelog sequence 10;
restore archivelog from sequence 13 until sequence 16;
restore archivelog from sequence 18 until sequence 19;
}
EOF
It sets the Oracle environment and passes a heredoc to RMAN.
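Since a restore of hundreds of logs from tape can take a while, I'd launch the generated script in the background and follow its log (the file names here assume an ORACLE_SID of mydb):
nohup ./restore_archivelogs.mydb.sh > restore_archivelogs.mydb.log 2>&1 &
tail -f restore_archivelogs.mydb.log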
This was a lot easier (and faster) than building statements by hand, and certainly more efficient than trying to visually identify hundreds of missing files or waiting for the standby apply to fail on a missing log! RMAN spent considerable time allocating tape channels in this environment, so restoring everything within a single command block improved recovery times.