Viscosity's Blog

Find the Gap

Written by Sean Scott | Oct 14, 2022 8:22:52 PM

A client’s Data Guard environment fell out of sync when archive logs went missing from the standby. A number of factors played a part:

 

  • Undersized redo log files. The environment switched as often as five times per minute.

  • Frequent RMAN archive log backups. The backup schedule performed archive log backups every five minutes.

  • Aggressive archive log retention and deletion settings. Several years ago, the DBA team grew hypersensitive to archive logs filling FRA. They responded with an aggressive deletion policy of backed up 1 times to tape but omitted applied on standby.

  • Alert fatigue. The monitoring system generated multiple false positives for the production systems. After months of false alarms, the on-call DBAs developed a habit of quieting or ignoring alerts.

Sprinkle in a series of network hiccups between the primary and standby locations, and the scene is set:

 

  • Multiple standby databases were behind their primaries by several days (and hundreds of logs).

  • Random gaps of one to a dozen or so logs prevented the redo apply process from advancing.

I know it’s easy to judge the DBAs responsible for this environment, but you may be surprised at how common elements of this situation are. Systems evolve and outgrow their initial configurations. Undersized redo log files are more common than you think. Archive lock? Been there, done that, and it’s understandable that DBAs want to avoid it. And, anyone who’s been on-call knows the pain of getting pages in the dead of night for a not-problem!

 

Getting the standby environment back in sync meant identifying and restoring the missing archive logs from the primary’s tape backups, then copying them from ASM to a shared NFS filesystem where the standby database could consume them.

After scrolling through the directory contents to look for missing sequences, it became apparent I wasn’t dealing with a few missing files. I needed to automate the process of generating RMAN commands to restore the gaps.

 
Demonstration Case

I created a series of dummy files with the sequence number in the second position:

 

cd /tmp; touch x_1_x.dbf x_2_x.dbf x_4_x.dbf x_6_x.dbf x_9_x.dbf x_11_x.dbf x_12_x.dbf x_17_x.dbf x_20_x.dbf

I need to restore the missing sequences: 3, 5, 7 to 8, 10, 13 to 16, and 18 to 19.

 

awk Saves the Day

Linux’ awk utility performs file and pattern matching, including finding missing sequences. First, I need to extract sequence numbers in the second position of the archive log file name. For that, I’ll use find and cut:

 

find /tmp -name x_\*.dbf | cut -d_ -f2

 

The backslash in the find command escapes the wildcard. In cut, the -d_ sets the underscore as the field separator. The -f2 tells cut to print the second field. The result isn’t exactly what I had in mind, though:

 

> find /tmp/test -name x_\*.dbf | cut -d_ -f2
17
11
9
12
1
20
2
4
6

The output isn’t sorted—it’s simply returned in whatever order find provided. I need to append a sort to the end of this command to produce a sequential result.

Since the sequences aren’t zero-padded, a simple sort doesn’t work. It must be a numerical sort: sort -n.

 

The following command provides the correct input for awk to do its work:

 

find /tmp/test -name x_\*.dbf | cut -d_ -f2 | sort -n

 

Note: I could get away without this and do everything in awk, if the sequences were all the same length.

 

Process the Sequence

awk works with fields, and the result piped into awk consists of a single value, the sequence number. That makes it field number 1, and I can print it with:

 

find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '{print $1}'

 

This gives me a list of sequences I have. But what I want are the sequences I don’t have! For that, I’ll introduce a condition to my awk command that only prints a line when the condition evaluates as true. I’m also adding a variable, x, that tracks the last sequence value:

 

find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print $1 } { x = $1 }'

In plain English, it says: “If the incoming sequence isn’t equal to the last sequence plus one, print the sequence. When done, set the value of x to the current sequence.” Now, comparing the current sequence with the last one, I get a more meaningful result:

> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print $1 } { x = $1 }'
4
6
9
11
17
20

That’s close to correct, but instead of printing the missing sequences, it’s showing the value that follows. I can fix that by subtracting one from the printed value:

> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print $1 - 1 } { x = $1 }'
3
5
8
10
16
19

Better, but not yet what’s needed. It’s showing the individual gaps (3, 5, and 10), but not those in a series (7 to 8, 13 to 16, and 18 to 19). But I already have that information in the variable, x. I can print that, too:

> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | awk '$1 != x + 1 { print x + 1, $1 - 1} { x = $1 }'
3 3
5 5
7 8
10 10
13 16
18 19

Brilliant!

 

Now that I have the gaps, the next step is printing out the appropriate restore archivelog commands for RMAN. But there are two possibilities to consider: the restore command for an individual log vs. the command for restoring logs between two values. I already have the from and until values in the output, and I could send that into an ordinary if/else statement. But where’s the fun in that?

 

Conditional Processing in awk

Instead, let’s add the if/else processing within the existing awk statement. The $1 != x + 1 condition already limits the output to the sequence gaps. All that’s left is differentiating the single and multiple file sequences.

 

In awk, if/else logic spans multiple lines, and each condition’s actions end with a semicolon. To print a single sequence, I’ll check the last sequence, x, against the current sequence, $1. If they match, I know the gap is a single value, else it spans two or more files.

 

That’s more like it! All that’s left is filling in restore commands around the sequences in the print statements, adding parameters, and generating a complete RMAN script.

 

> find /tmp -name x_\*.dbf | cut -d_ -f2 | sort -n | \
awk '$1 != x + 1 {
if (x + 1 == $1 - 1)
print x + 1;
else print x + 1, $1 - 1;
} { x = $1 }'
3
5
7 8
10
13 16
18 19


The Scripts
For RAC Systems

 The following script captures information about the local system in several variables:

  • archivelog_directory: The archivelog directory on the target system.

  • thread_count: How many instances (threads) are in the cluster.

  • log_prefix: Any text that precedes the log name. This simply limits the find output.

  • log_suffix: The file suffix. Again, this limits the find output to just your archive logs.

  • sequence_position: Where the sequence number appears in the archive log file name.

  • script_name: The destination file for the output.

The only places beyond these variables where you might need to make changes are in the opening RMAN commands and where the value of the thread is concatenated to the log_prefix variable. Thread is usually the first value in log_archive_format, but if not, adjust the find command to look for the thread number in the correct spot. The script needs to know this to properly construct the restore commands.

 

The -v t=$thread added to the opening of the awk command sets an awk variable, t, to the environment variable, $thread.

export archivelog_directory=/tmp
export thread_count=2
export log_prefix=
export log_suffix=.dbf
export sequence_position=2
export sid=$ORACLE_SID
export script_name=restore_archivelogs."$".sh

cat << EOT > "$"
. oraenv <<< "$"
rman target / << EOF
run {
configure device type 'SBT_TAPE' backup type to backupset;
allocate channel ch01 device type 'SBT_TAPE';
EOT

for (( thread=1; thread<=$thread_count; thread++ ))
do log_prefix="$"
find "$" -name "$" | cut -d_ -f"$" | sort -n | \
awk -v t=$thread
'$1 != x + 1 {
if (x + 1 == $1 - 1)
print "restore archivelog sequence " x + 1 " t " thread ";";
else print "restore archivelog sequence from " x + 1 " t " thread " until sequence " $1 - 1 " thread " t ";";
} { x = $1 }' >> "$"
done

cat << EOT >> "$

EOF
EOT

chmod u+x "$"
 
For Non-RAC Systems

 It’s simpler for non-RAC environments. The thread isn’t a concern and eliminates the need to loop over the directory contents.

 

export archivelog_directory=/tmp
export log_prefix=
export log_suffix=.dbf
export sequence_position=2
export sid=$ORACLE_SID
export script_name=restore_archivelogs."$".sh

cat << EOT > "$"
. oraenv <<< "$"
rman target / << EOF
run {
configure device type 'SBT_TAPE' backup type to backupset;
allocate channel ch01 device type 'SBT_TAPE';
EOT

find "$" -name "$" | cut -d_ -f"$" | sort -n | \
awk '$1 != x + 1 {
if (x + 1 == $1 - 1)
print "restore archivelog sequence " x + 1 " t " thread ";";
else print "restore archivelog sequence from " x + 1 " t " thread " until sequence " $1 - 1 " thread " t ";";
} { x = $1 }' >> "$"

cat << EOT >> "$

EOF
EOT

chmod u+x "$"

 

 

Example Output

The finished script will look like this:

. oraenv <<< mydb
rman target / << EOF
run {
configure device type 'SBT_TAPE' backup type to backupset;
allocate channel ch01 device type 'SBT_TAPE';
restore archivelog sequence 3;
restore archivelog sequence 5;
restore archivelog sequence from 7 until sequence 8;
restore archivelog sequence 10;
restore archivelog sequence from 13 until sequence 16;
restore archivelog sequence from 18 until sequence 19;
}
EOF

It sets the Oracle environment and passes a heredoc to RMAN.

 

This was a lot easier (and faster) than building statements by hand and certainly more efficient than trying to visually identify hundreds of missing files, or wait for the standby apply to fail on a missing log! RMAN spent considerable time allocating tape channels in this environment. Restoring everything within a single command block improved recovery times!