Viscosity's Blog

Oracle RAC + ASM + NFS Home Lab

Written by Gary Gordhamer | Nov 25, 2022 3:49:42 PM

Many people look for ways to emulate enterprise hardware in their home labs in order to get good hands-on experience with Oracle products.  One of the harder ones to get working is Oracle RAC due to the need for multiple networks and shared storage.

With the plethora of virtualization software available now, this has gotten much easier.  For my job, I needed a two-node RAC setup to do some testing of DBCA (Database Configuration Assistant) scripts.  I've done hundreds of RAC installs, so other than the time, I didn't foresee any issues.  Well, as usual, what can go wrong will go wrong.

 


Basic RAC Setup

So let's jump to the basics.  For RAC I needed a public network, a private network, and shared storage.  For the network pieces, I also needed additional virtual IPs for each node as well as a set of three SCAN IP addresses.
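Before starting the install, it can help to confirm that the SCAN name really resolves to three distinct addresses. The following is a minimal sketch, not part of my original setup: the function name and the SCAN hostname in the usage comment are hypothetical, and it assumes a Linux host where `getent ahostsv4` is available.

```shell
#!/bin/sh
# Count the distinct addresses in resolver output read from stdin.
# Expects one address in the first column of each line, which is the
# layout printed by `getent ahostsv4`.
unique_ipv4_count() {
    awk 'NF { seen[$1] = 1 } END { n = 0; for (k in seen) n++; print n }'
}

# Hypothetical usage (rac-scan.example.com is a placeholder name):
# getent ahostsv4 rac-scan.example.com | unique_ipv4_count
```

For a correctly configured SCAN, the count printed should be 3.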

I set up a new virtual private network for my servers.  I added all the new IPs to my DNS server.  Things were going well.  I then used the usual trick of creating shared ASM disk files on a shared NFS mount:

dd if=/dev/zero of=/u03/oradata/asm_dsk1 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk2 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk3 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk4 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk5 bs=1k count=1000000

I then changed the ownership and permissions:

chown grid:asmdba /u03/oradata/asm_dsk?
chmod 660 /u03/oradata/asm_dsk?

That second command, which changes the permissions, is the one I missed, and it turned out to be critical.  It also speaks to a problem I've seen many times over: how to deal with Oracle access issues when you have role separation set up.  This is particularly common when running Oracle E-Business Suite, due to the number of assumptions made by the Oracle procedures for cloning an EBS instance.
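
Since a missed chmod is easy to overlook, a quick mode check on the disk files can catch it early. This is a rough sketch rather than anything from my original install: the function names are made up for illustration, and it assumes GNU `stat` (as found on Linux).

```shell
#!/bin/sh
# Succeed if file $1 has the octal mode $2 (for example 660).
# Assumes GNU stat, which supports -c '%a'.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Print every file whose mode is not 660.
# Returns non-zero if at least one file needs fixing.
check_asm_disk_modes() {
    status=0
    for f in "$@"; do
        if ! has_mode "$f" 660; then
            echo "wrong mode: $f"
            status=1
        fi
    done
    return "$status"
}

# Usage with the files from this post:
# check_asm_disk_modes /u03/oradata/asm_dsk?
```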

 

Troubleshooting Role Separation Issues

As mentioned, I've done this many times, and the only way to troubleshoot these issues effectively is to go methodically, step by step, through the setup of Oracle, Grid, and ASM.  There are a few notes for help on this in MOS, but not a lot:

These all sound misleading, but let's go through the issue I faced.


The Issue

I had set up all my hardware and ASM raw disks.  After completing the Grid Infrastructure installation, things looked good.  ASM was up and running, and the ASM disk group (DATA) looked fine.

I then went on to install the database home and tried to create my RAC database with DBCA.  That is when the trouble started.  During the initial CREATE DATABASE statement, I would get:

ORA-00200: control file could not be created
ORA-00202: control file: '+DATA'
ORA-15045: ASM file name '+DATA' is not in reference form
ORA-17502: ksfdcre:5 Failed to create file +DATA
ORA-27091: unable to queue I/O
ORA-27041: unable to open file

I jumped into the database alert log and DBCA logs, and found the following additional error:

Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-1501 signaled during: CREATE DATABASE "mydb2"

OK, so time to go back and check everything.  For role separation to work, there is an OS group that gives both the oracle and grid users permissions on the ASM disks.  In my case, that group was supposed to be asmadmin.  Somewhere during the install I must have picked the wrong item in a pulldown box or not paid attention to a fixup script that ran.  Either way, I ended up with a mismatch.  So here is a list of steps to check that everything is right for role separation.  You need to check every item.
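
As a first sanity check, something like the following can confirm that both software owners are actually in the shared ASM group. It is only a sketch: the helper names are hypothetical, and the user and group names are the ones from my setup.

```shell
#!/bin/sh
# Succeed if word $1 appears in the space-separated list $2.
# Matching on whole words avoids "dba" matching "asmdba".
list_has_word() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if OS user $1 is a member of OS group $2.
user_in_group() {
    list_has_word "$2" "$(id -nG "$1" 2>/dev/null)"
}

# Usage with the names from this post:
# for u in grid oracle; do
#     user_in_group "$u" asmadmin || echo "$u is not in asmadmin"
# done
```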


Role Separation Checklist

Shut down CRS before changing any of these items.  As the root user, run on all nodes:
$GRID_HOME/bin/crsctl stop crs
  1. Check the user id and group membership for the grid user:
    id grid

    (note the user ID number, and the names and numbers of the groups the user is in)
  2. Do the same check for the oracle user:
    id oracle
  3. Make a list of the groups you are using for each role in Oracle (ASM access, ASM super user, DB DBA, DB OPER, etc.)
  4. As the grid user, check the configuration for the Grid Infrastructure home
    ($GRID_HOME/rdbms/lib/config.c)
    #define SS_DBA_GRP "asmdba"
    #define SS_OPER_GRP "asmoper"
    #define SS_ASM_GRP "asmadmin"
    If changes are needed, update config.c and re-link the oracle binary:
    cd $GRID_HOME/rdbms/lib
    make -f ins_rdbms.mk ioracle
  5. As the oracle user, do the same check in the oracle database home:
    ($ORACLE_HOME/rdbms/lib/config.c)
    #define SS_DBA_GRP "dba"
    #define SS_OPER_GRP "oper"
    #define SS_ASM_GRP "asmadmin"
    (note that SS_ASM_GRP has to match the grid home setting)
    If you need to make changes, again re-link the oracle binary:
    cd $ORACLE_HOME/rdbms/lib
    make -f ins_rdbms.mk ioracle
  6. Check that the oracle binary in the grid home has the right permissions and ownership. The user and group ownership are critical.  Also, the setuid and setgid bits (the "s" in the owner and group execute positions) have to be set:
    ls -l $GRID_HOME/bin/oracle
    -rwsr-s--x. 1 grid asmadmin 291225032 Jun 6 10:15 /u01/app/12.1.0/grid/bin/oracle
    If this is not correct, then as the root user run:
    $GRID_HOME/bin/setasmgidwrap o=$GRID_HOME/bin/oracle
  7. Do the same check for the database home oracle binary (again, pay close attention to the user and group ownership as well as the setuid and setgid bits):
    ls -l $ORACLE_HOME/bin/oracle
    -rwsr-s--x. 1 oracle asmadmin 323613264 Jun 7 16:08 /u01/app/oracle/product/12.1.0/dbhome_1/bin/oracle
    If this is not correct, then as the root user run:
    $GRID_HOME/bin/setasmgidwrap o=$ORACLE_HOME/bin/oracle
  8. Check the ownership/permissions on the ASM source disks (in my case this was the raw files I had created on the shared NFS mount):
    ls -l /u03/oradata/asm*
    -rw-rw----. 1 grid asmadmin 10240000000 Jun 11 2018 /u03/oradata/asm_dsk1
    (Again, note the user and group ownership, as well as the file permissions. I was missing the write permission for the group.)
  9. Check NFS mount settings. Since I'm on a home lab using NFS, this is also important. If you're in a SAN or Exadata situation, you shouldn't have to check this.
    I use the following settings, which are based on Oracle's recommended settings with the addition of "_netdev", which tells the startup scripts to wait until the network is started before trying to mount this specific mountpoint:
    rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0,_netdev
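
Steps 4 through 7 of the checklist lend themselves to a small script. The sketch below is one possible way to do it, not an Oracle-supplied tool: the function names are hypothetical, and it assumes config.c uses the `#define SS_ASM_GRP "..."` layout shown above and that GRID_HOME and ORACLE_HOME are set in the environment.

```shell
#!/bin/sh
# Print the group name assigned to SS_ASM_GRP in a config.c file.
ss_asm_grp() {
    sed -n 's/^#define[[:space:]]*SS_ASM_GRP[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Succeed if both config.c files name the same, non-empty ASM group
# (steps 4 and 5).
same_asm_grp() {
    a=$(ss_asm_grp "$1")
    b=$(ss_asm_grp "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Succeed if the file has both the setuid and setgid bits set, which is
# what the two "s" characters in -rwsr-s--x indicate (steps 6 and 7).
has_setuid_setgid() {
    [ -u "$1" ] && [ -g "$1" ]
}

# Usage, assuming GRID_HOME and ORACLE_HOME are set:
# same_asm_grp "$GRID_HOME/rdbms/lib/config.c" \
#              "$ORACLE_HOME/rdbms/lib/config.c" \
#     || echo "SS_ASM_GRP differs between the two homes"
# for b in "$GRID_HOME/bin/oracle" "$ORACLE_HOME/bin/oracle"; do
#     has_setuid_setgid "$b" || echo "check permissions on $b"
# done
```

This only reports problems. The fixes are still the relink and setasmgidwrap steps in the checklist, and the owner and group in the ls output still need to be checked by eye.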

You can now reboot or restart CRS (as the root user run $GRID_HOME/bin/crsctl start crs).

That should take care of permission issues when working with role separation.  It's important to go through every step and verify everything.  I spent a few hours repeating these steps a few times, having missed one thing or another.  So even with years of experience, it's easy to miss a step.  What should have taken me about 30 minutes to resolve ended up taking about 3 hours because of this.

Not the end of the world, but it really should have been caught sooner.

Gary