Many people look for ways to emulate enterprise hardware in their home labs in order to get good hands-on experience with Oracle products. One of the harder ones to get working is Oracle RAC due to the need for multiple networks and shared storage.
With the plethora of virtualization software out now, this has gotten much easier. For my job, I needed a two-node RAC setup to do some testing of DBCA (Database Configuration Assistant) scripts. I've done hundreds of RAC installs, so other than the time involved, I didn't foresee any issues. Well, as usual, what can go wrong will go wrong.
Basic RAC Setup
So let's jump to the basics. For RAC I needed a public network, a private network, and shared storage. For the network pieces, I also needed additional virtual IPs for each node as well as a set of three SCAN IP addresses.
I set up a new virtual private network for my servers. I added all the new IPs to my DNS server. Things were going well. I then did the trick to create shared ASM disks on a shared NFS mount:
dd if=/dev/zero of=/u03/oradata/asm_dsk1 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk2 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk3 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk4 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk5 bs=1k count=1000000
chown grid:asmdba /u03/oradata/asm_dsk?
chmod 660 /u03/oradata/asm_dsk?
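The repeated dd calls above can be collapsed into a loop. This is only a sketch: make_asm_disks and its arguments are names I made up for illustration, not part of any Oracle tooling, and it assumes the same asm_dskN file naming used above.

```shell
#!/bin/sh
# Sketch: create N zero-filled files to act as ASM candidate disks.
# make_asm_disks is a hypothetical helper, not an Oracle utility.
# usage: make_asm_disks <directory> <number_of_disks> <size_in_KiB>
make_asm_disks() {
    dir=$1; count=$2; size_kb=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # Same dd invocation as above, with the size passed in
        dd if=/dev/zero of="$dir/asm_dsk$i" bs=1k count="$size_kb" 2>/dev/null || return 1
        # Owner and group both need read/write access
        chmod 660 "$dir/asm_dsk$i" || return 1
        i=$((i + 1))
    done
}

# Example (as root, then set ownership as in the commands above):
#   make_asm_disks /u03/oradata 4 10000000
#   chown grid:asmdba /u03/oradata/asm_dsk?
```

Ownership is left out of the function on purpose, so the loop can be tried in a scratch directory without the grid user existing.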
Troubleshooting Role Separation Issues
As mentioned I've done this many times, and the only way to effectively troubleshoot these issues is to methodically go step by step through the setup of Oracle and Grid and ASM. There are a few notes for help on this in MOS, but not a lot:
- UNIX: Diagnostic C program for ORA-1031 from CONNECT INTERNAL / AS SYSDBA (Doc ID 67984.1)
- ORA-15183 Unable to Create Database on Server using 11.2 ASM and Grid Infrastructure (Doc ID 1054033.1)
- Database Creation on 11.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (Doc ID 1084186.1)
- How To Recompile config.c / Relink Executables Of A Grid Infrastructure Home (Cluster) (Doc ID 1637766.1)
- Connect as SYSDBA on 11.2 Cloned Home Gives "ORA-1031: Insufficient Privileges" Error (Doc ID 1061788.1)
These all sound misleading, but let's go through the issue I faced.
The Issue
I had set up all my hardware and ASM raw disks. After completing the GRID INFRASTRUCTURE installation, things looked good. ASM was up and running, and the ASM disk group (DATA) looked fine.
I then went on to install the database home and tried to create my RAC database with DBCA. That is when the trouble started. During the initial CREATE DATABASE statement, I would get:
ORA-00200: control file could not be created
ORA-00202: control file: '+DATA'
ORA-15045: ASM file name '+DATA' is not in reference form
ORA-17502: ksfdcre:5 Failed to create file +DATA
ORA-27091: unable to queue I/O
ORA-27041: unable to open file
I jumped into the database alert log and DBCA logs, and garnered the following additional error:
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-1501 signaled during: CREATE DATABASE "mydb2"
Role Separation Checklist
- As the root user, stop CRS:
$GRID_HOME/bin/crsctl stop crs
- Check the user id and group membership for the grid user:
id grid
(note the user ID and the names and numbers of the groups the user is in)
- Do the same check for the oracle user:
id oracle
- Make a list of the groups you are using for each role in oracle (ASM access, ASM super user, DB DBA, DB OPER, etc..)
- As the grid user, check the group configuration for the Grid Infrastructure home ($GRID_HOME/rdbms/lib/config.c):
#define SS_DBA_GRP "asmdba"
#define SS_OPER_GRP "asmoper"
#define SS_ASM_GRP "asmadmin"
If changes are needed, update config.c and re-link the oracle binary:
cd $GRID_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle
- As the oracle user, do the same check in the Oracle database home ($ORACLE_HOME/rdbms/lib/config.c):
#define SS_DBA_GRP "dba"
#define SS_OPER_GRP "oper"
#define SS_ASM_GRP "asmadmin"
(note that SS_ASM_GRP has to match the grid home setting)
If you need to make changes, again update config.c and re-link the oracle binary:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle
- Check that the oracle binary in the grid home has the right permissions and ownership. The user and group ownership are critical. Also, the setuid and setgid bits (the "s" flags in the listing) have to be set on the executable:
ls -l $GRID_HOME/bin/oracle
-rwsr-s--x. 1 grid asmadmin 291225032 Jun 6 10:15 /u01/app/12.1.0/grid/bin/oracle
If this is not correct, then as the root user run:
$GRID_HOME/bin/setasmgidwrap o=$GRID_HOME/bin/oracle
- Do the same check for the database home oracle binary (again, pay close attention to the user and group ownership as well as the setuid and setgid bits):
ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 323613264 Jun 7 16:08 /u01/app/oracle/product/12.1.0/dbhome_1/bin/oracle
If this is not correct, then as the root user run:
$GRID_HOME/bin/setasmgidwrap o=$ORACLE_HOME/bin/oracle
- Check the ownership/permissions on the ASM source disks (in my case these were the raw files I had created on the shared NFS mount):
ls -l /u03/oradata/asm*
-rw-rw----. 1 grid asmadmin 10240000000 Jun 11 2018 /u03/oradata/asm_dsk1
(Again, note the user and group ownership, as well as the file permissions. I was missing the write permission for the group.)
- Check NFS mount settings. Since I'm on a home lab using NFS, this is also important. If you're in a SAN or Exadata situation, you shouldn't have to check this.
I use the following settings, which are based on Oracle's recommended settings with the addition of "_netdev" which tells the startup scripts to wait until the network is started before trying to mount this specific mountpoint:
rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0,_netdev
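A quick way to confirm that an fstab entry carries the options you expect is to compare option strings. The snippet below is a rough sketch, not an Oracle-supplied check: has_mount_opts is a made-up helper name, and /u03 in the example is only my assumption for where the shared storage is mounted.

```shell
#!/bin/sh
# Sketch: verify that a mount-option string contains every required option.
# has_mount_opts is a hypothetical helper name.
# usage: has_mount_opts "<actual,options>" "<required,options>"
has_mount_opts() {
    actual=",$1,"
    old_ifs=$IFS; IFS=,
    for opt in $2; do
        case $actual in
            *",$opt,"*) ;;   # option present, keep checking
            *) IFS=$old_ifs; echo "missing: $opt"; return 1 ;;
        esac
    done
    IFS=$old_ifs
    return 0
}

# Example: read the options field for the (assumed) /u03 mount point from
# /etc/fstab and check a few of the settings listed above:
#   opts=$(awk '$2 == "/u03" {print $4}' /etc/fstab)
#   has_mount_opts "$opts" "rw,hard,tcp,vers=3,actimeo=0,_netdev"
```

The example reads /etc/fstab rather than the live mount table because the kernel reports some NFS options under different names, so a string comparison against the running mount would give false alarms.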
You can now reboot or restart CRS (as the root user run $GRID_HOME/bin/crsctl start crs).
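Since it is easy to skip one of the file-level checks in the list above, they can be scripted. What follows is a minimal sketch under my own assumptions: every function name is a hypothetical helper (none of them ships with Oracle), and the paths in the example are the ones from my lab.

```shell
#!/bin/sh
# Sketch of the file-level checks from the checklist.
# All function names are hypothetical helpers, not Oracle tools.

# Print the group assigned to a macro such as SS_ASM_GRP in a config.c file.
# usage: get_config_grp <path/to/config.c> <MACRO_NAME>
get_config_grp() {
    awk -v name="$2" '$1 == "#define" && $2 == name { gsub(/"/, "", $3); print $3 }' "$1"
}

# Succeed only if SS_ASM_GRP is set and identical in both homes.
# usage: same_asm_grp <grid_home> <oracle_home>
same_asm_grp() {
    g1=$(get_config_grp "$1/rdbms/lib/config.c" SS_ASM_GRP)
    g2=$(get_config_grp "$2/rdbms/lib/config.c" SS_ASM_GRP)
    [ -n "$g1" ] && [ "$g1" = "$g2" ]
}

# Succeed only if both the setuid and setgid bits are set on the file.
has_setid() {
    [ -u "$1" ] && [ -g "$1" ]
}

# Succeed only if owner and group both have read and write (rw-rw-...).
owner_group_rw() {
    [ "$(ls -ld "$1" | cut -c2-7)" = "rw-rw-" ]
}

# Example:
#   same_asm_grp "$GRID_HOME" "$ORACLE_HOME" || echo "SS_ASM_GRP differs"
#   has_setid "$GRID_HOME/bin/oracle"        || echo "grid: setuid/setgid missing"
#   has_setid "$ORACLE_HOME/bin/oracle"      || echo "db: setuid/setgid missing"
#   for f in /u03/oradata/asm_dsk?; do
#       owner_group_rw "$f" || echo "check permissions on $f"
#   done
```

These helpers only look at group names and mode bits. The owner and group shown by ls -l, and the output of id for both users, still need a manual look.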
That should take care of permission issues when working with role separation. It's important to go through every step and verify everything. I spent a few hours repeating these steps, having missed one thing or another each time. So even with years of experience, it's easy to miss a step. What should have taken me about 30 minutes to resolve ended up taking about 3 hours because of this.
Not the end of the world, but really should have been caught sooner.
Gary