Many people look for ways to emulate enterprise hardware in their home labs in order to get good hands-on experience with Oracle products. One of the harder ones to get working is Oracle RAC due to the need for multiple networks and shared storage.
With the plethora of virtualization software available now, this has gotten much easier. For my job, I needed a two-node RAC setup to test some DBCA (Database Configuration Assistant) scripts. I've done hundreds of RAC installs, so other than the time involved, I didn't foresee any issues. Well, as usual, what can go wrong will go wrong.
So let's jump to the basics. For RAC I needed a public network, a private network, and shared storage. For the network pieces, I also needed additional virtual IPs for each node as well as a set of three SCAN IP addresses.
I set up a new virtual private network for my servers. I added all the new IPs to my DNS server. Things were going well. I then did the trick to create shared ASM disks on a shared NFS mount:
dd if=/dev/zero of=/u03/oradata/asm_dsk1 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk2 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk3 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk4 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk5 bs=1k count=1000000
chown grid:asmdba /u03/oradata/asm_dsk?
chmod 660 /u03/oradata/asm_dsk?
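The dd commands above can also be wrapped in a small loop. This is only a sketch: the function name make_asm_files and its arguments are my own invention for illustration, and it assumes a Linux dd and a directory you can write to. The chown to the grid owner still has to be done separately as root.

```shell
# Hypothetical helper (not part of any Oracle tooling):
# create N zero-filled files named asm_dsk1..asm_dskN in DIR,
# each COUNT_KB kilobytes long, and set them to mode 660.
make_asm_files() {
    dir=$1; n=$2; count_kb=$3
    i=1
    while [ "$i" -le "$n" ]; do
        # bs=1k count=COUNT_KB writes COUNT_KB * 1024 bytes of zeros
        dd if=/dev/zero of="$dir/asm_dsk$i" bs=1k count="$count_kb" 2>/dev/null || return 1
        chmod 660 "$dir/asm_dsk$i" || return 1
        i=$((i + 1))
    done
}

# Example, matching the first four disks in the post:
#   make_asm_files /u03/oradata 4 10000000
# Then, as root, chown the files to the grid owner and ASM group.
```

Note that the file names use an underscore (asm_dsk1), so any glob used afterwards has to match that exactly.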
As mentioned, I've done this many times, and the only way to troubleshoot these issues effectively is to go methodically, step by step, through the setup of Oracle, Grid, and ASM. There are a few notes for help on this in MOS, but not a lot:
These all sound misleading, but let's go through the issue I faced.
I had set up all my hardware and ASM raw disks. After completing the GRID INFRASTRUCTURE installation, things looked good. ASM was up and running, and the ASM disk group (DATA) looked fine.
I then went on to install the database home and tried to create my RAC database with DBCA. That is when the trouble started. During the initial CREATE DATABASE statement, I would get:
ORA-00200: control file could not be created
ORA-00202: control file: '+DATA'
ORA-15045: ASM file name '+DATA' is not in reference form
ORA-17502: ksfdcre:5 Failed to create file +DATA
ORA-27091: unable to queue I/O
ORA-27041: unable to open file
I jumped into the database alert log and DBCA logs, and garnered the following additional error:
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-1501 signaled during: CREATE DATABASE "mydb2"
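Here are the steps I went through to track it down. First, stop CRS (as the root user):
$GRID_HOME/bin/crsctl stop crs
Check which groups the grid and oracle users belong to:
id grid
id oracle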
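Reading the id output by eye is where I tend to miss things, so a small check like the following can help. It is a sketch only: user_in_group and report_groups are hypothetical helper names, and the list of groups you pass in depends on how your role separation is set up.

```shell
# Hypothetical helper: succeed if USER ($1) is a member of GROUP ($2).
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Print one line per group saying whether USER belongs to it.
report_groups() {
    user=$1; shift
    for g in "$@"; do
        if user_in_group "$user" "$g"; then
            echo "$user: $g ok"
        else
            echo "$user: $g MISSING"
        fi
    done
}

# Example (group names as used in this post):
#   report_groups grid asmadmin asmdba asmoper
#   report_groups oracle dba oper asmdba
```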
Check the group definitions in the grid home ($GRID_HOME/rdbms/lib/config.c):
#define SS_DBA_GRP "asmdba"
#define SS_OPER_GRP "asmoper"
#define SS_ASM_GRP "asmadmin"
If changes are needed, update the config.c and re-link the oracle binary:
cd $GRID_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle
Do the same for the database home ($ORACLE_HOME/rdbms/lib/config.c):
#define SS_DBA_GRP "dba"
#define SS_OPER_GRP "oper"
#define SS_ASM_GRP "asmadmin"
(Note that the SS_ASM_GRP has to match the grid home setting.)
If you need to make changes, again re-link the oracle binary:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle
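Since SS_ASM_GRP has to be the same in both homes, the comparison can be scripted rather than eyeballed. The following is a sketch under the assumption that the #define lines in config.c look exactly like the ones shown in this post; get_ss_group and same_asm_group are made-up helper names.

```shell
# Hypothetical helper: print the group name defined for MACRO ($2) in FILE ($1).
# Expects lines of the form:  #define SS_ASM_GRP "asmadmin"
get_ss_group() {
    sed -n "s/^#define[[:space:]]\{1,\}$2[[:space:]]\{1,\}\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Succeed only if SS_ASM_GRP is defined and identical in both files.
same_asm_group() {
    a=$(get_ss_group "$1" SS_ASM_GRP)
    b=$(get_ss_group "$2" SS_ASM_GRP)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example:
#   same_asm_group "$GRID_HOME/rdbms/lib/config.c" "$ORACLE_HOME/rdbms/lib/config.c" \
#       || echo "SS_ASM_GRP differs between grid and database homes"
```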
ls -l $GRID_HOME/bin/oracle
-rwsr-s--x. 1 grid asmadmin 291225032 Jun 6 10:15 /u01/app/12.1.0/grid/bin/oracle
If this is not correct, then as the root user run:
$GRID_HOME/bin/setasmgidwrap o=$GRID_HOME/bin/oracle
ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 323613264 Jun 7 16:08 /u01/app/oracle/product/12.1.0/dbhome_1/bin/oracle
If this is not correct, then as the root user run:
$GRID_HOME/bin/setasmgidwrap o=$ORACLE_HOME/bin/oracle
ls -l /u03/oradata/asm*
-rw-rw----. 1 grid asmadmin 10240000000 Jun 11 2018 /u03/oradata/asm_dsk1
(Again, note the user and group ownership, as well as the file permissions. I was missing the write permission for the group.)
NFS mount options for the shared storage:
rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0,_netdev
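The ownership and permission checks on the oracle binaries and the ASM disk files can be folded into one small function. This is a sketch, assuming GNU stat as found on Linux; check_owner_mode is a hypothetical name, and the expected values in the examples are simply the ones from the ls -l output in this post (-rwsr-s--x is octal 6751, -rw-rw---- is 660).

```shell
# Hypothetical helper: compare a file's owner, group and octal mode
# against expected values. Arguments: FILE OWNER GROUP MODE
check_owner_mode() {
    actual=$(stat -c '%U %G %a' "$1") || return 2
    if [ "$actual" = "$2 $3 $4" ]; then
        echo "OK   $1"
    else
        echo "BAD  $1 (found: $actual, expected: $2 $3 $4)"
        return 1
    fi
}

# Examples:
#   check_owner_mode "$GRID_HOME/bin/oracle"   grid   asmadmin 6751
#   check_owner_mode "$ORACLE_HOME/bin/oracle" oracle asmadmin 6751
#   for f in /u03/oradata/asm_dsk?; do check_owner_mode "$f" grid asmadmin 660; done
```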
You can now reboot or restart CRS (as the root user run $GRID_HOME/bin/crsctl start crs).
That should take care of permission issues when working with role separation. It's important to go through every step and verify everything. I spent a few hours repeating these steps, having missed one thing or another each time. So even with years of experience, it's easy to miss a step. What should have taken me about 30 minutes to resolve ended up taking about 3 hours.
Not the end of the world, but really should have been caught sooner.
Gary