A couple of days ago, all the DB instances on one node of a RAC cluster were failing to start with this error:
[oracle@hostname ~]$ srvctl start instance -d DBNAME -i INSTANCE_2
PRCR-1013 : Failed to start resource ora.DBNAME.db
PRCR-1064 : Failed to start resource ora.DBNAME.db on node hostname
CRS-5017: The resource action "ora.DBNAME.db start" encountered the following error:
ORA-01092: ORACLE instance terminated. Disconnection forced
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/oragrid/diag/crs/hostname/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.DBNAME.db' on 'hostname' failed
[oracle@hostname ~]$
The alert log made it look as if a rogue process was still attached to shared memory segments:
ORA-1092 : opitsk aborting process
2023-12-07T10:39:48.004323+00:00
ORA-1092 : opitsk aborting process
2023-12-07T10:39:48.305329+00:00
Warning: 2 processes are still attached to shmid 229403:
(size: 53248 bytes, creator pid: 537143, last attach/detach pid:
But that segment was not in use. ipcs did not list it at all, only a segment owned by oragrid:
[root@hostname ~]# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x21a2b3c0 196620 oragrid 600 45056 43
[root@hostname ~]#
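The comparison above can be scripted. This is only a minimal sketch: the function name shmid_listed is hypothetical, and it assumes the ipcs -m column layout shown above, with the shmid in the second column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given shmid appears in `ipcs -m` style
# output read from stdin. Assumes the shmid is the second column of each row.
shmid_listed() {
    local shmid="$1"
    awk -v id="$shmid" '$2 == id { found = 1 } END { exit !found }'
}

# Usage (as root, so that segments of all owners are visible):
#   ipcs -m | shmid_listed 229403 || echo "shmid 229403 is not listed"
```

If the shmid named in the alert log is not listed, the "still attached" warning is probably a symptom rather than the cause, which is how it turned out here.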
After an Oracle SR got us nowhere, we noticed that the /u02 mount was missing on the server.
The Unix team remounted it, but the contents (and the directory structure) were gone:
[oracle@hostname ~]$ ls -tlr /u02/
total 0
[oracle@hostname ~]$ ls -tld /u02/
drwxr-xr-x. 2 root root 6 Dec 7 09:24 /u02/
[oracle@hostname ~]$
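A quick check along these lines can confirm whether a filesystem is mounted and populated before looking any further. It is a sketch, not what we ran at the time: check_mount is a hypothetical name, and it relies on the mountpoint utility from util-linux being installed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check: report whether a directory is a mount point
# and whether it has any contents. Relies on `mountpoint` from util-linux.
check_mount() {
    local dir="$1"
    if ! mountpoint -q "$dir"; then
        echo "$dir is not a mount point"
        return 1
    fi
    if [ -z "$(ls -A "$dir")" ]; then
        echo "$dir is mounted but empty"
        return 2
    fi
    echo "$dir is mounted and has contents"
}

# Usage:
#   check_mount /u02
```

In our case the first branch would have fired before the remount, and the second one right after it.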
We changed the permissions and recreated the expected folders:
[oracle@hostname ~]$ ls -tld /u02/app/oracle/diag/rdbms/
drwxr-xr-x. 5 oracle oinstall 63 Dec 8 11:43 /u02/app/oracle/diag/rdbms/
[oracle@hostname ~]$
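The fix can be sketched roughly as follows. The function name recreate_diag_tree is hypothetical, and the only path and ownership it assumes are the ones visible in the listing above (oracle:oinstall on /u02/app/oracle/diag/rdbms). Your installation may need other directories as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recreate the diag tree under a base directory and,
# when run as root, hand it over to the Oracle software owner.
recreate_diag_tree() {
    local base="$1" owner="${2:-oracle}" group="${3:-oinstall}"
    mkdir -p "$base/app/oracle/diag/rdbms" || return 1
    if [ "$(id -u)" -eq 0 ]; then
        chown -R "$owner:$group" "$base/app"
    fi
}

# Usage (as root):
#   recreate_diag_tree /u02
```

The owner and group are parameters only because they differ between installations; the defaults match the listing above.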
After this, the instances started without issues. Hope this helps, as pretty much everything you find on this error points to rogue processes holding memory.