Hi,
Sometimes you can get the error "OGG-01454 Unable to lock file | Resource temporarily unavailable" in GoldenGate Replicat processes.
OGG-01454 Unable to lock file
The details of the error are as follows.
Source Context :
  SourceModule   : [gglib.ggapp.FileProcessStatus]
  SourceID       : [ggapp/FileProcessStatus.cpp]
  SourceMethod   : [registerProcess]
  SourceLine     : [127]
  ThreadBacktrace : [12] elements
  : [/ggateb01/goldengate/product/GG19cFor18cDB/libgglog.so(CMessageContext::AddThreadContext())]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/libgglog.so(_MSG_String_Int64(CSourceContext*, int, char const*, long, CMessageFactory::MessageDisposition))]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::ggapp::FileProcessStatus::registerProcess(ggs::gglib::ggapp::ProcessStatusItems const&))]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(open_communications(ggs::gglib::ggapp::ReplicationContextParams&))]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat()]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain())]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**))]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(main)]
  : [/lib64/libc.so.6(__libc_start_main)]
  : [/ggateb01/goldengate/product/GG19cFor18cDB/replicat()]

2020-10-29 02:06:42  ERROR   OGG-01454  Unable to lock file "/ggateb01/goldengate/product/GG19cFor18cDB/dirpcs/RCCB03.pcr" (error 11, Resource temporarily unavailable). Lock currently held by process id (PID) 74278.
Resource temporarily unavailable on Replicat
GGSCI (msdbadm01) 4> start RMSD01

Sending START request to MANAGER ...
REPLICAT RMSD01 starting

GGSCI (msdbadm01) 5> info RMSD01

REPLICAT   RMSD01    Last Started 2020-10-29 02:06   Status ABENDED
INTEGRATED
Checkpoint Lag       00:00:00 (updated 05:40:29 ago)
Log Read Checkpoint  File /ggateb01/iccb/c4000000000
                     2020-10-29 11:43:28.000000  RBA 135758045

GGSCI (msdbadm01) 8> ! info RMSD01

REPLICAT   RMSD01    Last Started 2020-10-29 17:24   Status RUNNING
INTEGRATED
Checkpoint Lag       00:00:00 (updated 05:40:43 ago)
Process ID           88370
Log Read Checkpoint  File /ggateb01/iccb/c4000000000
                     2020-10-29 11:43:28.000000  RBA 135758045

GGSCI (msdbadm01) 9> ! info RMSD01

REPLICAT   RMSD01    Last Started 2020-10-29 17:24   Status RUNNING
INTEGRATED
Checkpoint Lag       00:00:00 (updated 05:40:47 ago)
Process ID           88370
Log Read Checkpoint  File /ggateb01/iccb/c4000000000
                     2020-10-29 11:43:28.000000  RBA 135758045

GGSCI (msdbadm01) 10> ! info RMSD01

REPLICAT   RMSD01    Last Started 2020-10-29 17:24   Status RUNNING
INTEGRATED
Checkpoint Lag       00:00:00 (updated 05:40:50 ago)
Process ID           88370
Log Read Checkpoint  File /ggateb01/iccb/c4000000000
                     2020-10-29 11:43:28.000000  RBA 135758045

GGSCI (msdbadm01) 11> ! info RMSD01

REPLICAT   RMSD01    Last Started 2020-10-29 17:24   Status RUNNING
INTEGRATED
Checkpoint Lag       00:00:00 (updated 05:40:51 ago)
Process ID           88370
Log Read Checkpoint  File /ggateb01/iccb/c4000000000
                     2020-10-29 11:43:28.000000  RBA 135758045

GGSCI (msdbadm01) 12> ! info RMSD01

REPLICAT   RMSD01    Last Started 2020-10-29 17:24   Status RUNNING
INTEGRATED
Checkpoint Lag       00:00:00 (updated 05:40:53 ago)
Process ID           88370
Log Read Checkpoint  File /ggateb01/iccb/c4000000000
                     2020-10-29 11:43:28.000000  RBA 135758045

REPLICAT   RMSD01    Last Started 2020-10-29 17:24   Status RUNNING
INTEGRATED
Checkpoint Lag       00:00:08 (updated 00:00:07 ago)
Process ID           88370
Log Read Checkpoint  File /ggateb01/iccb/c4000000000
                     2020-10-29 17:32:13.000000  RBA 151554476

GGSCI (msdbadm01) 16>
ERROR OGG-01454 Unable to lock file
The trail files cannot be exclusively locked for writes by the Server/Collector process running on the target. As of v10.4, the Server/Collector locks the trail file to prevent multiple processes from writing to the same trail file, so a new Server/Collector process cannot lock a trail file that is still held by an existing one.
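To see which Server/Collector processes Manager has spawned on the target (and which one may still be holding a trail open), a quick check helps. A minimal sketch, assuming a Unix-style target; the host name is just the one from the example above:

GGSCI (msdbadm01) 1> send manager childstatus debug
(lists the child processes started by Manager, including any Server/Collector, along with their port numbers)

ps -ef | grep '[s]erver'
(the [s] keeps grep from matching itself; any leftover ./server process here is a candidate orphan collector)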
Network outages that last longer than the time the TCP/IP stack is configured to retransmit unacknowledged packets may result in “orphan” TCP/IP connections on the RMTHOST system. Since the local system has closed the connections and the “RST” packets were lost due to the network outage, no packets (data or “control”) will ever be sent for these connections.
Since the RST packets were not delivered to the RMTHOST system, its TCP/IP stack will never present an error to the Server/Collector process. The Server/Collector process therefore continues to wait passively, forever, for new data that will never arrive, because the Extract process on the other system is no longer running.
A second cause for this symptom is that the remote server was rebooted and the Network-Attached Storage (NAS) device where the target trails reside did not detect and was not notified of the reboot, so the locks acquired prior to the reboot are still considered to be in force.
1) Investigate why a server/collector process is still running when a new server/collector process is started to access the same trail. You can kill the orphan server/collector process to resolve the immediate issue (see the sketch after this list).
2) You can override this locking behaviour with the RMTHOST UNLOCKEDTRAILS option. Use this option with CAUTION, as it can cause trail corruption. You must still investigate why the trails are locked by another server, or kill those server/collector processes.
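Before killing anything, confirm which PID actually holds the lock. A minimal sketch, assuming fuser and lsof are available on the target; the file path and PID below are simply the ones from the error message above and are illustrative:

fuser /ggateb01/goldengate/product/GG19cFor18cDB/dirpcs/RCCB03.pcr
(prints the PID(s) that currently have the file open)

lsof /ggateb01/goldengate/product/GG19cFor18cDB/dirpcs/RCCB03.pcr
(same information, with the process name and owner)

ps -fp 74278
(inspect the process reported by OGG-01454 before deciding it is an orphan)

kill 74278
(only after confirming it is an orphan; use kill -9 as a last resort)

Note that if the trail or checkpoint file lives on NAS/NFS, the process holding the lock may be on a different host, and these commands must be run there.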
NOTE that if an Extract pump is stopped normally, the Server/Collector process stops immediately. In current versions (11.1/11.2 onwards) the collector otherwise terminates after a default timeout of 5 minutes; please refer to the reference documentation for your version's default. You can override this value with the RMTHOST TIMEOUT option. For example, to set the timeout to 40 seconds:
RMTHOST 192.168.10.1, MGRPORT 7809, PARAMS TIMEOUT 40
This tells the Server/Collector to terminate if it does not receive any checkpoint information for more than 40 seconds. Do NOT set too low a value; TCP/IP communication performance varies throughout the day.
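For context, the RMTHOST line lives in the Extract pump's parameter file on the source. A hypothetical pump parameter file, with the group name, remote trail prefix and schema made up for illustration and the RMTHOST line taken verbatim from the example above:

EXTRACT PMSD01
-- data pump: reads the local trail and ships it to the target collector
RMTHOST 192.168.10.1, MGRPORT 7809, PARAMS TIMEOUT 40
RMTTRAIL /ggateb01/iccb/c4
PASSTHRU
TABLE ICCB.*;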
Other notes:
NAS related issue:
In the case where the NAS was unaware that the system had been rebooted, the best long-term solution is to contact the NAS vendor, who might be able to provide a utility program that can be run early in the system startup process to notify the NAS that it should release all locks owned by this system. The following procedure might offer a short-term workaround:
- Stop all REPLICAT processes that read the trail file.
- Stop the target MGR process.
- Copy trail file xx000000 to xx000000.bk.
- Delete trail file xx000000.
- Rename (mv) xx000000.bk back to xx000000 (steps 3-5 are shown as a shell sketch after this list).
- Repeat steps 2-5 for each trail file that can’t be locked.
- From the shell, kill the server (collector) process that was writing to the trail, i.e. check at the OS level for orphan processes, e.g. on Unix-style OSs: ps -ef | grep server
  If any such orphan servers exist, e.g.:
  oracle 27165 1 0 11:20 ? 00:00:00 ./server -p 7840 -k -l /opt/oracle/gg/ggserr.log
  then kill that process (in this particular case): kill 27165 (or kill -9 27165).
- Start MGR.
- Start the REPLICAT processes.
- Re-start the extract that abended and gave this error message.
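Putting the copy/delete/rename steps and the orphan-collector cleanup together, a minimal shell sketch; the trail directory, file name and PID are illustrative (taken loosely from the examples above):

# steps 3-5: recreate the trail file so the stale lock is no longer attached to it
cp /ggateb01/dirdat/xx000000 /ggateb01/dirdat/xx000000.bk
rm /ggateb01/dirdat/xx000000
mv /ggateb01/dirdat/xx000000.bk /ggateb01/dirdat/xx000000

# find and kill the orphan server/collector that was writing to the trail
ps -ef | grep '[s]erver'
kill 27165          # or kill -9 27165 if it does not terminate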
Note that this may not work, depending on the NAS and the way it keeps track of advisory file locks acquired via fcntl().
Cluster failover:
When a system fails over to another node, the GoldenGate processes should be stopped, typically with the GGSCI commands stop * and stop mgr; however, processes such as Server/Collectors may remain running. Stop the Extract pumps manually or kill the processes. You should check that no processes are running from the GoldenGate directory before switching GoldenGate to run on another node.
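Before relocating GoldenGate to the other node, a quick sanity check from the shell helps confirm nothing is still running out of the GoldenGate home. The paths come from the example above, and fuser -m assumes /ggateb01 is its own mount point; adjust both for your environment:

ps -ef | grep '[G]G19cFor18cDB'
(any process whose command line references the GoldenGate home, including extract, replicat, server and mgr)

fuser -vm /ggateb01
(lists processes with open files on that filesystem; it should report nothing from GoldenGate before the mount is moved)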