OGG-01454 Unable to lock file | Resource temporarily unavailable on Replicat

Hi,

Sometimes you can get the “OGG-01454 Unable to lock file | Resource temporarily unavailable” error in GoldenGate Replicat processes.

 

OGG-01454 Unable to lock file

Details of the error are as follows.

Source Context :
SourceModule : [gglib.ggapp.FileProcessStatus]
SourceID : [ggapp/FileProcessStatus.cpp]
SourceMethod : [registerProcess]
SourceLine : [127]
ThreadBacktrace : [12] elements
: [/ggateb01/goldengate/product/GG19cFor18cDB/libgglog.so(CMessageContext::AddThreadContext())]
: [/ggateb01/goldengate/product/GG19cFor18cDB/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]
: [/ggateb01/goldengate/product/GG19cFor18cDB/libgglog.so(_MSG_String_Int64(CSourceContext*, int, char const*, long, CMessageFactory::MessageDisposition))]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::ggapp::FileProcessStatus::registerProcess(ggs::gglib::ggapp::ProcessStatusItems const&))]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(open_communications(ggs::gglib::ggapp::ReplicationContextParams&))]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat()]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain())]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**))]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat(main)]
: [/lib64/libc.so.6(__libc_start_main)]
: [/ggateb01/goldengate/product/GG19cFor18cDB/replicat()]

2020-10-29 02:06:42 ERROR OGG-01454 Unable to lock file "/ggateb01/goldengate/product/GG19cFor18cDB/dirpcs/RCCB03.pcr" (error 11, Resource temporarily unavailable). Lock currently held by process id (PID) 74278.
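
The PID and the locked file can be pulled out of the error text with a small shell helper before deciding what to do. This is a minimal sketch, not part of GoldenGate: the function names ogg_lock_pid and ogg_lock_file are hypothetical, and the patterns assume the exact message wording shown above.

```shell
# Hypothetical helpers (not GoldenGate commands). Both read report or
# ggserr.log text on stdin and assume the OGG-01454 wording shown above.

# Print the PID that holds the lock, one per matching line.
ogg_lock_pid() {
  sed -n 's/.*Lock currently held by process id (PID) \([0-9][0-9]*\).*/\1/p'
}

# Print the path of the file that could not be locked.
ogg_lock_file() {
  sed -n 's/.*Unable to lock file "\([^"]*\)".*/\1/p'
}
```

For example, piping the error text above through ogg_lock_pid prints 74278. Both helpers work line by line, so it does not matter whether the message is wrapped in the log.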

Resource temporarily unavailable on Replicat

If you get this error on a Replicat process, the Replicat's process file under dirpcs is locked by another process.
To solve this error, kill the PID reported in the error message, then start the Replicat again as follows.
GGSCI (msdbadm01) 4> start RMSD01

Sending START request to MANAGER ...
REPLICAT RMSD01 starting


GGSCI (msdbadm01) 5> info RMSD01

REPLICAT RMSD01 Last Started 2020-10-29 02:06 Status ABENDED
INTEGRATED
Checkpoint Lag 00:00:00 (updated 05:40:29 ago)
Log Read Checkpoint File /ggateb01/iccb/c4000000000
2020-10-29 11:43:28.000000 RBA 135758045


GGSCI (msdbadm01) 8> !
info RMSD01

REPLICAT RMSD01 Last Started 2020-10-29 17:24 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 05:40:43 ago)
Process ID 88370
Log Read Checkpoint File /ggateb01/iccb/c4000000000
2020-10-29 11:43:28.000000 RBA 135758045


GGSCI (msdbadm01) 9> !
info RMSD01

REPLICAT RMSD01 Last Started 2020-10-29 17:24 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 05:40:47 ago)
Process ID 88370
Log Read Checkpoint File /ggateb01/iccb/c4000000000
2020-10-29 11:43:28.000000 RBA 135758045


GGSCI (msdbadm01) 10> !
info RMSD01

REPLICAT RMSD01 Last Started 2020-10-29 17:24 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 05:40:50 ago)
Process ID 88370
Log Read Checkpoint File /ggateb01/iccb/c4000000000
2020-10-29 11:43:28.000000 RBA 135758045


GGSCI (msdbadm01) 11> !
info RMSD01

REPLICAT RMSD01 Last Started 2020-10-29 17:24 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 05:40:51 ago)
Process ID 88370
Log Read Checkpoint File /ggateb01/iccb/c4000000000
2020-10-29 11:43:28.000000 RBA 135758045


GGSCI (msdbadm01) 12> !
info RMSD01

REPLICAT RMSD01 Last Started 2020-10-29 17:24 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 05:40:53 ago)
Process ID 88370
Log Read Checkpoint File /ggateb01/iccb/c4000000000
2020-10-29 11:43:28.000000 RBA 135758045




REPLICAT RMSD01 Last Started 2020-10-29 17:24 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:08 (updated 00:00:07 ago)
Process ID 88370
Log Read Checkpoint File /ggateb01/iccb/c4000000000
2020-10-29 17:32:13.000000 RBA 151554476


GGSCI (msdbadm01) 16>
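
For the kill step that precedes the restart, it is safer to send a normal termination signal first and escalate only if the process ignores it. A rough sketch, assuming a POSIX shell; stop_pid is a hypothetical helper name and the 10-second default wait is an arbitrary choice.

```shell
# Hypothetical helper: ask a process to exit (SIGTERM), wait up to
# $2 seconds (default 10), then force it with SIGKILL if still alive.
# Run it as the OS user that owns the GoldenGate processes, because
# kill -0 also fails for processes you are not allowed to signal.
stop_pid() {
  pid="$1"
  wait_secs="${2:-10}"
  kill -0 "$pid" 2>/dev/null || return 0   # nothing to signal
  kill "$pid" 2>/dev/null                  # polite request first
  while [ "$wait_secs" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0 # gone, we are done
    sleep 1
    wait_secs=$((wait_secs - 1))
  done
  kill -9 "$pid" 2>/dev/null               # last resort
  sleep 1
  ! kill -0 "$pid" 2>/dev/null             # succeed only if it is gone
}
```

With the error above this would be called as stop_pid 74278. Check first with ps -fp 74278 that the PID really belongs to a stale GoldenGate process and has not been reused by something else.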

ERROR OGG-01454 Unable to lock file

The Oracle documentation says:

The trails cannot be exclusively locked for writes by the server/collector process running on the target. As of v10.4, Server/Collector locks the trail file to prevent multiple processes from writing to the same trail file, so new Server/Collector processes are unable to lock the trail files while an earlier process still holds the lock.

Network outages that last longer than the time the TCP/IP stack is configured to retransmit unacknowledged packets may result in “orphan” TCP/IP connections on the RMTHOST system. Since the local system has closed the connections and the “RST” packets were lost due to the network outage, no packets (data or “control”) will ever be sent for these connections.
Since the RST packets were not delivered to the RMTHOST, the TCP/IP stack will not present an error to the Server/Collector process. The Server/Collector process will continue to wait, passively, forever, for new data that will never arrive because the Extract process on the other system is no longer running.
A second cause for this symptom is that the remote server was rebooted and the Network-Attached Storage (NAS) device where the target trails reside did not detect and was not notified of the reboot, so the locks acquired prior to the reboot are still considered to be in force.

To solve this error, perform the following actions.

1) Investigate why a server/collector process is still running when a new server/collector process is started to access the same trail. You can kill the orphan server/collector process to resolve the immediate issue.

2) You can override the trail locking by using the RMTHOST UNLOCKEDTRAILS option. Use this option with CAUTION, as it can cause trail corruption. You must still investigate why the trails are locked by another server, or kill those server/collector processes.
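
To find candidate orphan collectors for action 1, the ps -ef listing can be filtered for server processes whose parent is PID 1, which suggests the process that started them has gone away. This is a sketch under assumptions: it expects the Linux ps -ef column layout, the function name find_orphan_collectors is hypothetical, and a parent PID of 1 is only a hint. A collector that lost its Extract after a network outage may still have a live parent and would not be listed.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID
# of every process whose command is `server` (the collector binary) and
# whose parent PID is 1. Assumes the Linux column layout:
#   UID PID PPID C STIME TTY TIME CMD
find_orphan_collectors() {
  awk '$3 == 1 && $8 ~ /(^|\/)server$/ { print $2 }'
}
```

It would be used as ps -ef | find_orphan_collectors. Inspect each PID it prints before killing anything.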

 

NOTE that if an Extract pump is stopped normally, the server/collector process stops immediately. Otherwise it waits for a timeout, which defaults to 5 minutes in current versions (11.1/11.2 onwards). Please refer to the reference for your version’s default. You can override this value using the RMTHOST TIMEOUT option. For example, to set the timeout to 40 seconds:

RMTHOST 192.168.10.1, MGRPORT 7809, PARAMS TIMEOUT 40

This tells the Server/Collector to terminate if it doesn’t receive any checkpoint information for more than 40 seconds. DO NOT set the value too low, because TCP/IP communication performance varies throughout the day.

Other notes:

NAS related issue:

In the case where the NAS was unaware that the system had been rebooted, the best long-term solution is to contact the NAS vendor, who might be able to provide a utility program that can be run early in the system startup process to notify the NAS that it should release all locks owned by this system. The following procedure might offer a short-term workaround:

  1. Stop all REPLICAT processes that read the trail file.
  2. Stop the target MGR process.
  3. Copy trail file xx000000 to xx000000.bk
  4. Delete trail file xx000000.
  5. Rename (mv) xx000000.bk to xx000000.
  6. Repeat steps 3-5 for each trail file that can’t be locked.
  7. From the shell, kill the server (collector) process that was writing to the trail, i.e. check at the OS level for orphan processes. For example, on Unix-style systems: ps -ef | grep server
       If any such orphan servers exist, e.g.:
       oracle   27165     1  0 11:20 ?        00:00:00  ./server -p 7840 -k -l /opt/oracle/gg/ggserr.log
       Then: kill 27165 (or kill -9 27165) for this particular case.
  8. Start MGR.
  9. Start the REPLICAT processes.
  10. Re-start the extract that abended and gave this error message.
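
The copy, delete, and rename steps in the list above replace the trail file with a fresh copy under the same name, presumably so that the new file carries none of the stale locks. A minimal sketch of those three steps for one file, assuming a POSIX shell; relink_trail is a hypothetical helper name, and it should only be run once the Replicat and Manager processes are stopped as in steps 1 and 2.

```shell
# Hypothetical helper covering the copy/delete/rename steps for one
# trail file. It stops at the first failure, so the original is never
# deleted unless the copy succeeded and matches byte for byte.
relink_trail() {
  trail="$1"
  [ -f "$trail" ] || { echo "no such file: $trail" >&2; return 1; }
  cp -p "$trail" "$trail.bk" || return 1    # copy, keeping mode and times
  cmp -s "$trail" "$trail.bk" || return 1   # sanity check before deleting
  rm -f "$trail" || return 1                # delete the original
  mv "$trail.bk" "$trail"                   # rename the copy back
}
```

It would be called once per trail file that cannot be locked, for example relink_trail xx000000 from the trail directory.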

 

Note that this may not work, depending on the NAS and the way it keeps track of advisory file locks acquired using fcntl( F_SETLK ).

 

Cluster failover:

When a system fails over to another node, the GoldenGate processes should be stopped, typically by using the ggsci > stop * and > stop mgr commands; however, processes such as server/collectors remain running. Stop the extract pumps manually or kill the processes. You should check that no processes are running from the GoldenGate directory before switching GoldenGate to run on another node.
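
The final check before a failover can be approximated by filtering the process list for commands that start with the GoldenGate home path. A sketch only: procs_under_gg_home is a hypothetical helper name, and it matches on the command path, so processes started with a relative path such as ./server will not be found this way.

```shell
# Hypothetical helper: reads `ps -eo pid,args` output on stdin and prints
# the lines whose command starts with the GoldenGate home given as $1.
# Processes started with a relative path (./server) will not match.
procs_under_gg_home() {
  awk -v home="$1" 'index($2, home "/") == 1 { print }'
}
```

It would be used as ps -eo pid,args | procs_under_gg_home /ggateb01/goldengate/product/GG19cFor18cDB, where empty output means nothing matched. For processes started with a relative path, ls -l /proc/<pid>/cwd on Linux shows the directory they were started from.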

 

 


About Mehmet Salih Deveci

I am the founder of SysDBASoft IT and IT Tutorial, and a certified expert in Oracle and SQL Server databases, GoldenGate, Exadata Machine, and Oracle Database Appliance administration, with 10+ years of experience. I hold OCA, OCP, and OCE RAC Expert certificates. I have worked with 100+ banking, insurance, finance, telco, and other clients as a consultant, insourced or outsourced. I have carried out 200+ operations for these clients, such as Exadata installation, PoC, migration, and upgrade; Oracle and SQL Server database upgrades; Oracle RAC installation; SQL Server AlwaysOn installation; database migration; disaster recovery; backup and restore; performance tuning; and periodic health checks. I have set up replication of 2000+ tables with GoldenGate or the SQL Server replication tool for DWH databases at many clients. If you need Oracle DBA, SQL Server DBA, APPS DBA, Exadata, GoldenGate, EBS, or Linux consultancy and training, you can email me at [email protected].
