We have all seen something like this. A fresh clone from Production to Test has just finished, the post-clone steps are done, and the services are coming up, but the Workflow Notification Mailer is stuck in the Starting phase.
In a recent project for one of our customers, we ran into exactly this scenario: after a successful clone, the mailer stayed in the "Starting" phase indefinitely.
The Problem: New Credentials vs. Old State
After cloning, the Test environment still carries the Production SMTP and IMAP settings. Naturally, you have to update these fields: point the server names at the correct SMTP and IMAP hosts, and enter valid credentials for the mailboxes of the Test (cloned) instance.
The issue appears to be that the Workflow Mailer tries to start with the newly updated credentials but gets confused, because it still sees the previous Running state and the old control messages that came along with the Production snapshot.
The Root Cause: "Identity Crisis"
When an EBS environment is cloned, the target database
initially behaves as if it’s still part of the source system. Even after
completing adcfgclone and AutoConfig, the Workflow Advanced Queues (AQ) may
still retain references or cached state information from the Production
environment.
In this situation, the Workflow Mailer was trying to
initialize, but it was encountering leftover control messages and orphaned
process IDs that did not exist on the new Test server, which prevented it from
starting properly.
Here is how we resolved the problem:
Clearing the Database Block
Our first approach was to manually reset the component status. However, the UPDATE statement became unresponsive, indicating that another session was holding a lock on the FND_SVC_COMPONENTS table.
We identified the blocking session and killed it at the database level.
Once the session was killed and the lock released, we were able to force the status to 'STOPPED'.
To check the Workflow Mailer status:

```sql
-- The component name must be on one line; a line break inside the literal breaks the match
SELECT component_status
FROM   fnd_svc_components
WHERE  component_name = 'Workflow Notification Mailer';
```
Find the session holding a lock on FND_SVC_COMPONENTS:

```sql
SELECT l.session_id   AS blocking_session,
       s.serial#,                        -- needed for ALTER SYSTEM KILL SESSION
       s.username,
       s.osuser,
       s.program,
       s.status,
       s.last_call_et AS seconds_in_wait
FROM   v$locked_object l
JOIN   v$session s ON l.session_id = s.sid
WHERE  l.object_id IN (SELECT object_id
                       FROM   all_objects
                       WHERE  object_name = 'FND_SVC_COMPONENTS');
```
Kill the Session
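If the above query returns any rows, a session is holding a lock on the table, and it must be killed to release the lock. Your own hung UPDATE session may also appear in the list, so make sure you pick the other session. Take its SID and SERIAL# from the query output and kill it at the database level.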
```sql
-- Replace <sid> and <serial#> with the values returned by the query above
ALTER SYSTEM KILL SESSION '<sid>,<serial#>' IMMEDIATE;
```
Force the mailer's status to STOPPED in FND_SVC_COMPONENTS:

```sql
UPDATE fnd_svc_components
SET    component_status = 'STOPPED'
WHERE  component_name = 'Workflow Notification Mailer';
```
Reset the status of the mailer's component requests in FND_SVC_COMP_REQUESTS as well, then commit both updates:

```sql
UPDATE fnd_svc_comp_requests
SET    component_status = 'STOPPED'
WHERE  component_id = (SELECT component_id
                       FROM   fnd_svc_components
                       WHERE  component_name = 'Workflow Notification Mailer');

COMMIT;
```
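You must also check at the OS level. If a leftover ("zombie") mailer process is still running, the mailer will not start correctly even after the updates above.

```shell
ps -ef | grep FNDCPGSC
```

Ignore the line for the grep command itself. If any other FNDCPGSC process is listed, kill it with kill -9 <PID>.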
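To make the OS-level check a little safer to repeat, the filtering can be wrapped in a small shell function. This is a minimal sketch under our own assumptions: mailer_pids is a hypothetical helper name (not part of EBS), it expects "PID ARGS" lines such as those printed by ps -e -o pid= -o args=, and it only prints PIDs so you can review them before killing anything. The [F] bracket pattern is a common trick that keeps grep from matching its own command line.

```shell
#!/bin/sh
# mailer_pids: hypothetical helper, not part of EBS.
# Reads "PID ARGS" lines on stdin and prints the PID of every line whose
# command line contains FNDCPGSC. It never kills anything by itself.
mailer_pids() {
  # '[F]NDCPGSC' matches the text "FNDCPGSC" but not the literal
  # "[F]NDCPGSC" in grep's own command line, so grep does not list itself.
  grep '[F]NDCPGSC' | awk '{ print $1 }'
}

# Example usage on the application tier:
#   ps -e -o pid= -o args= | mailer_pids
# Review each PID before running: kill -9 <PID>
```

Printing the PIDs instead of killing them directly is deliberate: if several instances share the same server, confirm that each process belongs to the cloned Test instance before using kill -9.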
Once the above steps are done, start the Workflow Notification Mailer again (for example, from Workflow Manager in Oracle Applications Manager). This time it should start successfully.