Forum Discussion

Breaky_08
Level 3
9 years ago

Server Status: Communication with the server has not been initiated or the server status has not been retrieved from the serve

Hi, in our environment we have one Windows master server (NetBackup 7.7.2 on Windows Server 2008 Standard) and three NBU 5220 appliances (version 2.7.2) as media servers.

Yesterday we noticed that the Oracle policies (10 policies) failed with status code 6 in NBU. The log messages on the client side are similar for all of them:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch01 channel at 06/14/2016 22:31:54
ORA-19506: failed to create sequential file, name="bk_PROD_137098_1_914536889", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
   VxBSACreateObject: Failed with error:
   Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve

All the errors occur in the time window from 22:00 to 23:00; during the rest of the day all the jobs run without problems.

Why do the problems happen at this time, while the backups are successful the rest of the day?

  • It is important to look at the logs from the time the error is seen. Ensure that the dbclient log folder exists on the clients (with 777 permissions) and the bprd log folder on the master. These logs will tell us what is wrong with comms between master and clients. Can you please also have a close look at the master server to see what kind of load there is and how many jobs are initiated during the time when the Oracle backup failures are seen?
  • Check name resolution forward and back.  Often caused by missing reverse lookup (IP to name).
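
The forward and reverse check suggested above can be scripted so it is easy to repeat on the master, the media servers and each client. The following is only a minimal sketch: the function name `check_name_resolution` and the report layout are illustrative, not part of NetBackup, and the host names in the usage comment are placeholders.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to its IPs, then resolve every IP back to a name.

    The two lookup functions are parameters only so the check can be
    exercised without a real resolver; by default the system resolver is used.
    """
    report = {"hostname": hostname, "addresses": [], "problems": []}
    try:
        _, _, addresses = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        report["problems"].append(f"forward lookup failed: {exc}")
        return report

    for ip in addresses:
        try:
            name, aliases, _ = reverse(ip)
        except OSError as exc:
            report["addresses"].append((ip, None))
            report["problems"].append(f"reverse lookup failed for {ip}: {exc}")
            continue
        report["addresses"].append((ip, name))
        # Exact, case-insensitive comparison: a short name and its FQDN
        # are reported as a mismatch unless one appears among the aliases.
        known = {name.lower(), *(alias.lower() for alias in aliases)}
        if hostname.lower() not in known:
            report["problems"].append(f"{ip} maps back to {name}, not {hostname}")
    return report


# Usage (placeholder host names, replace with the real master, media servers and clients):
# for host in ("master.example.com", "client01.example.com"):
#     print(check_name_resolution(host))
```

An empty `problems` list means every address resolved back to the name that was asked for. Run it from each side of the connection, since the master and a client can see different answers.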

  • Hi Will, thank you for your answer.

    There are around 10 different clients that fail with the same message, and all the other jobs scheduled during the day for the same clients work without problems. But I will check the name resolution.

  • Right, I missed that all the errors occur in the 22:00 to 23:00 window and that the jobs run without problems the rest of the day.

    That does sound like a load issue rather than name resolution.

  • Hi Marianne, the number of jobs in the window (from 22:00 to 23:00) is 216.

    I filtered in the Activity Monitor.

    • 199 are backup jobs
    • 15 are image cleanup jobs
    • 1 Duplication
    • 1 Snapshot (vmware)

    I will follow your advice and make sure the dbclient log folder exists on the clients (with 777 permissions) and the bprd log folder on the master.

    Thank you for your response.

  • Hello, this afternoon I checked whether the dbclient log folder exists on the client side. On some clients it had not been created, so I ran the mklogdir script to create the folders.

    Some clients already had the dbclient folder, and looking inside I found similar messages on these clients:

    DBCLIENT LOG
    22:32:51.098 [22086302] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/22086302.0.1465963327
    22:32:51.099 [22086302] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/22086302.0.1465963327>
    22:32:51.099 [22086302] <16> CreateNewImage: ERR - serverResponse() failed
    22:32:51.151 [22086302] <16> VxBSACreateObject: ERR - Could not create new image with file /arc_RMSPRD_39345_1_914536926.
    22:32:51.151 [22086302] <16> xbsa_CreateObject: ERR - VxBSACreateObject: Failed with error: Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    22:32:51.352 [22479420] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/22479420.0.1465963328
    22:32:51.353 [22479420] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/22479420.0.1465963328>
    22:32:51.353 [22479420] <16> CreateNewImage: ERR - serverResponse() failed
    22:32:51.405 [22479420] <16> VxBSACreateObject: ERR - Could not create new image with file /arc_RMSPRD_39346_1_914536927.
    22:32:51.405 [22479420] <16> xbsa_CreateObject: ERR - VxBSACreateObject: Failed with error: Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
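
Each readCommFile entry gives the time the wait ended and how long it lasted, so subtracting the two shows roughly when dbclient started waiting for the server. The sketch below does that arithmetic for the two entries above. The regular expression is written against the line layout seen in this excerpt only, and the helper name `find_read_timeouts` is made up for the example.

```python
import re
from datetime import datetime, timedelta

# Matches only the readCommFile timeout lines as they appear in the excerpt above.
TIMEOUT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <\d+> "
    r"readCommFile: ERR - timed out after (?P<seconds>\d+) seconds"
)

SAMPLE = """\
22:32:51.098 [22086302] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/22086302.0.1465963327
22:32:51.099 [22086302] <16> CreateNewImage: ERR - serverResponse() failed
22:32:51.352 [22479420] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/22479420.0.1465963328
"""


def find_read_timeouts(log_text):
    """Return one record per readCommFile timeout with the computed wait start."""
    events = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match is None:
            continue
        # The log carries no date, so only the time of day is meaningful here.
        ended = datetime.strptime(match["time"], "%H:%M:%S.%f")
        waited = int(match["seconds"])
        started = ended - timedelta(seconds=waited)
        events.append({
            "pid": match["pid"],
            "waited_seconds": waited,
            "wait_started": started.time(),
            "timed_out_at": ended.time(),
        })
    return events


events = find_read_timeouts(SAMPLE)
for event in events:
    print(event["pid"],
          event["wait_started"].isoformat(timespec="milliseconds"),
          "->",
          event["timed_out_at"].isoformat(timespec="milliseconds"))
```

For these two entries the wait started at about 22:17:51, which is inside the 22:00 to 23:00 window where the failures are reported.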

  • <16> readCommFile: ERR - timed out after 900 seconds ...

    Change Client Read Timeout on the Media Server.  Try 1800, increase as needed. 

  • Hi Will, thank you for your response.

    All the media servers (NBU appliances) already have CLIENT READ TIMEOUT = 3200 set.

    Today the backups ran OK.

    Our master is a VMware virtual machine. I think the workload on the master, or something else happening at that time, is affecting the backups.