Solved: Client version 6.5.6 on the server - bp.conf has I... - Page 2

Netbackup_fan · ‎02-03-2012

Hi experts.

I need help here. The system admin installed client 6.5.6 and also modified bp.conf to add the entries listed below.

I notice that bp.conf has the following ownership and permissions:

-rw-r--r-- 1 root root 244 Jan 23 16:44 bp.conf

SERVER = NB-master

SERVER = NB-media

SERVER = timefinder

SERVER = amdocstest

CLIENT_NAME = centurion

ALLOW_MEDIA_OVERWRITE = TAR

ALLOW_MEDIA_OVERWRITE = ANSI

IGNORE_XATTR = YES

The backup is giving the following errors:

04:04:31 WRN - /export/home/oracle/dba/exp_pipe.dmp is a fifo special file. Backing up the raw fifo.
05:04:35 INF - Server status = 13
05:04:37 INF - Backup by oracle on client centurion using policy Database_Cold, sched Logs_One_Year:file read failed

Mark_Solutions · ‎02-23-2012

It would be interesting to run this backup to disk so that you can at least see whether data is flowing at all (by monitoring the size of the backup image file).

If it is flowing and still times out after an hour, it may be a keep-alive setting (on the client, the media server, or both) causing the job to stop suddenly.
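
A minimal sketch of that size check, assuming the job writes to a disk storage unit and you can find the image fragment it is writing. The path in the example comment is a placeholder and the helper names are made up for illustration:

```shell
#!/bin/sh
# Report whether a file grows between two samples taken some seconds apart.

file_size() {
    # wc -c prints the byte count; tr strips the padding some platforms add
    wc -c < "$1" | tr -d ' '
}

is_growing() {
    image_file=$1
    interval=${2:-60}          # seconds between the two samples
    before=$(file_size "$image_file")
    sleep "$interval"
    after=$(file_size "$image_file")
    if [ "$after" -gt "$before" ]; then
        echo "growing: $before -> $after bytes"
    else
        echo "stalled at $after bytes"
    fi
}

# Example (placeholder path): is_growing /path/to/disk_stu/image_fragment 60
```

If the file stays at the same size for the whole hour, the client is not sending anything and the timeout is only the symptom.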

Mark_Solutions · ‎02-23-2012

Is this right?:

BPCD ACCEPT FROM 10.13.30.12.60364 TO 192.168.112.101

Does your client have multiple NICs?

Should it be using the 192 address?

Just wondering if it starts sending its response on the wrong network.

Nicolai · ‎02-24-2012

I did not see a reason for the status 13 in the bpbkar log. Are you sure it's the complete log?

Marianne · ‎02-24-2012

I agree with Nicolai - the log covers the START of the backup. The LAST entry in the bpbkar log is what we see in the top of Job details:

11:38:10.336 [5733] <8> bpbkar process_file: WRN - /oradb/oravol01/exp_pipe.dmp is a fifo special file. Backing up the raw fifo.

No data transfer has started yet.

We need a FULL set of logs from a completed backup (after the failure).

Logs needed:

bpbkar log plus the log file produced by bpbackup (/export/home/oracle/dba/logs/LOGS.`date '+%Y-%m-%d'`.log) on the client as well as bpbrm and bptm logs on the media server.
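
Before rerunning, it may be worth confirming that the log directories exist. As far as I know, the legacy debug logs are only written when a directory named after the process is present under the logs root. A small sketch, where the default root is the usual UNIX location (an assumption, adjust if yours differs) and check_log_dirs is just an illustrative helper:

```shell
#!/bin/sh
# List which legacy debug log directories exist under a logs root.
# LOG_ROOT is assumed to be the standard UNIX install location.
LOG_ROOT=${LOG_ROOT:-/usr/openv/netbackup/logs}

check_log_dirs() {
    root=$1
    shift
    for proc in "$@"; do
        if [ -d "$root/$proc" ]; then
            echo "$proc: present"
        else
            echo "$proc: missing"
        fi
    done
}

# On the client:        check_log_dirs "$LOG_ROOT" bpbkar
# On the media server:  check_log_dirs "$LOG_ROOT" bpbrm bptm
```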

Handy NetBackup Links

Mark_Solutions · ‎02-24-2012

Four more things:

1. Out of interest do you have a client read timeout on your media servers of 3600 by any chance (= exactly one hour)?

2. But yes - that is only the start of the log and would need to see it all

3. Does the client have multiple NICs?

4. Can you run it to disk to see if data is actually flowing?

Netbackup_fan · ‎02-24-2012

I am starting the backup again.

Netbackup_fan · ‎02-24-2012

This complex configuration is set up for tapes and robots, not for disk.

Netbackup_fan · ‎02-24-2012

Last night and this morning there was an error on the NetBackup platform. All backups exited with EXIT STATUS 25: cannot connect on socket.

This was resolved at 10:16 am, which is when the NetBackup administrator sent the email.

Mark_Solutions · ‎02-24-2012

Is the clock an hour out on centurion?

It does fail after an hour, but everything about this seems odd.

The failure is actually here:

11:19:29 INF - Backup by oracle on client centurion using policy Database_Cold, sched Logs_One_Year:file read failed

So it could be a genuine failure to read something, or it could still be a keep-alive / timeout issue.

See which timeouts are set to one hour and what the keep-alive timeouts are - the whole process may just be blocked.

Check the settings:

# cat /proc/sys/net/ipv4/tcp_keepalive_time

7200

# cat /proc/sys/net/ipv4/tcp_keepalive_intvl

75

# cat /proc/sys/net/ipv4/tcp_keepalive_probes

9

If they look like the above then try changing the settings:

# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time

# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl

# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes

To make the changes persist after a reboot, add something like the following to /etc/sysctl.conf (for example with vi):

## Keepalive at 8.5 minutes

# start probing for heartbeat after 8.5 idle minutes (default 7200 sec)

net.ipv4.tcp_keepalive_time=510

# close connection after 3 unanswered probes (default 9)

net.ipv4.tcp_keepalive_probes=3

# wait 3 seconds for response to each probe (default 75)

net.ipv4.tcp_keepalive_intvl=3

These do not need a restart to take effect, but run "chkconfig boot.sysctl on" so that the settings above are applied at boot.
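
To see what these numbers mean in practice, here is a rough sketch of the arithmetic: an idle connection gets its first probe after tcp_keepalive_time seconds, and if nothing answers, it is dropped after tcp_keepalive_probes probes sent tcp_keepalive_intvl seconds apart. The function name is only for illustration:

```shell
#!/bin/sh
# Seconds of silence before the kernel gives up on an idle TCP connection.
keepalive_detect_seconds() {
    # $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
    echo $(( $1 + $2 * $3 ))
}

keepalive_detect_seconds 7200 75 9   # defaults: prints 7875
keepalive_detect_seconds 510 3 3     # proposed values: prints 519
```

With the defaults, the first probe only goes out after two hours of idle time, which is later than the one hour at which this job dies.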

Netbackup_fan · ‎02-24-2012

I don't think the timeout is the reason it fails. I think that if the setting were 10 hours, you would get the same message after 10 hours.

I think it is because exp_pipe.dmp is a fifo file, created with this command:

mknod <path>/exp_pipe.dmp p

It is used for compressing online Oracle database export dump files.
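
For anyone following along, this is roughly how a fifo behaves when something tries to read it. It is only a sketch on a scratch fifo in a temp directory (not the real export pipe), and whether bpbkar hits exactly this is my guess, not something the logs prove:

```shell
#!/bin/sh
# Demonstrate that reading a fifo blocks until some process writes to it.
workdir=$(mktemp -d)
pipe="$workdir/exp_pipe.dmp"      # scratch fifo, not the real export pipe
mkfifo "$pipe"                    # same effect as: mknod "$pipe" p

# Start a reader in the background; with no writer it blocks opening the fifo.
cat "$pipe" > "$workdir/out.txt" &
reader=$!
sleep 1

if kill -0 "$reader" 2>/dev/null; then
    reader_state="blocked"
else
    reader_state="finished"
fi
echo "reader with no writer: $reader_state"

# Once a writer opens the fifo, writes and closes it, the reader sees EOF and exits.
echo "export data" > "$pipe"
wait "$reader"
received=$(cat "$workdir/out.txt")
echo "reader received: $received"
rm -rf "$workdir"
```

So if the backup reads the raw fifo while no export is writing into it, the read would simply wait, and no timeout value would change the outcome.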

I need a fix. I need to know what is really happening and how to fix it.

Marianne · ‎02-24-2012

We can see in bptm log that NO DATA is received from the client:

10:28:17.894 [15395] <2> bptm: INITIATING (VERBOSE = 0): -w -c centurion -den 6 -rt 8 -rn 3 -stunit NB-master-hcart-robot-tld-3 -cl Database_Cold -bt 1330093693 -b centurion_1330093693 -st 2 -cj 30 -p Databases -reqid -1330090980 -jm -brm -hostname centurion -L /export/home/oracle/dba/logs/LOGS.2012-02-24.log -ru oracle -rclnt centurion -rclnthostname centurion -rl 17 -rp 518400 -sl Logs_One_Year -ct 0 -maxfrag 5120 -mediasvr NB-master -no_callback -connect_options 0x01010100 -jobid 645159 -jobgrpid 645159 -masterversion 650000 -bpbrm_shm_id 103 -blks_per_buffer 128

10:28:28.459 [15395] <4> write_backup: begin writing backup id centurion_1330093693, copy 1, fragment 1, to media id PJR213 on drive NB-master_Robot_003_Drive_26_SN-1250864366 (index 41)
10:28:28.459 [15395] <2> signal_parent: sending SIGUSR1 to bpbrm (pid = 15392)
...
10:28:28.463 [15395] <2> io_write_back_header: drive index 41, centurion_1330093693, file num = 8, mpx_headers = 0, copy 1
10:28:28.463 [15395] <2> write_data: completed writing backup header, start writing data when first buffer is available, copy 1

NO DATA IS EVER RECEIVED

bptm is killed one hour later by bpbrm - timeout.

11:28:28.571 [15395] <2> Media_dispatch_signal: calling catch_signal for 1 (bptm.c:25717) delay 0 seconds
11:28:30.577 [15395] <16> catch_signal: media manager terminated by parent process
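
The gap between the two bptm timestamps above can be checked with a few lines of shell. The helper name to_seconds is made up for this sketch:

```shell
#!/bin/sh
# Convert HH:MM:SS(.mmm) to whole seconds since midnight; the fraction is ignored.
to_seconds() {
    echo "$1" | awk -F'[:.]' '{ print $1 * 3600 + $2 * 60 + $3 }'
}

start=$(to_seconds 10:28:28.463)   # write_data: header written, waiting for data
end=$(to_seconds 11:28:28.571)     # Media_dispatch_signal: bptm is told to stop
echo $(( end - start ))            # prints 3600
```

That is exactly 3600 seconds between bptm starting to wait for data and being stopped.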

The question now is: What NEEDS to be backed up on this client?

What is the purpose of specifying the exp_pipe.dmp file in the backup list-file?

If you need to back up Oracle database export dump files, why not specify the folder containing the dump files?


Netbackup_fan · ‎02-24-2012

Why? Because I have another server with NetBackup 6.5.5 that backs up exp_pipe.dmp, which is a pipe, without giving the error.

/usr/openv/netbackup/bin>more version
NetBackup-Solaris8 6.5.5

Will_Restore · ‎02-24-2012

or, when is a dmp not a pipe?

in this latest log, no mention of fifo special file so I presume

/export/home/oracle/dba/exports/exp_pipe.dmp on client riogrande is a flat file.

how about running ls command on each server:

ls -l /oradb/oravol01/exp_pipe.dmp

ls -l /export/home/oracle/dba/exports/exp_pipe.dmp
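
If you prefer a yes/no answer over reading the first character of the ls output, the shell's own file tests do the same job. A small sketch (describe_path is just an illustrative name):

```shell
#!/bin/sh
# Classify a path as a fifo, a regular file, something else, or missing.
describe_path() {
    if [ -p "$1" ]; then
        echo "fifo"
    elif [ -f "$1" ]; then
        echo "regular file"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}

# On centurion:  describe_path /oradb/oravol01/exp_pipe.dmp
# On riogrande:  describe_path /export/home/oracle/dba/exports/exp_pipe.dmp
```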

Nicolai · ‎02-24-2012

Are you performing an Oracle export via a pipe directly into NetBackup?

Have you ever restored one of those exports from the 6.5.5 client?

Mark_Solutions · ‎02-24-2012

This has been bugging me for days now and I had another couple of thoughts ...

1. You say the timeout is 10 hours, but (and I don't have a console to hand) I did not think that the media server host properties Client Read Timeout went that high (I could be wrong).

2. If it is compressing something, it usually does not compress the actual file. It stages it somewhere first and creates the compressed file from that, so could it be running out of disk space?

I still think you have a NetBackup timeout of 3600 set somewhere, though, which causes your error.

Netbackup_fan · ‎02-25-2012

On the riogrande server, where the NetBackup client version is 6.5.5:

/export/home/oracle>ls -ltr /export/home/oracle/dba/exports/exp_pipe.dmp
prw-r--r-- 1 oracle dba 0 Jan 17 18:57 /export/home/oracle/dba/exports/exp_pipe.dmp
oracle@riogrande[ctimsp1]<deadbeef>
/export/home/oracle>

On the centurion server, where the version is 6.5.6:

/oradb/oravol01>ls -ltr /oradb/oravol01/exp_pipe.dmp
prw-r--r-- 1 oracle dba 0 Feb 25 22:36 /oradb/oravol01/exp_pipe.dmp
oracle@centurion[ctimsd1]<deadbeef>
/oradb/oravol01>

Netbackup_fan · ‎02-25-2012

No. I run the export, and when it has finished I run the backup.

Did I restore? Yes, I have had the opportunity to restore a table from the export dump file.

Netbackup_fan · ‎02-25-2012

Backup of the pipe is successful in NetBackup version 6.5.5.

Backup of the pipe is not successful in NetBackup 6.5.6.

So I presume this is a regression issue.

Mark_Solutions · ‎02-27-2012

Perhaps the upgrade re-set the client read timeout?

I checked this morning and the maximum Client Read Timeout setting on a media server is 32767 seconds (9 hours 6 minutes).
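
For reference, the conversion is plain integer arithmetic. A quick sketch (to_hms is an illustrative helper):

```shell
#!/bin/sh
# Convert a number of seconds to hours, minutes and seconds.
to_hms() {
    secs=$1
    echo "$(( secs / 3600 ))h $(( secs % 3600 / 60 ))m $(( secs % 60 ))s"
}

to_hms 32767   # prints 9h 6m 7s
to_hms 3600    # prints 1h 0m 0s
```

So a 10-hour Client Read Timeout (36000 seconds) would be above the maximum, while 3600 matches the one hour after which this job fails.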

As per this tech note this timeout is vital for pipe backups:

http://www.symantec.com/docs/TECH89805

Extract:

There are a couple of caveats with this solution, however. One of which is that the back up of the named pipe file must be completed within the CLIENT_READ_TIMEOUT bp.conf setting on the NetBackup server(s). Otherwise, NetBackup will abort the backup and return a non-zero status code (41 in this case).
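
To see what is actually set, you could read the value straight out of bp.conf on the master and media servers. A sketch, assuming the usual /usr/openv/netbackup/bp.conf location and "KEY = value" lines like the ones posted earlier. The helper names are made up, and the 300-second fallback is the default as far as I know:

```shell
#!/bin/sh
# Print the last value assigned to a key in a bp.conf-style file.
get_bpconf_value() {
    # $1 = path to bp.conf, $2 = key name
    awk -F'=' -v key="$2" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); found = v }
        END { if (found != "") print found }
    ' "$1"
}

client_read_timeout() {
    value=$(get_bpconf_value "$1" CLIENT_READ_TIMEOUT)
    # Fall back to 300 seconds (believed to be the default) when the entry is absent
    echo "${value:-300}"
}

# Example: client_read_timeout /usr/openv/netbackup/bp.conf
```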

Marianne · ‎03-01-2012

Backup of the pipe is successful in NetBackup version 6.5.5.

Backup of the pipe is not successful in NetBackup 6.5.6.

The way I see this, you have one of two choices:

1. Downgrade the 6.5.6 client to 6.5.5.

2. Log a Support call regarding the 'regression issue'.
