02-03-2012 11:42 AM
Hi experts.
I need help here. The system admin installed client 6.5.6 and also modified bp.conf to add the entries shown below.
I notice that bp.conf has the following ownership and permissions:
-rw-r--r-- 1 root root 244 Jan 23 16:44 bp.conf
SERVER = NB-master
SERVER = NB-media
SERVER = timefinder
SERVER = amdocstest
CLIENT_NAME = centurion
ALLOW_MEDIA_OVERWRITE = TAR
ALLOW_MEDIA_OVERWRITE = ANSI
IGNORE_XATTR = YES
The backup is giving the following errors:
04:04:31 WRN - /export/home/oracle/dba/exp_pipe.dmp is a fifo special file. Backing up the raw fifo.
05:04:35 INF - Server status = 13
05:04:37 INF - Backup by oracle on client centurion using policy Database_Cold, sched Logs_One_Year:file read failed
02-23-2012 08:43 AM
It would be interesting to perform this backup to disk so that you can at least see whether data is flowing at all (by monitoring the size of the backup image file).
If it is flowing and still times out after an hour, it may be a keep-alive setting (on the client, the media server, or both) causing the job to stop suddenly.
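One way to do that monitoring is to sample the image file size twice and compare. This is only a minimal sketch: the function name and the path in the example are hypothetical, and the location of the image file depends on how the disk storage unit is configured.

```shell
#!/bin/sh
# Report whether a file grows between two samples taken INTERVAL seconds apart.
# Usage: image_is_growing FILE [INTERVAL_SECONDS]
image_is_growing() {
    file=$1
    interval=${2:-10}
    # Field 5 of "ls -ln" is the size in bytes
    before=$(ls -ln "$file" | awk '{ print $5 }')
    sleep "$interval"
    after=$(ls -ln "$file" | awk '{ print $5 }')
    echo "size before: $before bytes, after: $after bytes"
    # Succeed only if the file got bigger
    [ "$after" -gt "$before" ]
}

# Example (hypothetical path to the image on the disk storage unit):
# image_is_growing /path/to/disk_storage_unit/image_file 30 && echo "data is flowing"
```

If the size never changes while the job is active, the client is not sending data, which points at the read of the source file rather than at the tape path.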
02-23-2012 09:20 AM
Is this right?:
BPCD ACCEPT FROM 10.13.30.12.60364 TO 192.168.112.101
Does your client have multiple NICs?
Should it be using the 192 address?
Just wondering whether it starts sending its response on the wrong network.
02-24-2012 12:35 AM
I did not see a reason for the status 13 in the bpbkar log. Are you sure it's the complete log?
02-24-2012 01:10 AM
I agree with Nicolai - the log covers the START of the backup. The LAST entry in the bpbkar log is what we see in the top of Job details:
11:38:10.336 [5733] <8> bpbkar process_file: WRN - /oradb/oravol01/exp_pipe.dmp is a fifo special file. Backing up the raw fifo.
No data transfer has started yet.
We need a FULL set of logs from a completed backup (after the failure).
Logs needed:
bpbkar log plus the log file produced by bpbackup (/export/home/oracle/dba/logs/LOGS.`date '+%Y-%m-%d'`.log) on the client as well as bpbrm and bptm logs on the media server.
02-24-2012 01:32 AM
Four more things:
1. Out of interest, do you have a client read timeout of 3600 on your media servers by any chance (= exactly one hour)?
2. But yes - that is only the start of the log, and we would need to see it all
3. Does the client have multiple NICs?
4. Can you run it to disk to see if data is actually flowing?
02-24-2012 06:37 AM
I am starting the backup again.
02-24-2012 06:38 AM
This complex configuration is set up for tapes and robots, not for backups to disk.
02-24-2012 08:13 AM
Last night and this morning there was an error on the NetBackup platform. All backups exited with EXIT STATUS 25: cannot connect on socket.
This was resolved at 10:16 am, which is when the NetBackup administrator sent the email.
02-24-2012 09:06 AM
Is the clock an hour out on centurion?
It does fail after an hour; it is just that everything seems odd.
The failure is actually here:
11:19:29 INF - Backup by oracle on client centurion using policy Database_Cold, sched Logs_One_Year:file read failed
So it could be a genuine failure to read something or it could still be a keep alive / timeout issue
See which timeouts are set to one hour and what the keep-alive timeouts are - the whole process may just be blocked.
Check the settings:
# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
If they look like the above then try changing the settings:
# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
To make the changes persistent across reboots, add lines such as the following to /etc/sysctl.conf (using vi, for example):
## Keepalive at 8.5 minutes
# start probing for heartbeat after 8.5 idle minutes (default 7200 sec)
net.ipv4.tcp_keepalive_time=510
# close connection after 3 unanswered probes (default 9)
net.ipv4.tcp_keepalive_probes=3
# wait 3 seconds for response to each probe (default 75)
net.ipv4.tcp_keepalive_intvl=3
These do not need a restart to take effect, but run: chkconfig boot.sysctl on so that the sysctl.conf entries are applied at boot.
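To see what these three values mean in practice, the worst-case time before an idle, dead connection is dropped can be worked out as the idle time before the first probe plus the number of probes times the probe interval. The helper name below is made up for illustration; the arithmetic reflects the usual Linux TCP keepalive behaviour.

```shell
#!/bin/sh
# Worst-case seconds before an idle, dead TCP connection is dropped:
# idle time before first probe + (probe interval * number of probes)
# Usage: keepalive_detect_secs KEEPALIVE_TIME KEEPALIVE_INTVL KEEPALIVE_PROBES
keepalive_detect_secs() {
    idle=$1
    intvl=$2
    probes=$3
    echo $(( idle + intvl * probes ))
}

keepalive_detect_secs 7200 75 9   # default values -> 7875
keepalive_detect_secs 510 3 3     # suggested values -> 519
```

With the defaults, the first probe is not even sent until two hours of idle time have passed, which is longer than the one-hour point at which this job fails. With the suggested values, probes start after 8.5 minutes.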
02-24-2012 09:13 AM
I do not think the timeout is the reason it fails. I think that if the setting were 10 hours, you would get the same message after 10 hours.
I think it is because exp_pipe.dmp is a fifo file, created with this command:
mknod <path>/exp_pipe.dmp p
It is used for compressing online Oracle database export dump files.
I need a fix: I need to know what is really happening and how to fix it.
02-24-2012 10:12 AM
We can see in bptm log that NO DATA is received from the client:
10:28:17.894 [15395] <2> bptm: INITIATING (VERBOSE = 0): -w -c centurion -den 6 -rt 8 -rn 3 -stunit NB-master-hcart-robot-tld-3 -cl Database_Cold -bt 1330093693 -b centurion_1330093693 -st 2 -cj 30 -p Databases -reqid -1330090980 -jm -brm -hostname centurion -L /export/home/oracle/dba/logs/LOGS.2012-02-24.log -ru oracle -rclnt centurion -rclnthostname centurion -rl 17 -rp 518400 -sl Logs_One_Year -ct 0 -maxfrag 5120 -mediasvr NB-master -no_callback -connect_options 0x01010100 -jobid 645159 -jobgrpid 645159 -masterversion 650000 -bpbrm_shm_id 103 -blks_per_buffer 128
10:28:28.459 [15395] <4> write_backup: begin writing backup id centurion_1330093693, copy 1, fragment 1, to media id PJR213 on drive NB-master_Robot_003_Drive_26_SN-1250864366 (index 41)
10:28:28.459 [15395] <2> signal_parent: sending SIGUSR1 to bpbrm (pid = 15392)
...
10:28:28.463 [15395] <2> io_write_back_header: drive index 41, centurion_1330093693, file num = 8, mpx_headers = 0, copy 1
10:28:28.463 [15395] <2> write_data: completed writing backup header, start writing data when first buffer is available, copy 1
NO DATA IS EVER RECEIVED
bptm is killed one hour later by bpbrm - timeout.
11:28:28.571 [15395] <2> Media_dispatch_signal: calling catch_signal for 1 (bptm.c:25717) delay 0 seconds
11:28:30.577 [15395] <16> catch_signal: media manager terminated by parent process
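The one-hour gap can be confirmed directly from the two bptm timestamps. A small sketch, with hypothetical helper names, that turns HH:MM:SS.mmm log timestamps from the same day into a difference in whole seconds:

```shell
#!/bin/sh
# Convert an HH:MM:SS[.mmm] log timestamp to whole seconds since midnight.
to_secs() {
    echo "$1" | awk -F'[:.]' '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between two timestamps on the same day (START END).
elapsed_secs() {
    echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

# Header written at 10:28:28.463, signal received at 11:28:28.571
elapsed_secs 10:28:28.463 11:28:28.571   # -> 3600
```

Exactly 3600 seconds between the last write and the termination signal is consistent with a configured one-hour timeout rather than a random failure.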
The question now is: what NEEDS to be backed up on this client?
What is the purpose of specifying the exp_pipe.dmp file in the backup list-file?
If you need to back up Oracle database export dump files, why not specify the folder containing the dump files?
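If the list-file has to name individual files, one option is to generate it from the export directory and keep only regular files, so that the named pipe is never included. A rough sketch, where the function name, the output file and the name pattern are all assumptions for illustration:

```shell
#!/bin/sh
# List regular files under a directory. "find -type f" matches only
# regular files, so named pipes (FIFOs) such as exp_pipe.dmp are skipped.
# Usage: list_regular_files DIR [NAME_PATTERN]
list_regular_files() {
    find "$1" -type f -name "${2:-*}" | sort
}

# Example (hypothetical output file; the directory is the one from this thread):
# list_regular_files /export/home/oracle/dba/exports > /tmp/export_files.list
```

This only helps if the finished, compressed export files are what needs to be protected; it deliberately leaves the pipe itself out.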
02-24-2012 10:53 AM
Why? Because I have another server with NetBackup 6.5.5 that is backing up exp_pipe.dmp, which is a pipe, and it is not giving the error.
/usr/openv/netbackup/bin>more version
NetBackup-Solaris8 6.5.5
02-24-2012 12:34 PM
Or, when is a dmp not a pipe?
In this latest log there is no mention of a fifo special file, so I presume /export/home/oracle/dba/exports/exp_pipe.dmp on client riogrande is a flat file.
How about running the ls command on each server:
ls -l /oradb/oravol01/exp_pipe.dmp
ls -l /export/home/oracle/dba/exports/exp_pipe.dmp
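In the ls -l output, the first character of the mode string gives the answer: p is a named pipe and - is a regular file. The same check can be scripted with the shell's file tests. This is a small sketch with a made-up function name:

```shell
#!/bin/sh
# Say whether a path is a named pipe, a regular file, or something else.
describe_path() {
    if [ -p "$1" ]; then
        echo "$1: named pipe (FIFO)"
    elif [ -f "$1" ]; then
        echo "$1: regular file"
    elif [ -e "$1" ]; then
        echo "$1: other file type"
    else
        echo "$1: does not exist"
    fi
}

# Run the matching line on each server
describe_path /oradb/oravol01/exp_pipe.dmp
describe_path /export/home/oracle/dba/exports/exp_pipe.dmp
```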
02-24-2012 01:07 PM
Are you performing an Oracle export via a pipe directly into NetBackup?
Have you ever restored one of those exports from the 6.5.5 client?
02-24-2012 02:59 PM
This has been bugging me for days now and had another couple of thoughts ...
1. You say the timeout is 10 hours - but (and I don't have a console to hand) I did not think that the Client Read Timeout in the media server host properties went that high (so I could be wrong)
2. If it is compressing something it usually does not compress the actual file - it stages it somewhere first and from this creates the compressed file - so could it be running out of disk space?
I still think you have a NetBackup timeout of 3600 set somewhere though which causes your error
02-25-2012 06:54 PM
On the riogrande server, where the NetBackup client version is 6.5.5:
/export/home/oracle>ls -ltr /export/home/oracle/dba/exports/exp_pipe.dmp
prw-r--r-- 1 oracle dba 0 Jan 17 18:57 /export/home/oracle/dba/exports/exp_pipe.dmp
oracle@riogrande[ctimsp1]<deadbeef>
/export/home/oracle>
On the centurion server, where the version is 6.5.6:
/oradb/oravol01>ls -ltr /oradb/oravol01/exp_pipe.dmp
prw-r--r-- 1 oracle dba 0 Feb 25 22:36 /oradb/oravol01/exp_pipe.dmp
oracle@centurion[ctimsd1]<deadbeef>
/oradb/oravol01>
02-25-2012 06:56 PM
No - I export, and when the export is finished I back up.
Did I restore? Yes, I have had the opportunity to restore a table from the export dump file.
02-25-2012 07:00 PM
Backup of the pipe is successful in NetBackup version 6.5.5.
Backup of the pipe is not successful in NetBackup 6.5.6.
So I can presume this is a regression issue.
02-27-2012 02:29 AM
Perhaps the upgrade re-set the client read timeout?
I did a check this morning and the maximum Client Read Timeout setting on a media server is 32767 seconds (9 hours 6 minutes)
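For anyone wanting to double-check that conversion, here is a quick sketch (the helper name is hypothetical):

```shell
#!/bin/sh
# Convert a number of seconds to hours, minutes and seconds.
secs_to_hms() {
    s=$1
    echo "$(( s / 3600 ))h $(( s % 3600 / 60 ))m $(( s % 60 ))s"
}

secs_to_hms 32767   # -> 9h 6m 7s
secs_to_hms 3600    # -> 1h 0m 0s
```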
As per this tech note this timeout is vital for pipe backups:
http://www.symantec.com/docs/TECH89805
Extract:
There are a couple of caveats with this solution, however. One of which is that the back up of the named pipe file must be completed within the CLIENT_READ_TIMEOUT bp.conf setting on the NetBackup server(s). Otherwise, NetBackup will abort the backup and return a non-zero status code (41 in this case).
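Since the extract refers to the CLIENT_READ_TIMEOUT entry in bp.conf, it is worth looking at what is actually set on the master and media servers. The sketch below simply parses a bp.conf-style file in the "KEY = value" form shown at the top of this thread. The function name is made up, and the fallback of 300 seconds is an assumption about the default that should be verified for your version.

```shell
#!/bin/sh
# Print the CLIENT_READ_TIMEOUT value from a bp.conf-style file,
# or a fallback value when the entry is absent.
# Usage: client_read_timeout [CONF_FILE] [FALLBACK_SECONDS]
client_read_timeout() {
    conf=${1:-/usr/openv/netbackup/bp.conf}
    fallback=${2:-300}   # assumed default, verify for your NetBackup version
    # Take the numeric value of the last matching entry, if any
    value=$(sed -n 's/^[ ]*CLIENT_READ_TIMEOUT[ ]*=[ ]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -1)
    echo "${value:-$fallback}"
}

# Example, run on a master or media server:
# client_read_timeout /usr/openv/netbackup/bp.conf
```

A result of 3600 on the media server would match the one-hour failure seen in the bptm log.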
03-01-2012 01:17 AM
Backup of the pipe is successful in NetBackup version 6.5.5.
Backup of the pipe is not successful in NetBackup 6.5.6.
The way I see this is that you have one of 2 choices: