05-25-2015 07:38 AM
Hello,
We have one backup client that is getting status code 155 (disk is full):
Message: disk is full
Explanation: The write to the catalog file failed because the disk that contains the catalog database is full, or the track log folder is full.
Recommended Action: Free up space on the disks where NetBackup catalogs reside or where the track log folder resides and retry the operation.
My question about freeing up space where the track log folder resides: are there specific files that we can delete without involving the AIX sysadmin? I don't want to delete something important.
I checked /usr and it has only 1.77 GB of space left:
root)/> df -g /usr
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd2 10.38 1.77 83% 84616 17% /usr
The status 155 error is occurring on only one backup client. The bpbkar log is below:
20:54:24.375 [22413352] <16> fwrite_and_log: fail to write <fix size entry block> for track journal, backup id:<NULL>, file:<11000232fb.tif>, err_num:<28>, to write:<1>, wrote:<0>
20:54:24.400 [22413352] <16> bpbkar SelectFile: fscp_add_full_entry() failed, error : disk is full
20:54:24.400 [22413352] <16> bpbkar: ERR - bpbkar FATAL exit status = 155: disk is full
20:54:24.400 [22413352] <4> bpbkar: INF - EXIT STATUS 155: disk is full
20:54:33.258 [22413352] <16> bpbkar: ERR - read server exit status = 155: disk is full
20:54:33.473 [22413352] <16> ct_cat_close: failed to fflush for file in dir(/datasink/importDRM/backup_0315/15069/03/359832/), error is -1
20:54:33.485 [22413352] <2> ct_cat_close: close current track journal
20:54:33.485 [22413352] <2> ct_cat_close: close previous track journal
20:54:33.485 [22413352] <16> bpbkar: ct_cat_close() failed, error (14)
21:04:42.097 [34275332] <4> is_excluded: Excluded /datasink/importDRM/backup/2015/15035/04/337763/core by exclude_list entry core
21:27:57.930 [34275332] <16> fwrite_and_log: fail to write <fix size entry block> for track journal, backup id:<NULL>, file:<21000522rb.tif>, err_num:<28>, to write:<1>, wrote:<0>
21:27:57.946 [34275332] <16> bpbkar SelectFile: fscp_add_full_entry() failed, error : disk is full
21:27:57.946 [34275332] <16> bpbkar: ERR - bpbkar FATAL exit status = 155: disk is full
21:27:57.947 [34275332] <4> bpbkar: INF - EXIT STATUS 155: disk is full
21:28:05.949 [34275332] <16> bpbkar: ERR - read server exit status = 155: disk is full
21:28:06.197 [34275332] <16> ct_cat_close: failed to fflush for file in dir(/datasink/importDRM/backup_0315/15069/03/359834/), error is -1
21:28:06.211 [34275332] <2> ct_cat_close: close current track journal
21:28:06.211 [34275332] <2> ct_cat_close: close previous track journal
21:28:06.211 [34275332] <16> bpbkar: ct_cat_close() failed, error (14)
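To confirm that every failure in a log like this is the same condition, the err_num values can be tallied. This is a minimal sketch, assuming a POSIX shell and the bpbkar line format shown above; the function name count_track_write_errors is hypothetical. err_num 28 is ENOSPC ("No space left on device") on both AIX and Linux, which suggests the track journal write failed on a local file system of the client.

```shell
# Hypothetical helper: count "fwrite_and_log: fail to write" lines in a
# bpbkar debug log, grouped by err_num. Reads the files given as arguments,
# or stdin if none are given.
# err_num 28 = ENOSPC ("No space left on device") on AIX and Linux.
count_track_write_errors() {
    grep 'fwrite_and_log: fail to write' "$@" |
        sed -n 's/.*err_num:<\([0-9]*\)>.*/\1/p' |
        sort | uniq -c
}
```

For example: count_track_write_errors /usr/openv/netbackup/logs/bpbkar/log.052515 (the log file name here is only illustrative).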
The client is an AIX server running AIX 7.1.
The NetBackup client version is 7.6.0.3.
Thanks in Advance!
05-25-2015 10:51 AM
This error is not associated with the client, per se, but with the Master Server. This is the explanation of the error code:
Message: disk is full
Explanation: The write to the catalog file failed because the disk that contains the catalog database is full.
Recommended Action: Free up space on the disks where NetBackup catalogs reside and retry the operation.
The problem is not a space issue on the client but on the Master Server, where the catalog files reside. This error message is an indicator of that:
20:54:33.258 [22413352] <16> bpbkar: ERR - read server exit status = 155: disk is full
Note that this is a server exit status that is sent back to bpbkar on the client.
Look at whatever file system is in use that holds the catalog files on the Master. For Unix/Linux, that is typically where the /usr/openv directory lives.
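A quick way to see how full the file system holding a given directory is, on either the Linux master or the AIX client. This is a sketch assuming POSIX df -P output (Filesystem, blocks, Used, Available, Capacity, Mounted on); fs_usage is a hypothetical name. The Capacity percentage does not depend on the block size df reports in, so the same function should work on both platforms.

```shell
# Hypothetical helper: print the mount point and Capacity (Use%) of the file
# system that holds the given path, using the POSIX `df -P` format.
fs_usage() {
    df -P "$1" | awk 'NR == 2 { print $6, $5 }'
}

# On the master, assuming the image catalog is in its usual location:
#   fs_usage /usr/openv/netbackup/db/images
```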
05-25-2015 06:33 PM
Thanks, Jamie,
I checked, but I don't see any catalogs:
[root@nbS0w0 db]# pwd
/usr/openv/netbackup/db
[root@nbuSw0 db]# ls
altnames client db error media
class cltmp DBVERSION_7.6.0.1 failure_history snapshot
class_internal cltmp_internal DBVERSION_7.6.0.2 IDIRSTRUCT ss
class_locks cltmp_template DBVERSION_7.6.0.3 images vault
class_template config discovery jobs
[root@nbS0w0 db]#
Also, if the problem is on the master, why doesn't it affect other clients' backups?
Best Regards
05-25-2015 08:46 PM
Hi guys,
This issue is usually seen on clients when the track log (used with Accelerator) has filled up the / file system.
Please show us the output of df -h.
05-26-2015 12:18 AM
See if this helps:
How to redirect the NetBackup Accelerator track log to a different location
http://www.symantec.com/docs/HOWTO77409
05-26-2015 12:24 PM
Thanks, Riaan and Marianne, for your responses.
Riaan, yes, I believe the issue is on the client as well.
df -h is not a recognized command on the client server (which is AIX);
df -g is the command I use for AIX:
(root)/> df -h
df: Not a recognized flag: h
Usage: df [-P] | [-IMitv] [-gkm] [-s] [filesystem ...] [file ...]
filenet(root)/> df -g
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd4 1.38 0.95 32% 17994 8% /
/dev/hd2 10.38 1.77 83% 84621 17% /usr
/dev/hd9var 1.00 0.60 41% 10084 7% /var
/dev/hd3 3.62 3.52 3% 1801 1% /tmp
/dev/hd1 6.12 2.55 59% 347 1% /home
/dev/hd11admin 0.12 0.12 1% 9 1% /admin
/proc - - - - - /proc
/dev/hd10opt 0.50 0.27 46% 7149 9% /opt
/dev/livedump 0.25 0.25 1% 4 1% /var/adm/ras/livedump
/dev/aixtl 20.00 8.48 58% 3126 1% /aixtl
/dev/fnsw 10.00 9.43 6% 3749 1% /fnsw
/dev/fnswlocal 10.00 9.58 5% 3023 1% /fnsw/local
/dev/oraclesw 30.00 20.69 32% 40254 1% /oracle
/dev/oracleupg 30.00 30.00 1% 4 1% /oraswupg
/dev/oradata 220.00 28.84 87% 33 1% /oradata
/dev/backups 225.00 10.03 96% 24182 2% /Backups
/dev/datasink 1175.00 838.27 29% 9198024 5% /datasink
/dev/msar1 31000.00 2582.78 92% 1621718 1% /msar1
/dev/msar2 1000.00 323.34 68% 35 1% /msar2
/dev/trans 1995.00 576.91 72% 50 1% /trans
/dev/scratchpad 999.00 718.81 29% 93 1% /scratchpad
automtn:/data/exportATM 298.00 132.42 56% 975281 3% /datasink/importATM
netcom 50.00 41.31 18% 2276581 18% /datasink/importPDF
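With this many file systems, it can help to filter the listing down to the ones that are nearly full. A minimal sketch, assuming the AIX df -g column layout shown above; flag_full_fs and its default threshold of 80% are arbitrary choices for illustration. Run against the listing above with the default threshold, it would flag /usr (83%), /oradata (87%), /Backups (96%) and /msar1 (92%).

```shell
# Hypothetical helper: read AIX `df -g` output on stdin and print every file
# system whose %Used is at or above a threshold (default 80).
# Columns assumed: Filesystem, GB blocks, Free, %Used, Iused, %Iused, Mounted on
flag_full_fs() {
    threshold="${1:-80}"
    awk -v t="$threshold" '
        NR > 1 && $4 ~ /%$/ {
            used = $4
            sub(/%$/, "", used)            # "83%" -> "83"
            if (used + 0 >= t + 0)
                printf "%s %s free=%sGB used=%s%%\n", $NF, $1, $3, used
        }'
}

# Usage on the AIX client:
#   df -g | flag_full_fs 80
```

Lines such as /proc, where %Used is "-", are skipped because the column does not end in "%".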
On the Linux master server, df -h is recognized:
[root@nbu10w0 admincmd]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_nbu50w0-lv_root 50G 7.5G 40G 16% /
tmpfs 64G 148K 64G 1% /dev/shm
/dev/sda1 485M 40M 420M 9% /boot
/dev/mapper/vg_nbu50w0-openv 171G 88G 75G 54% /usr/openv
/dev/mapper/data_00-LVdata00 50T 12T 36T 25% /data00
/dev/mapper/index00-LVindex00 99G 1.2G 93G 2% /index00
/dev/mapper/vg_oracle-oracle 60G 5.5G 51G 10% /oracle
/dev/mapper/vg_oracle-oradata 35G 4.1G 29G 13% /oradata
/dev/mapper/vg_oracle-oralog 55G 9.9G 42G 20% /oralog
/dev/mapper/vg_nbu10w0-software 51G 38G 11G 79% /software
[root@nbu10w0 admincmd]#
Marianne, thanks, I will try this.
Also, Marianne,
can I just delete a folder or folders, if that will help solve the issue? I'm not sure which ones, which is why I am asking.
If not, I will just try the redirect.
Thanks again
05-26-2015 09:09 PM
All your file systems seem fine. This must have been an isolated incident. Is it still failing with 155?
05-27-2015 08:14 AM
Hi Riaan,
Thanks. No, it has failed with status 155 numerous times now and is still failing. I just did the redirect that Marianne suggested, so we will see how the next backup runs.
Best Regards
05-27-2015 10:20 AM
AIX systems should be logging instances of a file system running out of free space in the AIX error log.
Run the command "errpt -a | pg" to view the error log entries. Look for entries that indicate an out-of-space condition; the entry will identify the file system that is encountering the problem.
Use "errclear #" to remove older entries from the log that are no longer relevant. The "#" value specifies that entries older than that number of days are deleted: "errclear 1" removes all entries more than a day old, while "errclear 0" removes all entries in the log.
The entries are date/time stamped and can be correlated to the time of the backup. See the man pages for "errpt" and "errclear" for possible additional options that can be used to filter out specific messages.
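To pull just the file-system-full entries out of the error log and line them up with backup times, a sketch like the following could be used. It assumes the errpt -a layout quoted later in this thread (LABEL:, Date/Time:, and the FILE SYSTEM DEVICE AND MOUNT POINT detail line); fs_full_events is a hypothetical name.

```shell
# Hypothetical helper: read `errpt -a` output on stdin and print one line per
# J2_FS_FULL entry: its Date/Time and the "device, mount point" detail line.
fs_full_events() {
    awk '
        /^LABEL:/      { label = $2; when = "" }
        /^Date\/Time:/ { when = $0; sub(/^Date\/Time:[ \t]*/, "", when) }
        /^FILE SYSTEM DEVICE AND MOUNT POINT/ { want_mount = 1; next }
        want_mount == 1 {
            if (label == "J2_FS_FULL") print when " | " $0
            want_mount = 0
        }'
}

# Example on the AIX client (-J limits errpt to the given label; see the
# errpt man page):
#   errpt -a -J J2_FS_FULL | fs_full_events
```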
05-30-2015 04:57 PM
Marianne,
The redirect did not solve the issue. I did the following:
1. Rename the track directory to make a backup copy.
2. Copy the backup to a new location:
/nbuserver/nbuclient/nbuclient> pwd
/aixtl/nbutemp/nbuserver/nbuserver/nbuclient/nbuclient
/nbuserver/nbuclient/nbuclient>
Jamie,
Here is the result of "errpt -a | pg":
errpt -a | pg
Unless I am missing something, nothing shows an out-of-space condition:
LABEL: CORE_DUMP
IDENTIFIER: A924A5FC
Date/Time: Fri May 22 13:23:33 EDT 2015
Sequence Number: 374
Machine Id: 00F6FB2C4C00
Node Id:
Class: S
Type: PERM
WPAR: Global
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
35258542
FILE SYSTEM SERIAL NUMBER
16
INODE NUMBER
276
CORE FILE NAME
/fnsw/local/bin/core
PROGRAM NAME
HPII_val
STACK EXECUTION DISABLED
0
COME FROM ADDRESS REGISTER
??
PROCESSOR ID
hw_fru_id: N/A
hw_cpu_id: N/A
ADDITIONAL INFORMATION
shm_snmp_ 1A0
??
??
pthread_k B4
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/HPII_val SIG/6 FLDS/shm_snmp_ VALU/1a0
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: A924A5FC
Date/Time: Fri May 22 11:09:07 EDT 2015
Sequence Number: 373
Machine Id: 00F6FB2C4C00
Node Id:
Class: S
Type: PERM
WPAR: Global
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
34144306
FILE SYSTEM SERIAL NUMBER
16
INODE NUMBER
276
CORE FILE NAME
/fnsw/local/bin/core
PROGRAM NAME
HPII_val
STACK EXECUTION DISABLED
0
COME FROM ADDRESS REGISTER
??
PROCESSOR ID
hw_fru_id: N/A
hw_cpu_id: N/A
ADDITIONAL INFORMATION
shm_snmp_ 1A0
??
??
pthread_k B4
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/HPII_val SIG/6 FLDS/shm_snmp_ VALU/1a0
---------------------------------------------------------------------------
LABEL: J2_FS_FULL
IDENTIFIER: F7FA22C9
Date/Time: Wed May 20 20:44:08 EDT 2015
Sequence Number: 372
Machine Id: 00F6FB2C4C00
Node Id:
Class: O
Type: INFO
WPAR: Global
Resource Name: SYSJ2
Description
UNABLE TO ALLOCATE SPACE IN FILE SYSTEM
Probable Causes
FILE SYSTEM FULL
Recommended Actions
INCREASE THE SIZE OF THE ASSOCIATED FILE SYSTEM
REMOVE UNNECESSARY DATA FROM FILE SYSTEM
USE FUSER UTILITY TO LOCATE UNLINKED FILES STILL REFERENCED
Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
000A 0005
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/hd2, /usr
05-31-2015 04:20 AM
If you are getting core dumps, have your OS team investigate the process that is dumping core, in this case HPII_val.
As for the backup, run a manual backup and check the disk space on both the master and the client while it is failing.
05-31-2015 04:58 AM
There is this one on 20 May:
LABEL: J2_FS_FULL
IDENTIFIER: F7FA22C9
Date/Time: Wed May 20 20:44:08 EDT 2015
05-31-2015 03:18 PM
Thanks, ravarooo and Marianne, for your responses.
ravarooo,
I will check with the OS sysadmins on the core dumps.
Marianne,
Thanks for pointing that out
Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
000A 0005
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/hd2, /usr
I also noticed the output below. I am going to check with the sysadmins to see if more space can be allocated.
(root)/usr> df -g
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd2 10.38 1.60 85% 84687 18% /usr
The above shows only 1.60 GB free. Do you know if there is a recommended amount of free space for /usr, which contains /usr/openv/netbackup? I'm asking because they may ask how much should be free: 3 GB, 4 GB?
Best Regards
05-31-2015 10:14 PM
06-03-2015 01:48 AM
Verify the date/time stamps of the Disk Full messages against jobs run on the NBU Master.
Check for core files created on the client, and delete them once you have found their cause. Core files can get rather large. By default they are created in the current working directory of the process that failed, which is often the directory of the executable.
Verify your settings for the debug logs created on the client. The default is to keep them for 28 days, which is too long, especially if the VERBOSE or DebugLevel configured for them is set too high. I tend to use 4 days, which is enough to look at problems found over a long weekend. The files are written to the /usr/openv/logs and /usr/openv/netbackup/logs directories, and they may be taking up an inordinate amount of space. Run this command to see how much space each directory and subdirectory is using:
du -m /usr
This reports space values, in MB, for /usr and for each subdirectory of /usr. Use the "-s" option to get only a summary total, or "-a" to include individual files.