Solved: thanks revarooo and Marianne

nbustarter380 · ‎05-25-2015

Hello,

We have one backup client that is getting the code 155 "disk is full" error:

NetBackup status code: 155

Message: disk is full

Explanation: The write to the catalog file failed because the disk that contains the catalog database is full, or the track log folder is full.

Recommended Action: Free up space on the disks where NetBackup catalogs reside or where the track log folder resides and retry the operation.

A question about freeing up space where the track log folder resides: are there specific files that we can delete without the intervention of the AIX sysadmin? I don't want to delete something important.

I checked /usr and it has only 1.77 GB of space left:

(root)/> df -g /usr

Filesystem GB blocks Free %Used Iused %Iused Mounted on

/dev/hd2 10.38 1.77 83% 84616 17% /usr

The code 155 error is occurring on only one backup client. The bpbkar log is below:

bpbkar

20:54:24.375 [22413352] <16> fwrite_and_log: fail to write <fix size entry block> for track journal, backup id:<NULL>, file:<11000232fb.tif>, err_num:<28>, to write:<1>, wrote:<0>

20:54:24.400 [22413352] <16> bpbkar SelectFile: fscp_add_full_entry() failed, error : disk is full

20:54:24.400 [22413352] <16> bpbkar: ERR - bpbkar FATAL exit status = 155: disk is full

20:54:24.400 [22413352] <4> bpbkar: INF - EXIT STATUS 155: disk is full

20:54:33.258 [22413352] <16> bpbkar: ERR - read server exit status = 155: disk is full

20:54:33.473 [22413352] <16> ct_cat_close: failed to fflush for file in dir(/datasink/importDRM/backup_0315/15069/03/359832/), error is -1

20:54:33.485 [22413352] <2> ct_cat_close: close current track journal

20:54:33.485 [22413352] <2> ct_cat_close: close previous track journal

20:54:33.485 [22413352] <16> bpbkar: ct_cat_close() failed, error (14)

21:04:42.097 [34275332] <4> is_excluded: Excluded /datasink/importDRM/backup/2015/15035/04/337763/core by exclude_list entry core

21:27:57.930 [34275332] <16> fwrite_and_log: fail to write <fix size entry block> for track journal, backup id:<NULL>, file:<21000522rb.tif>, err_num:<28>, to write:<1>, wrote:<0>

21:27:57.946 [34275332] <16> bpbkar SelectFile: fscp_add_full_entry() failed, error : disk is full

21:27:57.946 [34275332] <16> bpbkar: ERR - bpbkar FATAL exit status = 155: disk is full

21:27:57.947 [34275332] <4> bpbkar: INF - EXIT STATUS 155: disk is full

21:28:05.949 [34275332] <16> bpbkar: ERR - read server exit status = 155: disk is full

21:28:06.197 [34275332] <16> ct_cat_close: failed to fflush for file in dir(/datasink/importDRM/backup_0315/15069/03/359834/), error is -1

21:28:06.211 [34275332] <2> ct_cat_close: close current track journal

21:28:06.211 [34275332] <2> ct_cat_close: close previous track journal

21:28:06.211 [34275332] <16> bpbkar: ct_cat_close() failed, error (14)
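
Editor's note: the err_num:<28> in the fwrite_and_log lines above is ENOSPC ("No space left on device"), which means the failed write was to the track journal on the client. A minimal sketch for pulling those failures out of a bpbkar log follows. The function name and output layout are illustrative only and are not part of NetBackup.

```shell
#!/bin/sh
# Summarize track-journal write failures found in a bpbkar debug log.
# err_num:<28> is ENOSPC ("No space left on device").
# enospc_summary is a hypothetical helper name, not a NetBackup command.
enospc_summary() {
    # Reads log text on stdin; prints "time pid file" for each failed write.
    awk '/fwrite_and_log/ && /err_num:<28>/ {
        pid = $2
        gsub(/^\[|\]$/, "", pid)          # strip the [ ] around the process ID
        file = "?"
        if (match($0, /file:<[^>]*>/))    # pull the name out of file:<...>
            file = substr($0, RSTART + 6, RLENGTH - 7)
        print $1, pid, file
    }'
}
```

It reads the log on standard input, for example `enospc_summary < bpbkar_log_file`, where bpbkar_log_file stands for whichever log file was posted above.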

The client is an AIX server running OS version 7.1.

The NetBackup client version is 7.6.0.3.

Thanks in Advance!

Jaime_Vazquez · ‎05-25-2015

This error is not associated with the client, per se, but with the Master Server. This is the explanation of the error code:

Message: disk is full

Explanation: The write to the catalog file failed because the disk that contains the catalog database is full.

Recommended Action: Free up space on the disks where NetBackup catalogs reside and retry the operation.

The problem is not a space issue on the client but on the Master Server, where the catalog files reside. This error message is an indicator of that:

20:54:33.258 [22413352] <16> bpbkar: ERR - read server exit status = 155: disk is full

Note that this is a server exit status that is sent back to bpbkar on the client.

Look at whatever file system is in use that holds the catalog files on the Master. For Unix/Linux, that is typically where the /usr/openv directory lives.

nbustarter380 · ‎05-25-2015

Thanks Jaime,

I checked; however, I don't see any catalogs:

[root@nbS0w0 db]# pwd

/usr/openv/netbackup/db

[root@nbuSw0 db]# ls

altnames client db error media

class cltmp DBVERSION_7.6.0.1 failure_history snapshot

class_internal cltmp_internal DBVERSION_7.6.0.2 IDIRSTRUCT ss

class_locks cltmp_template DBVERSION_7.6.0.3 images vault

class_template config discovery jobs

[root@nbS0w0 db]#

Also, if the problem is on the master, why doesn't it affect other clients' backups?

Best Regards

RiaanBadenhorst · ‎05-25-2015

Hi Guys,

This issue is usually seen on clients when the track log (when using accelerator) has filled up the / file system.

Please show us the output of df -h

Marianne · ‎05-26-2015

See if this helps:

How to redirect the NetBackup Accelerator track log to a different location
http://www.symantec.com/docs/HOWTO77409

nbustarter380 · ‎05-26-2015

Thanks, Riaan and Marianne, for your responses.

Riaan, yes, I believe the issue is on the client as well.

df -h is not a recognized command on the client server (which is AIX); df -g is the command I use on AIX:

(root)/> df -h

df: Not a recognized flag: h

Usage: df [-P] | [-IMitv] [-gkm] [-s] [filesystem ...] [file ...]

filenet(root)/> df -g

Filesystem GB blocks Free %Used Iused %Iused Mounted on

/dev/hd4 1.38 0.95 32% 17994 8% /

/dev/hd2 10.38 1.77 83% 84621 17% /usr

/dev/hd9var 1.00 0.60 41% 10084 7% /var

/dev/hd3 3.62 3.52 3% 1801 1% /tmp

/dev/hd1 6.12 2.55 59% 347 1% /home

/dev/hd11admin 0.12 0.12 1% 9 1% /admin

/proc - - - - - /proc

/dev/hd10opt 0.50 0.27 46% 7149 9% /opt

/dev/livedump 0.25 0.25 1% 4 1% /var/adm/ras/livedump

/dev/aixtl 20.00 8.48 58% 3126 1% /aixtl

/dev/fnsw 10.00 9.43 6% 3749 1% /fnsw

/dev/fnswlocal 10.00 9.58 5% 3023 1% /fnsw/local

/dev/oraclesw 30.00 20.69 32% 40254 1% /oracle

/dev/oracleupg 30.00 30.00 1% 4 1% /oraswupg

/dev/oradata 220.00 28.84 87% 33 1% /oradata

/dev/backups 225.00 10.03 96% 24182 2% /Backups

/dev/datasink 1175.00 838.27 29% 9198024 5% /datasink

/dev/msar1 31000.00 2582.78 92% 1621718 1% /msar1

/dev/msar2 1000.00 323.34 68% 35 1% /msar2

/dev/trans 1995.00 576.91 72% 50 1% /trans

/dev/scratchpad 999.00 718.81 29% 93 1% /scratchpad

automtn:/data/exportATM 298.00 132.42 56% 975281 3% /datasink/importATM

netcom 50.00 41.31 18% 2276581 18% /datasink/importPDF

On the Linux master server, df -h is recognized:

[root@nbu10w0 admincmd]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/mapper/vg_nbu50w0-lv_root 50G 7.5G 40G 16% /

tmpfs 64G 148K 64G 1% /dev/shm

/dev/sda1 485M 40M 420M 9% /boot

/dev/mapper/vg_nbu50w0-openv 171G 88G 75G 54% /usr/openv

/dev/mapper/data_00-LVdata00 50T 12T 36T 25% /data00

/dev/mapper/index00-LVindex00 99G 1.2G 93G 2% /index00

/dev/mapper/vg_oracle-oracle 60G 5.5G 51G 10% /oracle

/dev/mapper/vg_oracle-oradata 35G 4.1G 29G 13% /oradata

/dev/mapper/vg_oracle-oralog 55G 9.9G 42G 20% /oralog

/dev/mapper/vg_nbu10w0-software 51G 38G 11G 79% /software

[root@nbu10w0 admincmd]#

Marianne, thanks, I will try this.

Also, Marianne, can I just delete a folder or folders, if that would help solve the issue? I am asking because I am not sure which ones are safe to delete.

If not, I will just try the redirect.

Thanks again

RiaanBadenhorst · ‎05-26-2015

All your file systems seem fine. This must have been an isolated incident. Is it still failing with 155?
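
Editor's note: a small sketch for screening df output against a usage threshold instead of reading it by eye. It assumes the POSIX portable layout printed by `df -P`, where the fifth column is the capacity percentage and the sixth is the mount point. The function name and the default threshold of 80 are arbitrary choices for illustration.

```shell
#!/bin/sh
# Flag file systems at or above a usage threshold from `df -P` output.
# fs_over_threshold is a hypothetical helper; 80 is an arbitrary default.
fs_over_threshold() {
    # $1 = percent threshold (default 80); reads `df -P` text on stdin.
    # Mount points containing spaces are not handled by this sketch.
    awk -v limit="${1:-80}" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)                 # "83%" becomes "83"
        # Rows such as /proc show "-" instead of a number and are skipped.
        if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
            print $6, pct "%"
    }'
}
```

Typical use would be `df -P | fs_over_threshold 80`. A file system can also fill up briefly during a backup and recover afterwards, so a clean result at one moment does not rule out the error.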

nbustarter380 · ‎05-27-2015

Hi Riaan,

Thanks. No, it has failed with a 155 numerous times now and it is still failing. I just did the redirect that Marianne suggested, so we will see how the next backup runs.

Best Regards

Jaime_Vazquez · ‎05-27-2015

AIX systems should log each instance of a file system running out of free space in the AIX error log.

Run the command "errpt -a | pg" to view the error log information entries. Look for entries that indicate an out of space condition. The error message will indicate the file system that is encountering the problem.

Use "errclear #" to remove older entries that are no longer relevant from the log. The "#" value deletes entries that are older than that number of days: "errclear 1" removes entries more than one day old, while "errclear 0" removes all existing entries in the log.

The entries are date/time stamped and can be correlated to the time of the backup. See the man pages for "errpt" and "errclear" for possible additional options that can be used to filter out specific messages.
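
Editor's note: a minimal sketch for reducing `errpt -a` output to just the file-system-full events, so that their timestamps can be compared with the backup times. It assumes the detailed report prints "LABEL:" and "Date/Time:" fields and a "FILE SYSTEM DEVICE AND MOUNT POINT" heading followed by the device and mount point, and that the event label is J2_FS_FULL. The function name is hypothetical.

```shell
#!/bin/sh
# Print one line per JFS2 "file system full" event found in `errpt -a` text.
# fs_full_events is a hypothetical helper; the field names are assumptions
# about the errpt -a report layout.
fs_full_events() {
    # Reads `errpt -a` text on stdin; prints "date | device, mount point".
    awk '
        /^LABEL:/      { label = $2; when = "" }
        /^Date\/Time:/ { sub(/^Date\/Time:[ \t]*/, ""); when = $0 }
        # First non-blank line after the heading holds "device, mount point".
        want_fs && NF  { if (label == "J2_FS_FULL") print when " | " $0
                         want_fs = 0 }
        /^FILE SYSTEM DEVICE AND MOUNT POINT/ { want_fs = 1 }
    '
}
```

Typical use would be `errpt -a | fs_full_events`.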

nbustarter380 · ‎05-30-2015

Marianne,

The redirect did not solve the issue. These are the steps I followed:

1. Rename the track directory to make a backup copy:

# mv /usr/openv/netbackup/track /usr/openv/netbackup/track.sv

2. Copy the backup to a new location:

# cp -rp /usr/openv/netbackup/track.sv/* /aixtl/nbutemp

/nbuserver/nbuclient/nbuclient> pwd

/aixtl/nbutemp/nbuserver/nbuserver/nbuclient/nbuclient

/nbuserver/nbuclient/nbuclient>

Jaime,

Here is the result of "errpt -a | pg". Unless I am missing something, nothing shows an out-of-space condition:

LABEL: CORE_DUMP

IDENTIFIER: A924A5FC

Date/Time: Fri May 22 13:23:33 EDT 2015

Sequence Number: 374

Machine Id: 00F6FB2C4C00

Node Id:

Class: S

Type: PERM

WPAR: Global

Resource Name: SYSPROC

Description

SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes

SOFTWARE PROGRAM

User Causes

USER GENERATED SIGNAL

Recommended Actions

CORRECT THEN RETRY

Failure Causes

SOFTWARE PROGRAM

Recommended Actions

RERUN THE APPLICATION PROGRAM

IF PROBLEM PERSISTS THEN DO THE FOLLOWING

CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

SIGNAL NUMBER

6

USER'S PROCESS ID:

35258542

FILE SYSTEM SERIAL NUMBER

16

INODE NUMBER

276

CORE FILE NAME

/fnsw/local/bin/core

PROGRAM NAME

HPII_val

STACK EXECUTION DISABLED

0

COME FROM ADDRESS REGISTER

??

PROCESSOR ID

hw_fru_id: N/A

hw_cpu_id: N/A

ADDITIONAL INFORMATION

shm_snmp_ 1A0

??

pthread_k B4

Symptom Data

REPORTABLE

1

INTERNAL ERROR

0

SYMPTOM CODE

PCSS/SPI2 FLDS/HPII_val SIG/6 FLDS/shm_snmp_ VALU/1a0

---------------------------------------------------------------------------

LABEL: CORE_DUMP

IDENTIFIER: A924A5FC

Date/Time: Fri May 22 11:09:07 EDT 2015

Sequence Number: 373

Machine Id: 00F6FB2C4C00

Node Id:

Class: S

Type: PERM

WPAR: Global

Resource Name: SYSPROC

Description

SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes

SOFTWARE PROGRAM

User Causes

USER GENERATED SIGNAL

Recommended Actions

CORRECT THEN RETRY

Failure Causes

SOFTWARE PROGRAM

Recommended Actions

RERUN THE APPLICATION PROGRAM

IF PROBLEM PERSISTS THEN DO THE FOLLOWING

CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

SIGNAL NUMBER

6

USER'S PROCESS ID:

34144306

FILE SYSTEM SERIAL NUMBER

16

INODE NUMBER

276

CORE FILE NAME

/fnsw/local/bin/core

PROGRAM NAME

HPII_val

STACK EXECUTION DISABLED

0

COME FROM ADDRESS REGISTER

??

PROCESSOR ID

hw_fru_id: N/A

hw_cpu_id: N/A

ADDITIONAL INFORMATION

shm_snmp_ 1A0

??

pthread_k B4

Symptom Data

REPORTABLE

1

INTERNAL ERROR

0

SYMPTOM CODE

PCSS/SPI2 FLDS/HPII_val SIG/6 FLDS/shm_snmp_ VALU/1a0

---------------------------------------------------------------------------

LABEL: J2_FS_FULL

IDENTIFIER: F7FA22C9

Date/Time: Wed May 20 20:44:08 EDT 2015

Sequence Number: 372

Machine Id: 00F6FB2C4C00

Node Id:

Class: O

Type: INFO

WPAR: Global

Resource Name: SYSJ2

Description

UNABLE TO ALLOCATE SPACE IN FILE SYSTEM

Probable Causes

FILE SYSTEM FULL

Recommended Actions

INCREASE THE SIZE OF THE ASSOCIATED FILE SYSTEM

REMOVE UNNECESSARY DATA FROM FILE SYSTEM

USE FUSER UTILITY TO LOCATE UNLINKED FILES STILL REFERENCED

Detail Data

JFS2 MAJOR/MINOR DEVICE NUMBER

000A 0005

FILE SYSTEM DEVICE AND MOUNT POINT

/dev/hd2, /usr

revarooo · ‎05-31-2015

If you are having core dumps, get your OS team to investigate the process that is dumping core, in this case HPII_val.

As for the backup, run a manual backup and check the disk space on both the Master and the client as it's erroring.

Marianne · ‎05-31-2015

There is this one on 20 May:

LABEL: J2_FS_FULL

IDENTIFIER: F7FA22C9

Date/Time: Wed May 20 20:44:08 EDT 2015

nbustarter380 · ‎05-31-2015

Thanks, revarooo and Marianne, for your responses.

revarooo,

I will check with the os sysadmins on the core dumps.

Marianne,

Thanks for pointing that out

Detail Data

JFS2 MAJOR/MINOR DEVICE NUMBER

000A 0005

FILE SYSTEM DEVICE AND MOUNT POINT

/dev/hd2, /usr

I did notice the output below. I am going to check with the sysadmins and see if more space can be given.

(root)/usr> df -g

Filesystem GB blocks Free %Used Iused %Iused Mounted on

/dev/hd2 10.38 1.60 85% 84687 18% /usr

The above shows only 1.60 GB free. Do you know if there is a recommended amount of free space for /usr, which contains /usr/openv/netbackup? I am just checking because they may ask how much should be free: 3 GB, 4 GB?

Best Regards

Marianne · ‎05-31-2015

The 'disk full' message was more than a week ago. More or less the same time as status 155? df output is current situation. It probably looked different on the 20th. Speak to sysadmins.

Jaime_Vazquez · ‎06-03-2015

Verify the date/time stamps of the Disk Full messages against jobs run on the NBU Master.

Check the core files created on the client. Delete them after you have found their cause. Core files can get rather large. They are created in the same directory as the executable that failed.

Verify what your settings are for debug logs being created on the client. The default is to keep them for 28 days, which is too large a value. This is especially true if the VERBOSE or DebugLevel configured for them is set too high. I tend to use a value of 4, a span of time sufficient to look at problems found over a long weekend. The files are written to the /usr/openv/logs or /usr/openv/netbackup/logs directories. It is possible they are taking an inordinate amount of space. Run this command to see how much space each directory/sub-directory is using:

du -m /usr

This should get space values, in MB, for /usr and for each sub-directory of /usr. Use the "-a" option for file-level results; the "-s" option prints only a single summary total.
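
Editor's note: `du` lists directories in traversal order, so the largest ones are easy to miss. A minimal sketch that sorts the output and keeps the top entries follows. It uses `-k` (1 KB units) instead of `-m` for portability and `-x` to stay on one file system. The function name and the default of 10 entries are illustrative assumptions.

```shell
#!/bin/sh
# List the largest directories under a starting point, biggest first.
# largest_dirs is a hypothetical helper name.
largest_dirs() {
    # $1 = directory to inspect, $2 = number of entries to show (default 10).
    # -k reports sizes in 1 KB units; -x does not cross file system boundaries.
    du -kx "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, `largest_dirs /usr/openv 15` would show the fifteen largest directories under the NetBackup installation, with the starting directory itself at the top.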

Netbackup code 155 occurring on an AIX client