
NetBackup status code 155 occurring on an AIX client

nbustarter380
Level 6

Hello,

We have one backup client that is failing with status code 155, the "disk is full" error:

 

NetBackup status code: 155

Message: disk is full

Explanation: The write to the catalog file failed because the disk that contains the catalog database is full, or the track log folder is full.

Recommended Action: Free up space on the disks where NetBackup catalogs reside or where the track log folder resides and retry the operation.

 

A question about freeing up space where the track log folder resides: are there specific files that we can delete without the intervention of the AIX sysadmin? I don't want to delete something important.

 

I checked /usr and it has only 1.77 GB of space left:

(root)/> df -g /usr

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on

/dev/hd2          10.38      1.77   83%    84616    17% /usr

 

The code 155 error is occurring on only one backup client. The bpbkar log is below.

bpbkar

 

20:54:24.375 [22413352] <16> fwrite_and_log: fail to write <fix size entry block> for track journal, backup id:<NULL>, file:<11000232fb.tif>, err_num:<28>, to write:<1>, wrote:<0>

20:54:24.400 [22413352] <16> bpbkar SelectFile: fscp_add_full_entry() failed, error : disk is full

20:54:24.400 [22413352] <16> bpbkar: ERR - bpbkar FATAL exit status = 155: disk is full

20:54:24.400 [22413352] <4> bpbkar: INF - EXIT STATUS 155: disk is full

20:54:33.258 [22413352] <16> bpbkar: ERR - read server exit status = 155: disk is full

20:54:33.473 [22413352] <16> ct_cat_close: failed to fflush for file in dir(/datasink/importDRM/backup_0315/15069/03/359832/), error is -1

20:54:33.485 [22413352] <2> ct_cat_close: close current track journal

20:54:33.485 [22413352] <2> ct_cat_close: close previous track journal

20:54:33.485 [22413352] <16> bpbkar: ct_cat_close() failed, error (14)
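
A note on reading this log: err_num:<28> looks like the errno returned by the failed write, and 28 is ENOSPC ("No space left on device") on Linux. I believe AIX uses the same value, but treat that as an assumption. Below is a minimal sketch for listing each failed track-journal write in a bpbkar log. The function name and the log path in the usage comment are hypothetical, not something NetBackup ships.

```shell
# track_journal_failures: read bpbkar log text on stdin and print
# "time pid file" for every failed track-journal write with err_num 28.
# Hypothetical helper for illustration only.
track_journal_failures() {
  awk '/fwrite_and_log: fail to write/ && /err_num:<28>/ {
         pid = $2                      # second field looks like [22413352]
         sub(/^\[/, "", pid)
         sub(/\]$/, "", pid)
         file = $0
         sub(/.*file:</, "", file)     # drop everything up to the file name
         sub(/>.*/, "", file)          # drop everything after it
         print $1, pid, file
       }'
}

# Usage (the log file name is an assumption; adjust to your client):
#   track_journal_failures < /usr/openv/netbackup/logs/bpbkar/log.052015
```

It prints one line per failed write, which makes it easier to compare the failure times with other events on the client.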

 

 


 

21:04:42.097 [34275332] <4> is_excluded: Excluded /datasink/importDRM/backup/2015/15035/04/337763/core by exclude_list entry core

21:27:57.930 [34275332] <16> fwrite_and_log: fail to write <fix size entry block> for track journal, backup id:<NULL>, file:<21000522rb.tif>, err_num:<28>, to write:<1>, wrote:<0>

21:27:57.946 [34275332] <16> bpbkar SelectFile: fscp_add_full_entry() failed, error : disk is full

21:27:57.946 [34275332] <16> bpbkar: ERR - bpbkar FATAL exit status = 155: disk is full

21:27:57.947 [34275332] <4> bpbkar: INF - EXIT STATUS 155: disk is full

21:28:05.949 [34275332] <16> bpbkar: ERR - read server exit status = 155: disk is full

21:28:06.197 [34275332] <16> ct_cat_close: failed to fflush for file in dir(/datasink/importDRM/backup_0315/15069/03/359834/), error is -1

21:28:06.211 [34275332] <2> ct_cat_close: close current track journal

21:28:06.211 [34275332] <2> ct_cat_close: close previous track journal

21:28:06.211 [34275332] <16> bpbkar: ct_cat_close() failed, error (14)

 

The client is an AIX server running AIX 7.1.

The NetBackup client version is 7.6.0.3.

 

Thanks in Advance!

 

2 ACCEPTED SOLUTIONS


RiaanBadenhorst
Level 6
Partner    VIP    Accredited Certified

Hi Guys,

 

This issue is usually seen on clients when the track log (used by Accelerator) has filled up the / file system.

 

Please show us the output of df -h


Jaime_Vazquez
Level 6
Employee

Verify the date/time stamps of the Disk Full messages against jobs run on the NBU Master.

Check the core files created on the client. Delete them after you have found their cause. Core files can get rather large. By default they are written to the current working directory of the process that failed.

Verify your settings for the debug logs created on the client. The default is to keep them for 28 days, which is too long, especially if the VERBOSE or DebugLevel configured for them is set too high. I tend to use a value of 4 days, enough time to look at problems found over a long weekend. The files are written to the /usr/openv/logs or /usr/openv/netbackup/logs directories, and they may be taking an inordinate amount of space. Run this command to see how much space each directory and sub-directory is using:

du -m /usr

This reports space values, in MB, for /usr and for each sub-directory of /usr. Use the "-a" option for file-level results ("-s" prints only a single total).
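
As a rough sketch of the same idea, the helper below sorts the du output so the largest directories come first. The name top_dirs is made up for illustration. It uses -k (KB) rather than -m because -k is the POSIX flag, and -x so that du stays on one file system.

```shell
# top_dirs DIR [N]: print the N largest directories under DIR (default 10),
# sizes in KB, largest first. Hypothetical helper for illustration only.
top_dirs() {
  du -kx "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage:
#   top_dirs /usr/openv 15
```

The first line is always the total for the starting directory; the lines after it show where the space actually went.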

 


14 REPLIES

Jaime_Vazquez
Level 6
Employee

This error is not associated with the client, per se, but with the Master Server. This is the explanation of the error code:

Message: disk is full

Explanation: The write to the catalog file failed because the disk that contains the catalog database is full.

Recommended Action: Free up space on the disks where NetBackup catalogs reside and retry the operation.

 

The problem is not a space issue on the client but on the Master Server, where the catalog files reside. This error message is an indicator of that:

20:54:33.258 [22413352] <16> bpbkar: ERR - read server exit status = 155: disk is full

Note that this is a server exit status that is sent back to bpbkar on the client.

Look at whatever file system is in use that holds the catalog files on the Master.  For Unix/Linux, that is typically where the /usr/openv directory lives.

 

 

nbustarter380
Level 6

Thanks Jaime,

I checked; however, I don't see any catalogs:

[root@nbS0w0 db]# pwd

/usr/openv/netbackup/db

[root@nbuSw0 db]# ls

altnames        client          db                 error            media

class           cltmp           DBVERSION_7.6.0.1  failure_history  snapshot

class_internal  cltmp_internal  DBVERSION_7.6.0.2  IDIRSTRUCT       ss

class_locks     cltmp_template  DBVERSION_7.6.0.3  images           vault

class_template  config          discovery          jobs

[root@nbS0w0 db]#

 

Also, if the problem is on the master, why doesn't it affect other clients' backups?

Best Regards

 

Marianne
Level 6
Partner    VIP    Accredited Certified

See if this helps:

How to redirect the NetBackup Accelerator track log to a different location 
http://www.symantec.com/docs/HOWTO77409 

nbustarter380
Level 6

Thanks, Riaan and Marianne, for your responses.

Riaan, yes, I believe the issue is on the client as well.

df -h is not a recognized command on the client server (which is AIX).

df -g is the command I use on AIX.

(root)/> df -h

df: Not a recognized flag: h

Usage: df  [-P] | [-IMitv] [-gkm] [-s] [filesystem ...] [file ...]

filenet(root)/> df -g

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on

/dev/hd4           1.38      0.95   32%    17994     8% /

/dev/hd2          10.38      1.77   83%    84621    17% /usr

/dev/hd9var        1.00      0.60   41%    10084     7% /var

/dev/hd3           3.62      3.52    3%     1801     1% /tmp

/dev/hd1           6.12      2.55   59%      347     1% /home

/dev/hd11admin      0.12      0.12    1%        9     1% /admin

/proc                 -         -    -         -     -  /proc

/dev/hd10opt       0.50      0.27   46%     7149     9% /opt

/dev/livedump      0.25      0.25    1%        4     1% /var/adm/ras/livedump

/dev/aixtl        20.00      8.48   58%     3126     1% /aixtl

/dev/fnsw         10.00      9.43    6%     3749     1% /fnsw

/dev/fnswlocal     10.00      9.58    5%     3023     1% /fnsw/local

/dev/oraclesw     30.00     20.69   32%    40254     1% /oracle

/dev/oracleupg     30.00     30.00    1%        4     1% /oraswupg

/dev/oradata     220.00     28.84   87%       33     1% /oradata

/dev/backups     225.00     10.03   96%    24182     2% /Backups

/dev/datasink   1175.00    838.27   29%  9198024     5% /datasink

/dev/msar1     31000.00   2582.78   92%  1621718     1% /msar1

/dev/msar2      1000.00    323.34   68%       35     1% /msar2

/dev/trans      1995.00    576.91   72%       50     1% /trans

/dev/scratchpad    999.00    718.81   29%       93     1% /scratchpad

automtn:/data/exportATM    298.00    132.42   56%   975281     3% /datasink/importATM

netcom     50.00     41.31   18%  2276581    18% /datasink/importPDF
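
Since -h is not available here, a portable alternative is the POSIX output format, which the usage message above lists as -P. Below is a small sketch, assuming "df -Pk" prints the standard six columns (file system, 1024-blocks, used, available, capacity, mount point). fs_over is a hypothetical helper name.

```shell
# fs_over PCT: read `df -Pk` output on stdin and print "mount used%" for
# every file system at or above PCT percent used.
# Hypothetical helper for illustration only.
fs_over() {
  awk -v limit="$1" 'NR > 1 {
         pct = $5
         sub(/%/, "", pct)
         # skip pseudo file systems such as /proc that report "-"
         if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, $5
       }'
}

# Usage:
#   df -Pk | fs_over 80
```

Going by the df -g figures above, a threshold of 80 should flag /usr, /oradata, /Backups and /msar1.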

 

On the Linux master server, df -h is recognized:

 

[root@nbu10w0 admincmd]# df -h

Filesystem                       Size  Used Avail Use% Mounted on

/dev/mapper/vg_nbu50w0-lv_root    50G  7.5G   40G  16% /

tmpfs                             64G  148K   64G   1% /dev/shm

/dev/sda1                        485M   40M  420M   9% /boot

/dev/mapper/vg_nbu50w0-openv     171G   88G   75G  54% /usr/openv

/dev/mapper/data_00-LVdata00      50T   12T   36T  25% /data00

/dev/mapper/index00-LVindex00     99G  1.2G   93G   2% /index00

/dev/mapper/vg_oracle-oracle      60G  5.5G   51G  10% /oracle

/dev/mapper/vg_oracle-oradata     35G  4.1G   29G  13% /oradata

/dev/mapper/vg_oracle-oralog      55G  9.9G   42G  20% /oralog

/dev/mapper/vg_nbu10w0-software   51G   38G   11G  79% /software

[root@nbu10w0 admincmd]#

 

Marianne, thanks, I will try this.

Also, Marianne,

Can I just delete a folder or folders, if that will help solve the issue? I am not sure which ones, which is why I am asking.

If not, I will just try the redirect.

 

Thanks again

 

RiaanBadenhorst
Level 6
Partner    VIP    Accredited Certified

All your file systems seem fine. This must have been an isolated incident. Is it still failing with 155?

nbustarter380
Level 6

Hi Riaan,

Thanks. No, it has failed with a 155 numerous times now and it is still failing. I just did the redirect that Marianne suggested, so we will see how the next backup runs.

 

Best Regards

 

Jaime_Vazquez
Level 6
Employee

AIX systems should log instances of a file system running out of free space in the system error log.

Run the command "errpt -a | pg" to view the error log entries. Look for entries that indicate an out-of-space condition. The entry will name the file system that encountered the problem.

Use "errclear #" to remove older entries from the log that are no longer relevant. The "#" value deletes entries older than that number of days: "errclear 1" removes all entries more than a day old, while "errclear 0" removes all entries in the log.

The entries are date/time stamped and can be correlated to the time of the backup. See the man pages for "errpt" and "errclear" for additional options that can be used to filter specific messages.
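
The detailed report is long, so here is a minimal sketch that pulls only the file-system-full entries out of it. It assumes the "errpt -a" text layout (a LABEL: line, a Date/Time: line, and a "FILE SYSTEM DEVICE AND MOUNT POINT" heading followed by the value on a later line) and the J2_FS_FULL label that AIX uses for a full JFS2 file system. fs_full_events is a hypothetical name.

```shell
# fs_full_events: read `errpt -a` text on stdin; for each J2_FS_FULL entry
# print "date/time | device, mount point".
# Hypothetical helper for illustration only.
fs_full_events() {
  awk '/^LABEL:/              { want = ($2 == "J2_FS_FULL"); grab = 0 }
       want && /^Date\/Time:/ { sub(/^Date\/Time:[ \t]*/, ""); when = $0 }
       want && grab && NF     { print when " | " $0; grab = 0 }
       want && /^FILE SYSTEM DEVICE AND MOUNT POINT/ { grab = 1 }'
}

# Usage:
#   errpt -a | fs_full_events
```

Each output line gives a timestamp to compare against the failed backup job and the file system that filled up.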

 

 

nbustarter380
Level 6

Marianne,

The redirect did not solve the issue. I ran these steps:

1. Rename the track directory to make a backup copy:

# mv /usr/openv/netbackup/track /usr/openv/netbackup/track.sv

2. Copy the backup to a new location:

# cp -rp /usr/openv/netbackup/track.sv/* /aixtl/nbutemp

/nbuserver/nbuclient/nbuclient> pwd

/aixtl/nbutemp/nbuserver/nbuserver/nbuclient/nbuclient

/nbuserver/nbuclient/nbuclient>
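
Before a copy like the one in step 2, it can help to confirm that the target file system has room for the track directory. This is a rough sketch of that check only, not the remaining steps of the HOWTO. The helper names are made up, and it assumes "du -sk" and "df -Pk" behave as POSIX describes.

```shell
# dir_kb DIR: size of DIR in KB.
dir_kb()  { du -sk "$1" | awk '{print $1}'; }

# free_kb DIR: free KB on the file system that holds DIR.
free_kb() { df -Pk "$1" | awk 'NR == 2 {print $4}'; }

# fits SRC DEST: succeed only if DEST's file system has room for SRC.
# Hypothetical helpers for illustration only.
fits() {
  [ "$(dir_kb "$1")" -lt "$(free_kb "$2")" ]
}

# Usage with the paths from the steps above:
#   fits /usr/openv/netbackup/track.sv /aixtl/nbutemp && echo "enough room"
```

Running dir_kb on the track directory also shows how much of /usr the track log itself is using.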

Jaime,

Here is the output of "errpt -a | pg". Unless I am missing something, there is nothing showing out of space:

errpt -a | pg

LABEL:          CORE_DUMP

IDENTIFIER:     A924A5FC

Date/Time:       Fri May 22 13:23:33 EDT 2015

Sequence Number: 374

Machine Id:      00F6FB2C4C00

Node Id:       

Class:           S

Type:            PERM

WPAR:            Global

Resource Name:   SYSPROC

 

Description

SOFTWARE PROGRAM ABNORMALLY TERMINATED

 

Probable Causes

SOFTWARE PROGRAM

User Causes

USER GENERATED SIGNAL

        Recommended Actions

        CORRECT THEN RETRY

Failure Causes

SOFTWARE PROGRAM

        Recommended Actions

        RERUN THE APPLICATION PROGRAM

        IF PROBLEM PERSISTS THEN DO THE FOLLOWING

        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

SIGNAL NUMBER

           6

USER'S PROCESS ID:

              35258542

FILE SYSTEM SERIAL NUMBER

          16

INODE NUMBER

                   276

CORE FILE NAME

/fnsw/local/bin/core

PROGRAM NAME

HPII_val

STACK EXECUTION DISABLED

           0

COME FROM ADDRESS REGISTER

??

PROCESSOR ID

  hw_fru_id: N/A

  hw_cpu_id: N/A

ADDITIONAL INFORMATION

shm_snmp_ 1A0

??

??

pthread_k B4

Symptom Data

REPORTABLE

1

INTERNAL ERROR

0

SYMPTOM CODE

PCSS/SPI2 FLDS/HPII_val SIG/6 FLDS/shm_snmp_ VALU/1a0

---------------------------------------------------------------------------

LABEL:          CORE_DUMP

IDENTIFIER:     A924A5FC

Date/Time:       Fri May 22 11:09:07 EDT 2015

Sequence Number: 373

Machine Id:      00F6FB2C4C00

Node Id:       

Class:           S

Type:            PERM

WPAR:            Global

Resource Name:   SYSPROC

Description

SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes

SOFTWARE PROGRAM

User Causes

USER GENERATED SIGNAL

        Recommended Actions

        CORRECT THEN RETRY

Failure Causes

SOFTWARE PROGRAM

        Recommended Actions

        RERUN THE APPLICATION PROGRAM

        IF PROBLEM PERSISTS THEN DO THE FOLLOWING

        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

SIGNAL NUMBER

           6

USER'S PROCESS ID:

              34144306

FILE SYSTEM SERIAL NUMBER

          16

INODE NUMBER

                   276

CORE FILE NAME

/fnsw/local/bin/core

PROGRAM NAME

HPII_val

STACK EXECUTION DISABLED

           0

COME FROM ADDRESS REGISTER

??

PROCESSOR ID

  hw_fru_id: N/A

  hw_cpu_id: N/A

ADDITIONAL INFORMATION

shm_snmp_ 1A0

??

??

pthread_k B4

Symptom Data

REPORTABLE

1

INTERNAL ERROR

0

SYMPTOM CODE

PCSS/SPI2 FLDS/HPII_val SIG/6 FLDS/shm_snmp_ VALU/1a0

---------------------------------------------------------------------------

LABEL:          J2_FS_FULL

IDENTIFIER:     F7FA22C9

Date/Time:       Wed May 20 20:44:08 EDT 2015

Sequence Number: 372

Machine Id:      00F6FB2C4C00

Node Id:       

Class:           O

Type:            INFO

WPAR:            Global

Resource Name:   SYSJ2

Description

UNABLE TO ALLOCATE SPACE IN FILE SYSTEM

Probable Causes

FILE SYSTEM FULL

        Recommended Actions

        INCREASE THE SIZE OF THE ASSOCIATED FILE SYSTEM

        REMOVE UNNECESSARY DATA FROM FILE SYSTEM

        USE FUSER UTILITY TO LOCATE UNLINKED FILES STILL REFERENCED

 

Detail Data

JFS2 MAJOR/MINOR DEVICE NUMBER

000A 0005

FILE SYSTEM DEVICE AND MOUNT POINT

/dev/hd2, /usr

 

revarooo
Level 6
Employee

If you are having core dumps, get your OS team to investigate the process that is dumping core, in this case HPII_val.

 

As for the backup, run a manual backup and check the disk space on both the Master and the client as it's erroring.

Marianne
Level 6
Partner    VIP    Accredited Certified

There is this one on 20 May :

LABEL: J2_FS_FULL

IDENTIFIER: F7FA22C9

Date/Time: Wed May 20 20:44:08 EDT 2015

nbustarter380
Level 6

Thanks, revarooo and Marianne, for your responses.

revarooo,

I will check with the OS sysadmins on the core dumps.

 

Marianne,

Thanks for pointing that out

Detail Data

JFS2 MAJOR/MINOR DEVICE NUMBER

000A 0005

FILE SYSTEM DEVICE AND MOUNT POINT

/dev/hd2, /usr

I did notice the output below. I am going to check with the sysadmins and see if more space can be given.

(root)/usr> df -g

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on

/dev/hd2          10.38      1.60   85%    84687    18% /usr

The above shows only 1.60 GB free. Do you know if there is a recommended amount of free space for /usr, which contains openv/netbackup? I am asking because they may want to know how much should be free: 3 GB, 4 GB?

 

Best Regards

 

Marianne
Level 6
Partner    VIP    Accredited Certified
The 'disk full' message was logged more than a week ago. Was that more or less the same time as the status 155? The df output shows the current situation; it probably looked different on the 20th. Speak to the sysadmins.
