problems encountered during setup of shared memory (89)

ksurya1487
Level 4

Guys,

I need help resolving this issue.

We see backups failing with error 89. We followed this article and did the tuning it describes, but backups are still failing:

http://www.symantec.com/business/support/index?page=content&id=TECH62633

My observation

All storage units are set to 40 concurrent jobs, but at most 30 jobs run. Jobs fail with status 89 immediately once the job count goes beyond 30 or 32.

I have now limited the job count to 25, and none of the backups are failing. We have 3 media servers and see the same issue on all three.

Can you please advise what further tuning is needed?

NetBackup 7.5.0.6

DD Boost is being used for backup

DD OS 5.2.2

Hardware spec attached

Error

03/13/2014 01:14:48 - Info nbjm (pid=11598) starting backup job (jobid=2353901) for client xxxxx1-bka, policy win.all.prd.xxxxxxx, schedule inc.tape
03/13/2014 01:14:48 - Info nbjm (pid=11598) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=2353901, request id:{65AFB866-AA6E-11E3-8EF4-00144FC9449C})
03/13/2014 01:14:48 - requesting resource xxxxxxx-prod
03/13/2014 01:14:48 - requesting resource xxxxxx.NBU_CLIENT.MAXJOBS.xxxxxxx-bka
03/13/2014 01:14:48 - requesting resource xxxxx.NBU_POLICY.MAXJOBS.win.all.prd.xxxxxxx
03/13/2014 01:14:58 - awaiting resource xxxxxxx-prod. Maximum job count has been reached for the storage unit.
03/13/2014 01:16:17 - granted resource  xxxxx.NBU_CLIENT.MAXJOBS.xxxxxxx-bka
03/13/2014 01:16:17 - granted resource  xxxxx.NBU_POLICY.MAXJOBS.win.all.prd.xxxxxxx
03/13/2014 01:16:17 - granted resource  MediaID=@aaaau;DiskVolume=xxxxxxx-xxxxxxx;DiskPool=xxxxxxx-xxxxxxx;Path=xxxxxxx-xxxxxxx;StorageServer=xxxxxxx;MediaServer=xxxxxxx
03/13/2014 01:16:17 - granted resource  xxxxxxx-xxxxxxx
03/13/2014 01:16:17 - estimated 32978712 kbytes needed
03/13/2014 01:16:17 - Info nbjm (pid=11598) started backup (backupid=xxxxxxx-bka_1394687777) job for client xxxxxxx-bka, policy win.all.prd.xxxxxxx, schedule inc.tape on storage unit xxxxxxx-xxxxxxx
03/13/2014 01:16:19 - started process bpbrm (pid=8781)
03/13/2014 01:16:22 - end writing
03/13/2014 01:58:08 - Info bpbrm (pid=8781) xxxxxxx-bka is the host to backup data from
03/13/2014 01:58:08 - Info bpbrm (pid=8781) reading file list from client
03/13/2014 01:58:08 - Error bpbrm (pid=8781) Could not get shared memory for bpbrm child process communication, No space left on device (28)
03/13/2014 01:58:09 - Info bpbkar (pid=0) done. status: 89: problems encountered during setup of shared memory
problems encountered during setup of shared memory  (89)
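
For reference, the `(28)` in the bpbrm error above is an operating system errno value, not a NetBackup status code. A minimal sketch in Python (not part of NetBackup) that decodes it. The interpretation in the final comment is an assumption based on how `shmget` commonly reports ENOSPC on Unix systems, so it should be checked against the media server's own man pages.

```python
import errno
import os

code = 28  # errno value reported by bpbrm in the job details above

# Symbolic name for the numeric code on this platform.
name = errno.errorcode[code]

# Human-readable text; the exact wording can differ between operating systems.
text = os.strerror(code)

print(f"errno {code} = {name}: {text}")

# Assumption: when a shared memory call returns ENOSPC, it usually means a
# limit on shared memory identifiers or total shared memory was reached,
# not that a filesystem is full.
```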

 

 

5 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified

So, what did you set max-shm-memory and max-shm-ids to?

Another TN about this error: http://www.symantec.com/docs/TECH187992

ksurya1487
Level 4

Thanks Marianne,

We set it to 20 GB, but backups still seem to be failing.

The problem seems to be resolved after changing NUMBER_DATA_BUFFERS.

It was previously set to 128. I renamed that file, and now all backups run with 30 buffers and I am able to run 50+ jobs on the media server.

What does this value control? Will changing it affect backup speed?

 

 

Marianne
Moderator
Partner    VIP    Accredited Certified

The logic behind these performance tuning parameters and how they use shared memory is explained in this old TN:

NET_BUFFER_SZ, SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS - how they work and how to configure them
http://www.symantec.com/docs/TECH1724

The defaults have changed in the meantime; the current values are explained in the performance tuning guide:
http://www.symantec.com/docs/TECH62317
This guide also explains how to check the effect of buffer sizes and numbers on performance in the bptm log on media servers.

It seems you may need another media server to share the backup load?

Nicolai
Moderator
Partner    VIP   

Backup to disk won't be impacted a lot, but tape backup will.

In NetBackup, both NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS have a great impact on backup speed. Think of buckets of water: SIZE_DATA_BUFFERS decides how large each bucket is (5/10/20 liters), and NUMBER_DATA_BUFFERS decides how many buckets you can fill.

All incoming backup data is stored temporarily in a bucket. A tape drive can consume a huge number of buckets, and if no full bucket is ready the tape drive stops writing, waits for a new bucket, and then starts writing again. A disk is a random-access medium, so it doesn't suffer the same impact.

The best solution is to fix the shared memory constraint. You need to investigate the kernel settings some more.

Going beyond 32 concurrent backups should not be an issue, at least not from a memory perspective.
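
To put numbers on the bucket analogy, here is a rough sketch in Python. It assumes that each job's data-buffer segment is approximately SIZE_DATA_BUFFERS x NUMBER_DATA_BUFFERS bytes and ignores multiplexing and the separate bpbrm segment. The 262144-byte buffer size, the 30 and 128 buffer counts, the 40-job limit, and the shm_size of 7865048 are taken from this thread; the helper names are made up for illustration.

```python
SIZE_DATA_BUFFERS = 262144  # bytes per buffer, as reported by io_init in this thread


def data_buffer_bytes(number_data_buffers, size_data_buffers=SIZE_DATA_BUFFERS):
    """Approximate shared memory (bytes) one job needs for its data buffers."""
    return number_data_buffers * size_data_buffers


def total_mib(jobs, number_data_buffers):
    """Approximate shared memory (MiB) for a number of concurrent jobs."""
    return jobs * data_buffer_bytes(number_data_buffers) / (1024 * 1024)


per_job_30 = data_buffer_bytes(30)    # buffer count seen in the bptm log
per_job_128 = data_buffer_bytes(128)  # the earlier NUMBER_DATA_BUFFERS setting

# bptm reported shm_size = 7865048 for 30 buffers; the remainder is
# presumably control data kept alongside the buffers (an assumption).
overhead = 7865048 - per_job_30

print("per job, 30 buffers :", per_job_30, "bytes")
print("per job, 128 buffers:", per_job_128, "bytes")
print("control overhead    :", overhead, "bytes")
print("40 jobs, 30 buffers :", total_mib(40, 30), "MiB")
print("40 jobs, 128 buffers:", total_mib(40, 128), "MiB")
```

Under these assumptions even 40 jobs with 128 buffers stay far below a 20 GB shared memory limit, which fits the point that total memory should not be the constraint. The number of shared memory segments in use may therefore be worth checking as well.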

ksurya1487
Level 4

Hi,

Thanks for your reply.

I am still getting 600+ failures a week. Can you suggest what else I can do for tuning?

Sample failed jobs

14:26:05.843 [21226] <2> write_data: received first buffer (262144 bytes), begin writing data
14:26:09.381 [21291] <2> bptm: INITIATING (VERBOSE = 0): -w -c arb-papp009 -dpath daysolbkpp002-nadaydd02 -stunit daysolbkpp002-nadaydd02 -cl fs.all.NielsenAudio.nadaydd02 -bt 1394906626 -b arb-papp009_1394906626 -st 0 -cj 1 -reqid -1394668221 -jm -brm -hostname arb-papp009 -ru root -rclnt arb-papp009 -rclnthostname arb-papp009 -rl 1 -rp 1209600 -sl full.tape -ct 0 -maxfrag 524288 -eari 0 -mediasvr daysolbkpp002 -no_callback -connect_options 0x01010100 -jobid 2360827 -jobgrpid 2360827 -masterversion 750000 -bpbrm_shm_id 452984858 -blks_per_buffer 512
14:26:10.747 [21291] <2> io_init: bpbrm_shm_id = 452984858, buffer address = 0xffffffff78700000
14:26:10.747 [21291] <2> io_init: using 262144 data buffer size
14:26:10.748 [21291] <2> io_init: using 30 data buffers
14:26:10.791 [21291] <16> create_shared_memory: could not allocate enough shared memory for backup buffers, No space left on device
14:26:32.818 [21301] <2> bptm: INITIATING (VERBOSE = 0): -w -c arb-ttally004 -dpath daysolbkpp002-nadaydd02 -stunit daysolbkpp002-nadaydd02 -cl fs.all.NielsenAudio.nadaydd02 -bt 1394906649 -b arb-ttally004_1394906649 -st 0 -cj 1 -reqid -1394668224 -jm -brm -hostname arb-ttally004 -ru root -rclnt arb-ttally004 -rclnthostname arb-ttally004 -rl 1 -rp 1209600 -sl full.tape -ct 0 -maxfrag 524288 -eari 0 -mediasvr daysolbkpp002 -no_callback -connect_options 0x01010100 -jobid 2360853 -jobgrpid 2360853 -masterversion 750000 -bpbrm_shm_id 1006632987 -blks_per_buffer 512
14:26:33.324 [21301] <2> io_init: bpbrm_shm_id = 1006632987, buffer address = 0xffffffff78700000
14:26:33.324 [21301] <2> io_init: using 262144 data buffer size
14:26:33.325 [21301] <2> io_init: using 30 data buffers
14:26:33.377 [21301] <16> create_shared_memory: could not allocate enough shared memory for backup buffers, No space left on device
14:26:38.218 [21305] <2> bptm: INITIATING (VERBOSE = 0): -w -c arb-dapp010 -dpath daysolbkpp002-nadaydd02 -stunit daysolbkpp002-nadaydd02 -cl fs.all.NielsenAudio.nadaydd02 -bt 1394906655 -b arb-dapp010_1394906655 -st 0 -cj 1 -reqid -1394668230 -jm -brm -hostname arb-dapp010 -ru root -rclnt arb-dapp010 -rclnthostname arb-dapp010 -rl 1 -rp 1209600 -sl full.tape -ct 0 -maxfrag 524288 -eari 0 -mediasvr daysolbkpp002 -no_callback -connect_options 0x01010100 -jobid 2360874 -jobgrpid 2360874 -masterversion 750000 -bpbrm_shm_id 1711276060 -blks_per_buffer 512
14:26:38.664 [21305] <2> io_init: bpbrm_shm_id = 1711276060, buffer address = 0xffffffff78700000
14:26:38.664 [21305] <2> io_init: using 262144 data buffer size
14:26:38.665 [21305] <2> io_init: using 30 data buffers
14:26:38.711 [21305] <16> create_shared_memory: could not allocate enough shared memory for backup buffers, No space left on device
^C


Successful jobs

23:53:59.142 [25139] <2> write_data: received first buffer (262144 bytes), begin writing data
23:54:03.364 [25142] <2> write_data: received first buffer (262144 bytes), begin writing data
23:54:07.212 [25148] <2> write_data: received first buffer (262144 bytes), begin writing data
23:54:11.679 [6020] <2> write_data: received first buffer (262144 bytes), begin writing data
23:54:34.264 [25121] <2> write_data: received first buffer (262144 bytes), begin writing data
23:54:51.684 [27063] <2> bptm: INITIATING (VERBOSE = 0): -w -c dayrheocru017-bka -dpath daysolbkpp002-nadaydd02 -stunit daysolbkpp002-nadaydd02 -cl fs.all.prd1.nadaydd02 -bt 1394940751 -b dayrheocru017-bka_1394940751 -st 0 -cj 1 -reqid -1394669549 -jm -brm -hostname dayrheocru017-bka -ru root -rclnt dayrheocru017-bka -rclnthostname dayrheocru017-bka -rl 1 -rp 1209600 -sl full.tape -ct 0 -maxfrag 524288 -eari 0 -mediasvr daysolbkpp002 -no_callback -connect_options 0x01010100 -jobid 2362190 -jobgrpid 2362190 -masterversion 750000 -bpbrm_shm_id 1392509151 -blks_per_buffer 512
23:54:52.425 [27063] <2> io_init: bpbrm_shm_id = 1392509151, buffer address = 0xffffffff78700000
23:54:52.425 [27063] <2> io_init: using 262144 data buffer size
23:54:52.426 [27063] <2> io_init: using 30 data buffers
23:54:52.426 [27063] <2> create_shared_memory: shm_size = 7865048, buffer address = 0xffffffff73c00000, buf control = 0xffffffff74380000, ready ptr = 0xffffffff743802d0
23:55:19.022 [27063] <2> write_data: received first buffer (262144 bytes), begin writing data
23:56:06.576 [21226] <2> write_data: received first buffer (262144 bytes), begin writing data
23:57:24.468 [20923] <2> write_data: received first buffer (262144 bytes), begin writing data
23:57:55.881 [5646] <2> write_data: received first buffer (262144 bytes), begin writing data
23:58:19.151 [25413] <2> write_data: received first buffer (262144 bytes), begin writing data
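
To see how often the allocation fails compared with how often it succeeds, the bptm log can be tallied with a short script. This is a minimal sketch in Python, assuming the line layout shown in the excerpts above (time, [pid], <severity>, function: message). The function name `tally_shared_memory` is hypothetical, and the sample lines are copied from this thread, with the long success line shortened.

```python
import re
from collections import Counter

# Layout assumed from the excerpts: "HH:MM:SS.mmm [pid] <sev> func: message"
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> "
    r"(?P<func>\w+): (?P<msg>.*)$"
)


def tally_shared_memory(lines):
    """Count create_shared_memory successes and failures; collect failing PIDs."""
    counts = Counter()
    failed_pids = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("func") != "create_shared_memory":
            continue
        msg = match.group("msg")
        if msg.startswith("could not allocate"):
            counts["failed"] += 1
            failed_pids.append(int(match.group("pid")))
        elif msg.startswith("shm_size"):
            counts["ok"] += 1
    return counts, failed_pids


SAMPLE = [
    "14:26:10.791 [21291] <16> create_shared_memory: could not allocate enough shared memory for backup buffers, No space left on device",
    "14:26:33.377 [21301] <16> create_shared_memory: could not allocate enough shared memory for backup buffers, No space left on device",
    "23:54:52.426 [27063] <2> create_shared_memory: shm_size = 7865048, buffer address = 0xffffffff73c00000",
    "23:55:19.022 [27063] <2> write_data: received first buffer (262144 bytes), begin writing data",
]

counts, failed_pids = tally_shared_memory(SAMPLE)
print(dict(counts), failed_pids)
```

To run it on a real log, pass an open file object instead of `SAMPLE`. Comparing the timestamps of the failures with the number of active jobs at those times should show whether the failures cluster around a particular job count.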