cancel
Showing results for 
Search instead for 
Did you mean: 

slow sharepoint backup

yguilloux
Level 4
Partner

Hello

I have an issue with a Sharepoint GRT backup with NBU 7.6.1.2.

The architecture is the following :

- Sharepoint 2010, 2 Front web servers, 2 application servers, 1 Clustered SQL Server

- all servers are virtual machines

- Service account used for legacy Network, legacy client and client services is domain admin + local admin and has the required 4 rights in local security policy

- DNS is OK (hosts has been also updated with short and long names - just in case)

- non-GRT backup is runing fine

- GRT backup is starting well, but after "some time" is slowing down ...

- I have enabled verbose loging on SQL server and the following messages appear in bpkar log file  :

21:54:06.290 [1708.7048] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 1 0x1
21:54:06.290 [1708.7048] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4750] using interface  ANY
21:54:06.290 [1708.7048] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 1 0x1
21:54:06.290 [1708.7048] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4750] using interface  ANY
21:54:06.290 [1708.7048] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1544] in failed cache ERR=10109 NAME=SVBLL-WI05Q SVC=bpspsserver
21:54:06.290 [1708.7048] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1267] vnet_cached_getaddrinfo_and_update() failed 6 0x6
21:54:06.290 [1708.7048] <8> init_remote_connect_recs: [vnet_connect.c:949] vnet_cached_getaddrinfo() failed, status=6, OS ret=10109, host=SVBLL-WI05Q, service=bpspsserver
21:54:06.290 [1708.7048] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 0 0x0
21:54:06.290 [1708.7048] <8> init_connect_recs: [vnet_connect.c:868] init_remote_connect_recs() failed 6 0x6
21:54:06.290 [1708.7048] <2> init_connect_recs: [vnet_connect.c:880] Ignoring some failures.  Success count 2 0x2
21:54:06.290 [1708.7048] <2> async_connect: [vnet_connect.c:1499] connect in progress 1 0x1
21:54:06.304 [1708.7048] <2> vnet_pbxConnect: pbxConnectEx Succeeded
21:54:15.974 [1708.7048] <8> vnet_vnetd_service_socket: [vnet_vnetd.c:1995] VN_REQUEST_SERVICE_SOCKET 6 0x6
21:54:15.974 [1708.7048] <8> vnet_vnetd_service_socket: [vnet_vnetd.c:2009] service bpspsserver
21:54:16.484 [1708.7048] <2> do_pbx_service: [vnet_connect.c:2159] via PBX/via VNETD bpspsserver CONNECT FROM 10.225.64.101.64646 TO 10.225.64.106.1556 fd = 1280
21:54:16.485 [1708.7048] <2> async_connect: [vnet_connect.c:1691] connect async CONNECT FROM 10.225.64.101.64646 TO 10.225.64.106.1556 fd = 1280
21:54:16.485 [1708.7048] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO SVBLL-WI05Q 10.225.64.106 bpspsserver VIA pbx
21:54:16.485 [1708-7048] vnetdSendRequest     Sending message 0xf331
21:54:16.485 [1708-7048] vnetdProcessMessage     received a reply successfully for message 0xf331
21:54:20.175 [1708.7072] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 1 0x1
21:54:20.175 [1708.7072] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4750] using interface  ANY
21:54:20.175 [1708.7072] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 1 0x1
21:54:20.175 [1708.7072] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4750] using interface  ANY
21:54:20.176 [1708.7072] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1544] in failed cache ERR=10109 NAME=SVBLL-WI05Q SVC=bpspsserver
21:54:20.176 [1708.7072] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1267] vnet_cached_getaddrinfo_and_update() failed 6 0x6
21:54:20.176 [1708.7072] <8> init_remote_connect_recs: [vnet_connect.c:949] vnet_cached_getaddrinfo() failed, status=6, OS ret=10109, host=SVBLL-WI05Q, service=bpspsserver
21:54:20.176 [1708.7072] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 0 0x0
21:54:20.176 [1708.7072] <8> init_connect_recs: [vnet_connect.c:868] init_remote_connect_recs() failed 6 0x6
21:54:20.176 [1708.7072] <2> init_connect_recs: [vnet_connect.c:880] Ignoring some failures.  Success count 2 0x2
21:54:20.176 [1708.7072] <2> async_connect: [vnet_connect.c:1499] connect in progress 1 0x1
21:54:20.190 [1708.7072] <2> vnet_pbxConnect: pbxConnectEx Succeeded
21:54:24.119 [1708.7072] <8> vnet_vnetd_service_socket: [vnet_vnetd.c:1995] VN_REQUEST_SERVICE_SOCKET 6 0x6
21:54:24.119 [1708.7072] <8> vnet_vnetd_service_socket: [vnet_vnetd.c:2009] service bpspsserver
21:54:24.725 [1708.7072] <2> do_pbx_service: [vnet_connect.c:2159] via PBX/via VNETD bpspsserver CONNECT FROM 10.225.64.101.64653 TO 10.225.64.106.1556 fd = 1280
21:54:24.725 [1708.7072] <2> async_connect: [vnet_connect.c:1691] connect async CONNECT FROM 10.225.64.101.64653 TO 10.225.64.106.1556 fd = 1280
21:54:24.725 [1708.7072] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO SVBLL-WI05Q 10.225.64.106 bpspsserver VIA pbx
21:54:24.726 [1708-7072] vnetdSendRequest     Sending message 0xf331
21:54:24.726 [1708-7072] vnetdProcessMessage     received a reply successfully for message 0xf331

I am worried (but maybe i am wrong) by vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1544] in failed cache ERR=10109 NAME=SVBLL-WI05Q

SVBLL-W05Q is the front web server used as client in the NBU policy.

Thanks in advance for any clue/help

Best regards

11 REPLIES 11

watsons
Level 6

GRT backup is starting well, but after "some time" is slowing down .

When you said this, did the job actually start reading or writing any data to storage? Check the byte counts? 

If it has started writing, the socket failure in the logs may not be relevant.

Check bptm logs on media server, does it tell you anything about "Kbytes transferred", see how fast/slow it is over time.

If farm backup (non-GRT) is fine, I don't see why GRT will be a "connection issue", ensure nbfsd are running fine on both media server and clients (frontend + backend), check nbfsd logs for more details

yguilloux
Level 4
Partner

Yes, actually the backup part is runing fine, all data seems to be written (regarding the DB size), only the GRT process seems slow on some the databases. After further investigations, only some DB seems to be slow...

I can't see any issue on NFS (i successfully restored files from a backup) and as I say some backups are good ...

I opened a call and i am working with symantec support to resolve this issue, but any help would be appreciated 

 

watsons
Level 6

If only some DB seem to be slower than the other, it could possibly be the number of files, that also explains why there were no problem during non-GRT because with that it does not go down to file-level.

A 1-million files backup will be much slower than a 1000 files backup. Applicable to all type of file backups.

yguilloux
Level 4
Partner

I have made some new tests : one  2 GB DB with 150 000 files ran in 9h15 , one 18 GB DB with 175 000 ran in 15 minutes ....

watsons
Level 6

Yes, that is exactly what I was talking about, notice the percentage of number of files over the total size, can impose a huge difference to the backup time. Try to cut down number of files (by placing them in zip files). 

If you don't have control over the number of files (since this is Sharepoint backup), usually what I do is inform the site owner/user that their backup is taking too long, and ask them to reduce number of files. They will do so if you let them know how a long backup can impact their access speed and risk of having no backup at all.

^_^

yguilloux
Level 4
Partner

thoses DBs are not huge, 9h for 150 000 is definively not a reasonnable backup time. And asking user to reduce the number of file is not for me the right way for a backup administrator to solve an issue !

There is maybe a sharepoint problem or a Netbackup one, so I have to find where is the issue  ...

Actually i am runing a new test : 780 MB DB, after 17h, 15 000 files have been processed !!!

kindy
Level 2

Same issue for me. Did you find a solution?

28/08/2015 14:19:26 - estimated 0 Kbytes needed
28/08/2015 14:19:26 - Info nbjm(pid=20833) started backup (backupid=server1_1440764366) job for client server1, policy SharePoint, schedule Full on storage unit MSDP
28/08/2015 14:19:29 - started process bpbrm (5260)
28/08/2015 14:19:33 - Info bpbrm(pid=5260) server1 is the host to backup data from     
28/08/2015 14:19:33 - Info bpbrm(pid=5260) reading file list for client        
28/08/2015 14:19:33 - connecting
28/08/2015 14:19:36 - Info bpbrm(pid=5260) starting bpbkar32 on client         
28/08/2015 14:19:36 - connected; connect time: 0:00:03
28/08/2015 14:19:47 - Info bpbkar32(pid=6856) Backup started           
28/08/2015 14:19:47 - Info bptm(pid=9816) start            
28/08/2015 14:19:47 - Info bptm(pid=9816) using 262144 data buffer size        
28/08/2015 14:19:47 - Info bptm(pid=9816) setting receive network buffer to 1049600 bytes      
28/08/2015 14:19:47 - Info bptm(pid=9816) using 30 data buffers         
28/08/2015 14:19:48 - Info bptm(pid=9816) start backup           
28/08/2015 14:19:49 - Info bpbkar32(pid=6856) change time comparison:<disabled>          
28/08/2015 14:19:49 - Info bpbkar32(pid=6856) archive bit processing:<enabled>          
28/08/2015 14:19:49 - Info bpbkar32(pid=6856) INF - waiting for mount mutex       
28/08/2015 14:19:50 - Info bptm(pid=9816) backup child process is pid 7228.10480       
28/08/2015 14:19:50 - Info bptm(pid=7228) start            
28/08/2015 14:19:51 - begin writing
28/08/2015 14:25:11 - Info bpbrm(pid=5260) from client server1: TRV - Starting granular backup processing for (Microsoft SharePoint Resources\Microsoft SharePoint Foundation Web Application\SharePoint - test\Content-DB 1 (serverSQL/SQL/WEB_APPLI)). This may take a while... 
28/08/2015 14:28:59 - Info bpbrm(pid=5260) from client server1: TRV - Successfully finished granular backup processing for (Microsoft SharePoint Resources\Microsoft SharePoint Foundation Web Application\SharePoint - test\Content-DB 1 (serverSQL/SQL/WEB_APPLI/xxxxx)). 
30/08/2015 08:42:53 - Info bptm(pid=9816) waited for full buffer 8587 times, delayed 9768784 times    
30/08/2015 08:42:54 - Critical bptm(pid=9816) Storage Server Error: (Storage server: PureDisk:MSDP) mtstrm_close_write_channel: Fatal error occured in Multi-Threaded Agent: Close Write Channel command failed: Cr_ErrnoException: Cannot get session server1_1440764366 because no such session is open V-454-96
30/08/2015 08:42:54 - Critical bptm(pid=9816) sts_close_handle failed: 2060018 file not found       
30/08/2015 08:42:54 - Error bptm(pid=9816) cannot write image to disk, media close failed with status 2060018  
30/08/2015 08:42:56 - Info bptm(pid=9816) EXITING with status 87 <----------        
30/08/2015 08:42:56 - Info eicilwp211(pid=9816) StorageServer=PureDisk:MSDP; Report=PDDO Stats (multi-threaded stream used) for (MSDP): scanned: 26817676 KB, CR sent: 16058002 KB, CR sent over FC: 0 KB, dedup: 40.1%, cache hits: 130 (54.6%)
30/08/2015 08:42:56 - Critical bptm(pid=9816) Storage Server Error: (Storage server: PureDisk:MSDP) mtstrm_close_session: Fatal error occured in Multi-Threaded Agent: Close Session command failed: Cr_ErrnoException: Cannot close session server1_1440764366 because no such session is open V-454-95
30/08/2015 08:42:57 - Critical bptm(pid=9816) sts_close_server failed: error 2060017 system call failed      
30/08/2015 08:42:59 - Error bpbrm(pid=5260) cannot send mail to xxxx        
30/08/2015 08:43:00 - Info bpbkar32(pid=6856) done. status: 87: media close error       
30/08/2015 08:43:00 - end writing; write time: 42:23:09
media close error(87)

The backup is done with a smaller database in the same farm.

Regards

Will_Restore
Level 6

Status 87 and Status 40 on long running Client side deduplication jobs across a slow connection

https://support.symantec.com/en_US/article.TECH206337.html

 

(this is another issue and should be moved to new thread)

 

kindy
Level 2

Status 87 after 40h for a litle database (30Go). Sometime, no error for 3 days.

GRT process seems slow.

Full backup is fine (not granular)

 

 

 

watsons
Level 6

Best to log a call with the support to sort this out, and if there is a solution please share it out. Thanks!

yguilloux
Level 4
Partner

I did it 10 days ago, support is still analysing tons of logs ... I will update my post when solution if found, thanks for trying to help !