To give some background: the customer had 3000+ clients with close to 6000 jobs running in the backup window. It was a complex setup, with a D2D (disk-to-disk) implementation in place. There was no centralized management, so the workaround for missed failure notifications was to depend on the failure responses from the clients themselves. The existing software had various management challenges. Before I explain why we went ahead with Symantec, these are the ones at the top of my head:
Distributed backups, with no centralized management
A one-time full backup followed only by incremental backups, which scattered the data and caused huge delays in restore time
High restore failure rate
High, scattered and duplicated space consumption
Complex troubleshooting approach
Taking all of this into consideration, we looked at multiple backup products and compared them on cost, ease of implementation, maintenance and transition time.
We already had experience working with NetBackup, and to overcome these constraints we opted for a NetBackup implementation. One of the prime reasons was that it was the clear market leader at that point in time, and not many other vendors were in the enterprise backup management arena.
Centralized management and ease of operations were the key aspects we considered 4+ years back, when many of the advanced options were not even on our radar :). The top reasons to go with it were:
Comprehensive Disk-based data protection
Unparalleled Protection of VMware Environments
Reduced network and storage requirements for backup
Secure backup data
Centralized and global management of your data protection environment
LAN-free backup for any application
Continuous Data Protection
Complete system recovery within 15 minutes
Service level management and reporting
Once the product was decided, a few questions came up about the transition:
How are we going to implement it? This includes building the NetBackup servers, moving the tape libraries and installing the clients.
How do we reuse the existing media?
For the implementation, we first had to list every server with its OS, size, application details and any particular requirements. Once everything was populated, we planned the work in phases: first we built the NetBackup master server, and then every week, before the full backups fired, we moved a set of clients from TSM to NetBackup. Pushing the client software over the network helped us a lot in this process.
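The weekly phasing in the previous paragraph is essentially a packing problem over the server inventory. The sketch below is a minimal illustration, not the tool we used: `Client`, `plan_waves` and `max_gb_per_wave` are hypothetical names, and the rule of capping each wave by total client size (so one week's first NetBackup fulls stay within what the backup window can absorb) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Client:
    # One row of the server inventory: OS, size and application details.
    name: str
    os: str
    size_gb: int
    application: str


def plan_waves(clients, max_gb_per_wave):
    """Greedily split clients into weekly migration waves (hypothetical helper).

    Clients are sorted by OS, then name, so that each wave tends to share one
    push-install procedure. A new wave starts whenever adding the next client
    would exceed max_gb_per_wave; a client larger than the cap gets a wave
    of its own.
    """
    waves = []
    current, current_gb = [], 0
    for c in sorted(clients, key=lambda c: (c.os, c.name)):
        if current and current_gb + c.size_gb > max_gb_per_wave:
            waves.append(current)
            current, current_gb = [], 0
        current.append(c)
        current_gb += c.size_gb
    if current:
        waves.append(current)
    return waves


if __name__ == "__main__":
    # Illustrative inventory only; these are not the customer's servers.
    inventory = [
        Client("aix01", "AIX", 800, "Oracle"),
        Client("aix02", "AIX", 200, "File server"),
        Client("win01", "Windows", 150, "Exchange"),
        Client("win02", "Windows", 300, "SQL Server"),
    ]
    for week, wave in enumerate(plan_waves(inventory, max_gb_per_wave=1000), start=1):
        print(f"Week {week}: {[c.name for c in wave]}")
    # Week 1: ['aix01', 'aix02']
    # Week 2: ['win01', 'win02']
```

In practice the grouping key would come from whatever matters most in your environment (site, application owner, backup window), which is exactly what the inventory step is meant to surface.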
Existing media? The tapes were sent offsite for storage, and once their retention period completed we had to reuse them. NetBackup's ability to overwrite media written by another vendor's software resolved this.
There were lots of hurdles once this phase started. We had to come up with a charter under which servers with more than 600 GB were configured as SAN media servers, overcoming one of the most common issues, which was spread across 10% of the servers. We hit a hiccup doing this: most of the site was populated with AIX boxes, so we thought of going for an AIX master server and pointing all these SAN media servers and certain clients to it. During testing, however, we found that certain QLogic cards on these servers did not support configuring the robotic library: we could see the drives, but we could not control the robot, and support confirmed this. As a workaround we went for a Windows master server, implemented SSO (Shared Storage Option) and made the master server the robotic control host.
Then we saw issues such as the EMM server going down and EMM not getting updated properly. Symantec changed the NetBackup code in 6.x, introducing the EMM (Enterprise Media Manager) server, which manages media and storage unit related tasks. If a robotic library failed suddenly, EMM used to panic and have its registry entries changed, and EMM also went down when space on the master server ran very low because of the unified logs. Other issues, such as the administration console hanging and NOM not getting updated properly, were all sorted out with the help of support.
One of the main issues: one day all jobs started failing with status code 13 (file read failed), and we had to raise a support call. After a very long diagnosis, support identified McAfee as the cause; it was stopping the services. This had popped up right after our anti-virus admin pushed a patch. We followed a workaround given by support and continued the backups. In all, it took 4 months to complete the move, and it now runs very smoothly, making it one of the best transitions we have performed.
The main lesson from all these experiences is that planning before you move is very important for any migration:
Collect data on your complete environment and analyze it.
Go through the product documentation and whitepapers and check which features would help address your issues; this also tells you what your expenditure on the software will be.
Account for the additional hardware required for the migration.
Divide the complete project into phases; be ready for the initial troubles and allocate extra days for troubleshooting.
Before moving into production, build a test setup and check the functionality, since product performance changes with the environment; this gives you a chance to tune it.
Make maximum use of support; support plays a big role in smooth transitions.
The NetBackup implementation eased most of the problems we faced with the earlier software. A few of them:
Centralized backup management, giving servers spread across the globe a single view for ease of administration and operations
Backup retention implemented according to business needs
A drastically lower restore failure rate, because the customized retention kept the data on a limited set of media (tape / disk / vault)
Controlled space utilization thanks to retention and backup types (DINC, CINC), which was not happening with the earlier software
The ability to distribute software installations and updates from a centralized host to all platforms
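The link between backup type and restore effort (the "huge delay in restore time" from the one-full-then-incrementals setup, and the DINC/CINC choice above) can be shown with a small model. This is a simplified sketch, not NetBackup code: `Backup` and `restore_chain` are hypothetical names, and it only encodes the usual semantics that a differential incremental (DINC) holds changes since the previous backup of any type, while a cumulative incremental (CINC) holds changes since the last full. It ignores the restore optimizations any real product may apply.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    day: int
    kind: str  # "FULL", "DINC" (differential) or "CINC" (cumulative)


def restore_chain(history, target_day):
    """Return the backup images needed to restore the state as of target_day.

    Simplified model: start from the most recent full on or before target_day,
    append every differential after it, and let a cumulative incremental
    replace everything taken since that full.
    """
    usable = sorted((b for b in history if b.day <= target_day), key=lambda b: b.day)
    last_full = max((i for i, b in enumerate(usable) if b.kind == "FULL"), default=None)
    if last_full is None:
        raise ValueError("no full backup on or before the target day")
    chain = [usable[last_full]]
    for b in usable[last_full + 1:]:
        if b.kind == "CINC":
            # A cumulative image already contains all changes since the full.
            chain = [usable[last_full], b]
        else:
            # A differential only covers changes since the previous backup.
            chain.append(b)
    return chain


if __name__ == "__main__":
    # One full followed only by daily differentials (the old pattern).
    forever = [Backup(0, "FULL")] + [Backup(d, "DINC") for d in range(1, 30)]
    # Weekly fulls with daily cumulative incrementals in between.
    weekly = [Backup(d, "FULL" if d % 7 == 0 else "CINC") for d in range(30)]
    print(len(restore_chain(forever, 29)))  # 30 images: 1 full + 29 differentials
    print(len(restore_chain(weekly, 29)))   # 2 images: day-28 full + day-29 cumulative
```

Fewer images per restore means fewer tapes to recall and mount, which is one way to read why the restore failure rate and restore time both dropped once retention and schedule types were planned together; cumulative incrementals trade this for larger daily backups.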