6.5.2 Changes Coming Soon

TimBurlowski
Moderator
Employee Accredited

In the next few weeks I’ll be previewing some of the new things in NetBackup 6.5.2. I am purposely highlighting a few things which I think are neat but might ordinarily not make the press release level.

The first item is PEM. PEM was a component introduced in 6.0 to replace the NetBackup scheduler bpsched. In 6.5.2 there is a newly minted version of PEM. To introduce the new PEM, I interviewed one of the lead engineers, Ray Streckert.


Q) Ray, tell us a little about what you do in engineering and your background with NetBackup.

A) I came to Veritas in 2000 with 20 years of experience developing mainframe diagnostic, factory automation, and consumer software. I started my career at Veritas in the NetBackup Windows MFC GUI group and eventually became the team lead for the media and device management area of the GUI. I led a project to develop the initial release of the NetBackup session layer (nbsl), which is meant to provide an interface to NetBackup for all user interfaces. I later took a position as the component group owner for NetBackup infrastructure. This area includes the artifact generator, service container, persistence manager, event manager, logging, and common libraries, and it also serves as the NetBackup focal point with the Symantec infrastructure group for corporate-wide common components. I was then asked to take on the ownership of the policy execution manager (nbpem) component, which was in the process of being rewritten. I have taken on this responsibility with the help of an excellent team of dedicated engineers. I am now the component group owner of NetBackup systems management, which includes the policy execution manager, the job manager, and the restore portion of the request daemon (bprd).

Q) I'm interviewing you today mainly to talk about the NetBackup Policy Execution Manager, aka PEM. What does PEM do?

A) PEM is responsible for scheduling NetBackup jobs based on due time, taking into account failure history and open schedule windows. It selects policies to match user backup requests. It divides jobs into multiple child jobs and applies retry logic for failed jobs. It determines the need for, timing of, and deletion of snapshots for complex jobs and manages the NetBackup session concept. It is also responsible for starting media cleanup.

Q) Why did we choose to re-architect PEM?

A) There were several reasons that convinced the team that something more than just minor refactoring was required. Most of these issues had themes around scalability, stability, and extensibility. PEM now has an infrastructure that enforces concepts like more granular locking. There is no longer a concept of a worklist that PEM works on, with multiple threads contending to get a lock on the worklist. Events are now used in PEM to signal completion of an operation, instead of letting threads of execution block until that operation completes. The framework that is now used in PEM is independent of the business logic. PEM makes extensive use of reference-counted objects to better allow reliable, asynchronous processing, cleaner shutdown, and a smaller memory footprint. PEM is also broken down into thirty-six internal subsystems, each responsible for a smaller area of functionality. This allows us to reduce complexity and divide logic into smaller, understandable units.
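
(Editor's illustration) The move from blocking threads to completion events can be pictured with a toy dispatcher. This is only a sketch of the general pattern, not PEM's actual framework: the EventBus class, the event names, and the policy name are all made up for the example.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy single-threaded event dispatcher (hypothetical, not PEM's real framework)."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event name -> list of callbacks
        self._pending = deque()             # events waiting to be delivered

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        # Publishing never blocks: the event is queued and delivered later.
        self._pending.append((event_name, payload))

    def run(self):
        # Deliver queued events until none remain; handlers may publish more.
        while self._pending:
            event_name, payload = self._pending.popleft()
            for handler in self._handlers[event_name]:
                handler(payload)


def demo():
    bus = EventBus()
    log = []

    def on_policy_read(policy):
        # Completion of "read policy" is signalled by an event, so the next
        # step is triggered by a callback rather than by a waiting thread.
        log.append(f"policy read: {policy}")
        bus.publish("job_submitted", policy)

    def on_job_submitted(policy):
        log.append(f"job submitted: {policy}")

    bus.subscribe("policy_read", on_policy_read)
    bus.subscribe("job_submitted", on_job_submitted)
    bus.publish("policy_read", "nightly_full")
    bus.run()
    return log
```

Because each step only reacts to an event and then returns, no caller has to hold a lock or sit idle while another operation finishes, which is the property the answer above describes.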

Q) What is different between the version of PEM we plan to ship in 6.5.2 and the previous incarnation?

A) From a functional standpoint, very little has changed. PEM II was designed to be a drop-in replacement for the existing PEM. We have made some changes in surrounding components that do allow the new PEM to do some things a little better. One of these areas of improvement is when PEM is restarted while jobs are active. We can do a much better job of determining what jobs are still active and synchronizing with the rest of an active NetBackup system. The new PEM should be able to scale much better, consume much less CPU, and have a comparable if not smaller memory footprint. We do keep some internal history which offsets some of the footprint, but the benefit to its supportability makes this worth the size.

Q) Was there a different test and design methodology used in re-architecting PEM?

A) Of course, one of our biggest challenges was to make all of these improvements with no regressions in functionality. We recognized that this was a tall order, so we did something a little different in order to test functional equivalence. In 6.5.1 we added some logging to the job manager that logs all parameters when PEM submits a job. We could run a job with the previous version of PEM and with the new version and compare the logs. We automated this comparison. The QE group introduced us to the concept of all-pairs testing. They created some policy constraints and generated scripts to create policies that tested many combinations of policy arguments with a low number of policies. We then combined this with our scripts for comparing job submissions from the old PEM and the new PEM. This concept was a huge benefit when it came to functionally testing all of the various policy types that can be configured in NetBackup.
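
(Editor's illustration) For readers unfamiliar with all-pairs testing, here is a minimal greedy sketch of the idea: pick a small set of configurations such that every pair of values from any two parameters appears together at least once. The parameter names and values in the usage note are placeholders, not the attributes the QE group actually used, and the greedy selection is just one simple way to build such a set.

```python
from itertools import combinations, product


def all_pairs(parameters):
    """Greedy pairwise selection over a dict of name -> non-empty list of values."""
    names = list(parameters)

    # Every value pair (across two different parameters) that must be covered.
    uncovered = set()
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            uncovered.add(((a, va), (b, vb)))

    def pairs_of(case):
        # The set of value pairs that a single test case exercises.
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    # Candidate pool: the full cartesian product (fine for small inputs only).
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen
```

With, say, `all_pairs({"policy_type": ["A", "B", "C"], "schedule": ["full", "incr"], "multistream": [True, False], "compression": [True, False]})`, the exhaustive product has 24 combinations, while the pairwise set is considerably smaller and still exercises every two-way interaction.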

Q) If I am running 6.0MP5 or 6.5.1 will I see any big changes?

A) You will not see any big changes.

Q) I've seen instances where people have had thousands of policies and complained about how slowly the initial jobs start. Will that be any different for the new PEM?

A) On startup, PEM will now start a job that is due as soon as its policy and all of its last backup data have been read. This is different from previous releases, where PEM would resolve all last backup data for all policies before any jobs were submitted for execution. Many of the subsystems within the new PEM work from queues. These queues all honor job priority. Job priority is composed of the policy priority and how overdue the job is. The exception to this is immediate backup requests (bpbackup). These requests are given a very high priority. So on startup, high priority jobs will be acted on before lower priority jobs.
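
(Editor's illustration) The queueing behavior described above can be sketched with a small priority queue. The scoring formula, the IMMEDIATE_BOOST constant, and the job names are assumptions made for the example; the interview does not say how PEM actually weights policy priority against overdue time.

```python
import heapq
import itertools

IMMEDIATE_BOOST = 1_000_000  # hypothetical: large enough to outrank any scheduled job


def job_priority(policy_priority, seconds_overdue, immediate=False):
    """Illustrative score where higher runs first. The weighting is an assumption."""
    score = policy_priority + seconds_overdue / 60.0
    if immediate:
        score += IMMEDIATE_BOOST
    return score


class JobQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO among equal scores

    def push(self, name, policy_priority, seconds_overdue, immediate=False):
        score = job_priority(policy_priority, seconds_overdue, immediate)
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(self._heap, (-score, next(self._order), name))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In this sketch, an immediate request pushed last still comes out first, followed by scheduled jobs ordered by their combined policy priority and overdue time.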

Q) I know there are customers who don't actually use PEM to schedule jobs. Due to special scripting requirements they choose a third party scheduler to perform scripted backups. Will those customers see any difference?

A) The biggest difference will be at PEM startup time. PEM is still throttled by how quickly we can read last backup data. This is a known issue that is being worked on. Since the new PEM handles all immediate backup requests as a high priority internally, these jobs should start almost instantly, even with a high number of policies and while last backup data is still being queried.

Q) What can I expect from the new PEM when it comes to resource consumption like CPU & memory?

A) CPU usage should be considerably less and memory consumption should be very comparable to previous releases.

Q) PEM is closely aligned with the NetBackup Resource Broker (nbrb) and the NetBackup Job Manager (nbjm). Will those be changing at the same time?

A) There are no significant changes in nbrb or nbjm to accommodate PEM II.

Q) Does PEM include any special new capabilities or utilities? For example, I've always wanted to know more about what is supposed to run on a given day.

A) We have extended the functionality of nbpemreq by adding the -jobs, -policies, and -subsystems options. These options basically allow the NetBackup user to dump objects that are being maintained within the PEM process. Since this is a dump of the internals of the process, the output of these commands is guaranteed to change from release to release, but they are excellent tools for determining the internal state of PEM at any point in time. There is overhead in using these options, so they should be used with care. The -subsystems option can dump information about any or all of the internal subsystems that make up PEM. The -jobs option can be used to dump detailed information about a recently executed NetBackup job. And the -policies option can be used to dump information about a policy, its policy/client task information including scheduling information, and a short subtask history of jobs that have recently executed.

Q) Shutdown and startup. Will I be able to shut down PEM independent of other NetBackup functions?

A) Yes and no. In general, yes, you should be able to stop PEM, start PEM, and not skip a beat. There are a couple of situations where stopping PEM will cause a disruption. For instance, if a parent job has started and a request comes to PEM to start the child jobs, but a request is made to shut down PEM at that same instant, the parent job should fail and retry when PEM is restarted. The impact of PEM going down and coming back up should be much smaller than in previous releases.

Q) What is coming next for PEM? Are there any big enhancements planned?

A) Of course, we always have new NetBackup features that we must accommodate, like a new NetBackup Enterprise Vault policy type, just as an example. One capability that we have been looking at is being able to back up an application. This is not an easy problem to solve in a way that can be applied in general terms, to back up any distributed application. This is still in the concept phase.


Thanks to Ray for consenting to this interview, especially since I know he has been plenty busy lately prepping for the 6.5.2 release.

Based on feedback from our customers on the initial release of PEM, we’ve learned a lot, and we expect that this new version in 6.5.2 will provide the stability and maintainability that all our customers expect from a product like NetBackup.