Node failure in PureDisk pool

RexChen
Level 2

Assume:

I have six NetBackup 5020 nodes grouped into a single pool.

Question:

If one node fails, is the data still accessible?

I assume one of three things happens in this situation:

a) The service stops.

b) The data on the other nodes is unaffected.

c) None of the data on the six nodes is accessible.

Thanks

Rex

Accepted Solutions

Chad_Wansing2
Level 5
Employee Accredited Certified

No. If a node fails, the storage pool goes offline. The reason is that the nodes divide up responsibility for ranges of hash values. Imagine that every hash began with a letter of the alphabet: each node would then be responsible for answering queries for some subset of the letters. The SPA (Storage Pool Authority) routes queries for particular letters to the appropriate node, and that node answers the requester directly. If one of your nodes is offline, a whole range of the alphabet is unaccounted for, so instead of serving incomplete answers, the SPA takes the pool to an offline state and notifies NetBackup (NBU).
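The routing idea above can be modelled in a few lines. This is a minimal Python sketch under stated assumptions, not PureDisk's actual implementation: the `ToyPool` class and its method names are hypothetical, SHA-256 is an arbitrary choice of fingerprint function, and splitting the hash space evenly by the first byte of the fingerprint is just one way to express the "letters of the alphabet" analogy.

```python
import hashlib


class ToyPool:
    """Hypothetical model of a pool whose nodes each own a slice of the hash space."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        self.online = {name: True for name in self.nodes}

    def owner(self, fingerprint: str) -> str:
        # The first byte of the hex fingerprint (0x00..0xff) plays the role of the
        # "first letter"; the 256 buckets are split into contiguous ranges, one per node.
        bucket = int(fingerprint[:2], 16)
        return self.nodes[bucket * len(self.nodes) // 256]

    def fail_node(self, name: str) -> None:
        self.online[name] = False

    def is_online(self) -> bool:
        # If any node is down, part of the hash space has no owner,
        # so the whole pool is reported offline rather than partially available.
        return all(self.online.values())

    def route(self, data: bytes) -> str:
        """Return the node that would answer for this data segment."""
        if not self.is_online():
            raise RuntimeError("pool offline: at least one node is unavailable")
        fingerprint = hashlib.sha256(data).hexdigest()
        return self.owner(fingerprint)


if __name__ == "__main__":
    pool = ToyPool([f"node{i}" for i in range(1, 7)])
    print(pool.route(b"some backup segment"))  # one of node1..node6
    pool.fail_node("node3")
    print(pool.is_online())  # False
```

Whichever node fails, a contiguous range of fingerprints loses its owner, which is why the sketch treats the pool as all-or-nothing instead of degrading gracefully.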

 

All that being said, the hardware is carrier-grade (literally NEBS Level 3 telco-grade), and the only single point of failure (besides the motherboard, obviously) is the RAID controller. Even then, the controller configuration is saved to the drives, so we can re-read it onto a new card if the RAID controller has to be replaced. For some of our larger customers, we've actually sold a spare chassis without drives: in the event of a catastrophic loss of a box, you can pull all the drives from the appliance, put them into the spare chassis (preferably keeping them in the same order), and get the node back online very quickly.

Hope this helps!

 

-Chad

RexChen
Level 2

Thanks for your kind reply.

 

Rex