Sfcache (SmartIO) volume Kstate DISABLED after SG switchover to alternate node

Brem_Belguebli
Level 3

Hello

We are currently testing sfcache (SmartIO) on a two-node test cluster without CFS. After doing an SG switchover from one node to the other, the cache is no longer used and the volume appears with State: ENABLED and Kstate: DISABLED.

There is no way to reactivate it short of destroying the (cached) data DG.

Any hints?

Setup: RHEL 6.4, SFHA ENT 6.1


Brem  


12 REPLIES

TonyGriffiths
Level 6
Employee Accredited Certified

Hi,

Could you post extracts of the VCS main.cf file that show the resources you are failing over (DG, etc.)?

Also, could you summarise which devices you are using for the SmartIO cache and on which nodes they are located?


thanks

tony

Brem_Belguebli
Level 3

Hello 

The service group is made up of a DG with a volume, a VxFS file system on top of it, and an IP address.

The whole service group is switched over to the alternate node.

For now we are testing the cache on regular disk storage, each node having a dedicated local cache area.

This is noted as working in the SmartIO documentation.
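
For reference, the service group looks roughly like this in main.cf (resource, device and address names below are illustrative, not our actual configuration; the mount depends on the volume, which depends on the disk group):

group data_sg (
    SystemList = { node1 = 0, node2 = 1 }
    AutoStartList = { node1 }
    )

    DiskGroup data_dg (
        DiskGroup = datadg
        )

    Volume data_vol (
        DiskGroup = datadg
        Volume = datavol
        )

    Mount data_mnt (
        MountPoint = "/data"
        BlockDevice = "/dev/vx/dsk/datadg/datavol"
        FSType = vxfs
        FsckOpt = "-y"
        )

    IP data_ip (
        Device = eth0
        Address = "192.168.10.50"
        NetMask = "255.255.255.0"
        )

    data_vol requires data_dg
    data_mnt requires data_vol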

Brem  


TonyGriffiths
Level 6
Employee Accredited Certified

Hi Brem,

So if I understood correctly, you are currently testing the SmartIO feature using regular (non-shared) SAN disk storage.

The SmartIO cache is intended to be used on fast host-based flash devices that provide a high-speed cache. Using disk storage will not provide any real gain and may even be slower.

Is the testing to better understand how SmartIO works, or do you intend to move this to a live/production state?

thanks

tony

Brem_Belguebli
Level 3

Hi Tony,

You guessed right. We actually plan to deploy new clusters with some local flash PCIe cards for caching purposes. These machines are not yet deployed, and we wanted to get a head start on our integration on a test cluster (deployment of SFHA 6.1, how to configure the cache, observing the behaviour, etc.).

However, it would also have made sense if we were using different tiers of storage (high-end SAN for the cache, and cheaper/slower but higher-capacity storage for the data).

So yes, it is for testing purposes currently.

Brem  


TonyGriffiths
Level 6
Employee Accredited Certified

Thanks Brem, understood.

As for the failover aspect, is it the failover of the data disk group that you are having problems with?

cheers

tony

Brem_Belguebli
Level 3

No, failover of the SG (including the DG, volume and FS), as well as failback, works fine.

The only thing is that the cache area is no longer used (after either failover or failback).

Regards

Brem 

Clifford_Barcli
Level 3
Employee Accredited Certified

Hi Tony.

Are you working with read caching or write-back caching? You mention that you are NOT using CFS, but I wanted to make sure that you are testing read caching per Chapter 2 of the SmartIO for Solid State Solutions Guide for Linux.

Brem_Belguebli
Level 3

Hello Clifford,


Actually it's Brem, not Tony; Tony works with you.

We plan to use it for read caching only, as we need to maintain write consistency across sites.


@Tony, I think I have figured out what my problem is.

My data disk is replicated (HTC agent), and when it fails over to the remote site, the udid_mismatch and clone_disk flags are not cleared (we actually have a preonline trigger script for this, but it is not working on this new 6.1 cluster).

Manually clearing the flags and then disabling/re-enabling the cache for the volumes reactivates the cache.
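
For the record, the manual workaround was along these lines (the disk, DG and volume names are just examples, and the exact sfcache syntax may differ slightly, so check the vxdisk(1M) and sfcache(1M) man pages):

# clear the clone-related flags on each disk of the replicated DG
vxdisk updateudid disk01        # clears the udid_mismatch flag
vxdisk set disk01 clone=off     # clears the clone_disk flag

# then bounce SmartIO caching on the data volume
sfcache disable -g datadg datavol
sfcache enable -g datadg datavol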

Brem   


TonyGriffiths
Level 6
Employee Accredited Certified

Hi Brem,

Looks like you have a handle on the issue. As you mentioned, the data disk group can fail over in VCS like a traditional disk group.

The SmartIO cache device is local to a node and cannot fail over/migrate to another node. If you fail over the data disk group, the SmartIO cache device will be left on the original node.

cheers

tony

Brem_Belguebli
Level 3

Hi Tony,

It works now, thanks to Symantec support (we opened a case in parallel).

We added the new (6.1) attribute ClearClone = 1 to the DG definition in the main.cf, and the udid_mismatch and clone_disk flags are now cleared automatically on import, which makes the cache active again when a failover occurs.
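
For anyone hitting the same issue, the change is just one attribute on the DiskGroup resource (resource and DG names below are illustrative):

    DiskGroup data_dg (
        DiskGroup = datadg
        ClearClone = 1
        )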

We do not expect to switch over the cache devices, as in the target setup they will be local to each node (Fusion-io PCIe cards).

Regards

Brem


TonyGriffiths
Level 6
Employee Accredited Certified

Hi Brem,

Good to hear that

cheers

tony

RyanJancaitis
Level 4
Employee

Brem,

Glad to hear this got resolved.

For your local Fusion-io devices, would it be useful to be able to migrate the cache over during a failover?

So when your app moves from node A to node B, the cache is pre-warmed.