Good article from Anthony Vandewerdt (thank you!) on testing Storwize V7000 node canister failure.
I have received this question several times, so it’s clearly something people are interested in.
The Storwize V7000 has two controllers known as node canisters. It’s an active/active storage controller, in that both node canisters are processing I/O at any time and any volume can be happily accessed via either node canister.
The question then gets asked: what happens if a node canister fails, and can I test this? If one node canister fails, the second node canister handles all the I/O on its own. Your host multipathing driver will switch to the remaining paths and life will go on. We know this works because a firmware upgrade takes one node canister offline at a time, so if you have already done a firmware upgrade, you have already tested node canister failover. But what if you want to test this discretely? There are four ways:
- Walk up to the machine and physically pull out a node canister. This is a bit extreme and is NOT recommended.
- Power off a node canister using the CLI (using the satask stopnode command). This will work for the purposes of testing node failure, but the only way to power the node canister back on is to pull it out and reinsert it. This is again a bit extreme and is not recommended. This is also different to an SVC, since each SVC node has its own power on/off button.
- Use the CLI to remove one node from the I/O group (using the svctask rmnode command). This works on an SVC because the nodes are physically separate. On a Storwize V7000 the nodes live in the same enclosure and a candidate node will immediately be added back to the cluster, so as a test this is not that helpful.
- Place one node into service state and leave it there while you check all your hosts. This is my recommended method.
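
Whichever method you choose, the "check all your hosts" step comes down to comparing each host's active path count before the test with the count while one node canister is out of action. Here is a minimal sketch of that comparison in Python. It assumes you have already collected the path counts yourself from each host's multipathing driver; the function name, host names and numbers are hypothetical and are not part of any Storwize or SVC tool.

```python
def classify_hosts(paths_before, paths_during):
    """Compare active path counts per host before and during the test.

    Both arguments are dicts of host name -> number of active paths,
    gathered by hand or by your own scripts (assumed input format).
    """
    report = {}
    for host, before in paths_before.items():
        # A host missing from the second sample is treated as having no paths.
        during = paths_during.get(host, 0)
        if during == 0:
            report[host] = "FAILED: no active paths left"
        elif during >= before:
            # Nothing dropped, so this host may not be using both nodes.
            report[host] = "CHECK: path count did not drop"
        else:
            report[host] = "OK: running on remaining paths"
    return report


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    before = {"esx01": 4, "aix02": 8, "win03": 2}
    during = {"esx01": 2, "aix02": 4, "win03": 0}
    for host, status in classify_hosts(before, during).items():
        print(f"{host}: {status}")
```

A host whose path count does not drop at all is worth a second look: it may have no paths through the node canister under test, which would mean it has no redundancy if the other one fails.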