Post by Chris Mellor (thank you) over at El Reg
Some high-end NetApp FAS6000 arrays are suffering failures that cause them to halt and restart. NetApp is fixing the problem.
El Reg understands that several FAS6000 customers in Europe have discovered that their arrays stop working under heavy load and abruptly restart. This interrupts the arrays' ability to satisfy I/O requests from the servers accessing them, and so slows down applications running on those servers.
These arrays use NetApp’s Flash Cache, a three-quarter-length PCIe x8 card containing NAND solid-state chips that form a read-write cache. This memory serves data I/O that would run ten times more slowly if the data had to be fetched from the array’s hard disk drives.
A FAS6000 array can have up to 6TB of flash cache spread across twelve 512GB cards or modules.
The cache has a customisable FPGA chip that controls the system and caching activities. Suspicion has fallen on this chip as the source of the problem.
Read on here