Specific disk replacement procedure must be followed to prevent potential data loss in a dynamic disk pool – IBM System Storage

Source

RETAIN tip: H211769

Symptom

When an optimal disk (that is, one that has not been marked failed) in a Dynamic Disk Pool is pulled and replaced with no host I/O occurring, the controller firmware will not start a reconstruction of data to the new disk.

When host I/O resumes to a Thin Provisioned Volume (TPV), an internal failure can occur that causes the TPV to fail, with subsequent data loss. There is also a very small possibility that the TPV stays optimal even though data has already been lost.

When host I/O resumes to a standard Redundant Array of Independent Disks (RAID) volume, no notification will be apparent, but data will be lost.

If host I/O is in progress when the drive is pulled and replaced, then this issue will not occur.
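
The conditions above can be summarized as a small decision helper. This is only an illustrative sketch: the function names and the string labels ("optimal", "thin", "standard") are made up for this example and do not correspond to any IBM tool or API.

```python
# Illustrative only: names and labels are hypothetical, not an IBM interface.

def at_risk(disk_state, host_io_active):
    """Return True when a pull-and-replace matches the conditions in this tip:
    the disk was still optimal (not marked failed) and no host I/O was running."""
    return disk_state == "optimal" and not host_io_active


def expected_symptom(volume_type):
    """Return the symptom this tip describes once host I/O resumes."""
    symptoms = {
        "thin": "TPV may fail with data loss; rarely it stays optimal despite the loss",
        "standard": "no notification, but data is lost",
    }
    return symptoms[volume_type]


if __name__ == "__main__":
    if at_risk("optimal", host_io_active=False):
        print(expected_symptom("standard"))
```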

Affected configurations

The system may be any of the following IBM storage subsystems:

  • IBM System Storage DCS3700 Storage Subsystem, type 1818, any model
  • IBM System Storage DCS3860 Storage Subsystem, type 1813, any model
  • IBM System Storage DS3512, type 1746, any model
  • IBM System Storage DS3524, type 1746, any model

This tip is not software specific.

This tip is not option specific.

The following system firmware level(s) are affected: controller firmware 7.84, 7.86

Solution

The fix for this issue is contained in controller firmware 7.84.53.00 and later releases and 7.86.39.00 and later releases.

These files are available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support’s Fix Central web page, at the following URL:

http://www.ibm.com/support/fixcentral/
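
To check an installed controller firmware level against the fix levels quoted above, a comparison along the following lines may help. It is a minimal sketch that assumes the version is available as a dotted string such as "07.84.50.00"; the function names are hypothetical, and branches other than 7.84 and 7.86 are reported as "unknown" because this tip does not cover them.

```python
# Fix levels quoted in this tip, keyed by (major, minor) firmware branch.
FIXED_LEVELS = {
    (7, 84): (7, 84, 53, 0),
    (7, 86): (7, 86, 39, 0),
}


def parse_version(text):
    """Turn a dotted version string such as '07.84.50.00' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(version_text):
    """Return True if the level is below the fix level for its branch,
    False if it is at or above it, and None if the branch is not
    one of the two named in this tip."""
    version = parse_version(version_text)
    fixed = FIXED_LEVELS.get(version[:2])
    if fixed is None:
        return None
    return version < fixed


if __name__ == "__main__":
    for level in ("07.84.50.00", "07.84.53.00", "07.86.39.00"):
        print(level, needs_update(level))
```

Python compares tuples element by element, so the four-part levels order correctly without any string tricks.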

Workaround

To avoid this issue, fail the disk by using DS Storage Manager before replacing it. After the disk has been manually failed, its fault Light Emitting Diode (LED) should be lit, indicating that the drive is safe to remove. Wait 60 seconds between pulling the failed disk and inserting the new disk.
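
For anyone recording drive swaps in a runbook or change log, the workaround can be expressed as a simple checklist validator. This is a sketch under stated assumptions: the function and its parameters are hypothetical, timestamps are plain seconds, and nothing here talks to the storage subsystem.

```python
MIN_WAIT_SECONDS = 60  # wait between pulling the old disk and inserting the new one


def check_replacement(failed_in_manager, fault_led_lit, pulled_at, inserted_at):
    """Return a list of deviations from the workaround; an empty list means
    the recorded steps match it. Times are in seconds (for example, epoch time)."""
    problems = []
    if not failed_in_manager:
        problems.append("disk was not failed in DS Storage Manager first")
    if not fault_led_lit:
        problems.append("fault LED was not lit before removal")
    if inserted_at - pulled_at < MIN_WAIT_SECONDS:
        problems.append("new disk inserted less than 60 seconds after removal")
    return problems


if __name__ == "__main__":
    print(check_replacement(True, True, pulled_at=0, inserted_at=75))
    print(check_replacement(False, True, pulled_at=0, inserted_at=10))
```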
