ZFS Pool Import Fails After Power Outage

July 15, 2010

The early summer storms have taken their toll on Alabama, and UPS failures (and shortfalls) have been popping up all over. Add consolidated, shared storage to the equation, along with JBODs on separate power rails, limited UPS runtime and/or no generator backup, and you have a recipe for potential data loss – at least this is what we’ve been seeing recently.

Even with ZFS pools, data integrity in a power event cannot be guaranteed – especially when employing “desktop” drives and RAID controllers with RAM cache and no BBU (or perhaps a “bad storage admin” who has managed to disable the ZIL). When this happens, NexentaStor (and other ZFS storage devices) may even show all members in the ZFS pool as “ONLINE” as if they are awaiting proper import. However, when an import is attempted (either automatically on reboot or manually) the pool fails to import.

From the command line, the suspect pool’s status might look like this:

root@NexentaStor:~# zpool import
pool: pool0
id: 710683863402427473
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
        pool0        ONLINE
          mirror-0   ONLINE
            c1t12d0  ONLINE
            c1t13d0  ONLINE
          mirror-1   ONLINE
            c1t14d0  ONLINE
            c1t15d0  ONLINE

Looks good, but the import may fail like this:

root@NexentaStor:~# zpool import pool0
cannot import 'pool0': I/O error

Not good. This probably indicates that something is not right with the array. Let’s try to force the import and see what happens:

Nope. Now this is the point where most people start to get nervous: their neck tightens up a bit and they begin to flip through a mental calendar of backup schedules and catalog backup repositories – I know I do. However, it’s the next one that makes most administrators really nervous when trying to “force” the import:

root@NexentaStor:~# zpool import -f pool0
pool: pool0
id: 710683863402427473
status: The pool metadata is corrupted and the pool cannot be opened.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
cannot import 'pool0': I/O error

Really not good. Did it really suggest going to backup? Ouch!

In this case, something must have happened to corrupt metadata – perhaps the non-BBU cache on the RAID device when power failed. Expensive lesson learned? Not yet. The ZFS file system still presents you with options, namely “acceptable data loss” for the period of time accounted for in the RAID controller’s cache. Since ZFS writes data in transaction groups and transaction groups normally commit in 20-30 second intervals, that RAID controller’s lack of BBU puts some or all of that pending group at risk. Here’s how to tell, by testing the forced import as if data loss were allowed:

root@NexentaStor:~# zpool import -nfF pool0
Would be able to return data to its state as of Fri May 7 10:14:32 2010.
Would discard approximately 30 seconds of transactions.

root@NexentaStor:~# zpool import -nfF pool0
WARNING: can't open objset for pool0

If the first output is acceptable, then proceeding without the “n” option will produce the desired effect by “rewinding” the last couple of transaction groups (read: ignoring them) and importing the “truncated” pool. The dry run reports approximately how many “seconds” worth of data cannot be restored. Depending on the bandwidth and utilization of your system, this could be very little data or several MB worth of transaction(s).
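The dry-run-then-commit sequence above can be wrapped in a small script. What follows is a minimal sketch and not a vetted recovery tool: the helper names `classify_dry_run` and `guarded_rewind_import` are made up for illustration, and the text matching assumes the dry-run messages look like the two samples shown above, which may differ between ZFS builds.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, and the pattern assumes the
# dry-run messages match the two sample outputs shown in this post.

# Classify the text printed by "zpool import -nfF <pool>".
# Prints "rewind" when a rewind was offered, otherwise "no-rewind".
classify_dry_run() {
    case "$1" in
        *"Would be able to return data"*) echo "rewind" ;;
        *) echo "no-rewind" ;;
    esac
}

# Always performs the dry run and prints its report. The real rewind only
# happens when the second argument is the literal word "commit" AND the
# dry run offered a rewind.
# ZPOOL can be overridden (for example, to point at a stub while testing).
guarded_rewind_import() {
    pool="$1"
    mode="${2:-report}"
    report=$("${ZPOOL:-zpool}" import -nfF "$pool" 2>&1)
    echo "$report"
    if [ "$(classify_dry_run "$report")" != "rewind" ]; then
        echo "no rewind offered for $pool; leaving the pool untouched" >&2
        return 1
    fi
    if [ "$mode" = "commit" ]; then
        # Discards the transactions listed in the report; cannot be undone.
        "${ZPOOL:-zpool}" import -fF "$pool"
    fi
}
```

Run `guarded_rewind_import pool0` first and read the report; only if the discarded window is acceptable, run `guarded_rewind_import pool0 commit`. Keeping the commit step explicit leaves the “is this loss acceptable?” decision with the administrator.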

What to do about the second option? From the man pages on “zpool import” Sun/Oracle says the following:

zpool import [-o mntopts] [ -o property=value] … [-d dir | -c cachefile] [-D] [-f] [-R root] [-F [-n]] -a

Imports all pools found in the search directories. Identical to the previous command, except that all pools with a sufficient number of devices available are imported. Destroyed pools, pools that were previously destroyed with the “zpool destroy” command, will not be imported unless the -D option is specified.

-o mntopts

Comma-separated list of mount options to use when mounting datasets within the pool. See zfs(1M) for a description of dataset properties and mount options.

-o property=value

Sets the specified property on the imported pool. See the “Properties” section for more information on the available pool properties.

-c cachefile

Reads configuration from the given cachefile that was created with the “cachefile” pool property. This cachefile is used instead of searching for devices.

-d dir

Searches for devices or files in dir. The -d option can be specified multiple times. This option is incompatible with the -c option.

-D

Imports destroyed pools only. The -f option is also required.

-f

Forces import, even if the pool appears to be potentially active.

-F

Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported.

-a

Searches for and imports all pools found.

-R root

Sets the “cachefile” property to “none” and the “altroot” property to “root”.

-n

Used with the -F recovery option. Determines whether a non-importable pool can be made importable again, but does not actually perform the pool recovery. For more details about pool recovery mode, see the -F option, above.

No real help here. What the documentation omits is the “-X” option. This option is only valid with the “-F” recovery mode setting; however, it is NOT well documented. Suffice it to say it is the last resort before acquiescing to real problem solving… Assuming the standard recovery mode “depth” of transaction replay is not quite enough to get you over the hump, the “-X” option gives you an “extended replay” by seemingly providing a scrub-like search through the transaction groups (read: “potentially time consuming”) until it arrives at the last reliable transaction group in the dataset.
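Putting the pieces together, the attempts discussed in this post form a ladder of increasing risk. The sketch below only prints that ladder; `recovery_ladder` is a hypothetical helper name, and pairing “-n” with “-FX” as a dry run of the extended search is an assumption about how your ZFS build treats that combination, so check it against your platform before relying on it.

```sh
#!/bin/sh
# Sketch only: "recovery_ladder" is a hypothetical helper. It prints the
# import attempts from this post in order of increasing risk and runs
# nothing itself.
recovery_ladder() {
    pool="$1"
    echo "zpool import $pool"        # plain import
    echo "zpool import -f $pool"     # force, in case the pool looks active
    echo "zpool import -nfF $pool"   # dry run: what would a rewind discard?
    echo "zpool import -fF $pool"    # rewind the last few transaction groups
    echo "zpool import -nfFX $pool"  # dry run of the extended search (assumed)
    echo "zpool import -fFX $pool"   # extended rewind: last resort, can be slow
}
```

`recovery_ladder pool0` prints the six commands for pool0. The idea is to run them by hand, one at a time, and stop at the first one that succeeds, rather than jumping straight to “-X”.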

Lessons to be learned from this excursion into pool recovery are as follows:

Enterprise SAS good; desktop SATA could be a trap
Redundant Power + UPS + Generator = Protected; Anything else = Risk
SAS/RAID Controller + Cache + BBU = Fast; SAS/RAID Controller + Cache – BBU = Train Wreck

The data integrity functions in ZFS are solid when used appropriately. When architecting your HOME/SOHO/SMB NAS appliance, pay attention to the hidden risks of “promised performance” that may walk you down the plank towards a tape backup (or resume writing) event. Better to leave the 5-15% performance benefit on the table or purchase adequate BBU/UPS/Generator resources to support your system in worst-case events. In complex environments, a pending power loss can be properly mitigated through management supervisors and clever scripts: turning down resources in advance of total failure. How valuable is your data?
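As one illustration of such a “clever script”, the sketch below exports a pool cleanly once the UPS is nearly drained. Everything site-specific here is an assumption: how the remaining runtime is read from the UPS is not shown, the helper names are hypothetical, and the 300-second default threshold is arbitrary.

```sh
#!/bin/sh
# Sketch only: how the remaining UPS runtime is obtained is site-specific
# and NOT shown; both helper names and the 300-second default are
# hypothetical.

# Succeeds (exit 0) when the remaining runtime, in seconds, is at or below
# the threshold.
runtime_is_critical() {
    remaining="$1"
    threshold="${2:-300}"
    [ "$remaining" -le "$threshold" ]
}

# Export the pool while power is still available, so it is closed cleanly
# instead of being cut off mid-transaction.
# ZPOOL can be overridden to point at a stub while testing.
export_if_critical() {
    pool="$1"
    remaining="$2"
    if runtime_is_critical "$remaining" "${3:-300}"; then
        echo "UPS runtime ${remaining}s is critical; exporting $pool" >&2
        "${ZPOOL:-zpool}" export "$pool"
    fi
}
```

For example, `export_if_critical pool0 120` would export pool0, while `export_if_critical pool0 900` would do nothing. In practice the clients using the pool would need to be shut down or disconnected first, since an export can fail while datasets are busy.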


3 comments

YOU SAVED MY LIFE!!! Thank you for the great post

by Donny August 25, 2011 at 12:50 am
- You’re welcome, and glad it prevented a “resume writing event” for you 😉
  
  After re-reading the article myself it occurred to me that some may have been left with the impression that ZFS systems are at greater risk for data loss than other storage technologies: that’s simply not the case. In all of my professional work in the lab and in the field, I’ve never encountered a data loss event more tragic than a couple of transaction groups. In such cases where transaction groups were lost, it was completely due to the power failure plus unsafe caching condition as described in the blog.
  
  To be more plain: any storage system puts data at risk when acknowledged writes are not committed to non-volatile storage. This happens in some SATA disks that signal back to the controller that a cache-pending commit is actually written. As disk caches grow larger and larger, the time it takes to flush the cache to NV media can exceed the residual power in the supplies and on-board electronics. When this happens, file systems become inconsistent. To ZFS’ credit, it has a brilliant way to recognize and recover (as you’ve now experienced.)
  
  As my friend Richard Elling once said, “I trust ZFS to keep my most important data set safe and intact: Mrs Elling’s digital photos!” Likewise, all of the MacMillan family photos live in ZFS space – it’s just that reliable…
  
  by Collin C MacMillan August 25, 2011 at 8:07 am
Wow, you really saved my bacon with this post, thank you!!!

by James August 11, 2012 at 5:34 pm


SolutionOriented Blog