ZFS on Linux

From FlimzyWiki

On June 19, 2007, I read an article on Slashdot entitled "ZFS On Linux - It's Alive!". I had heard about ZFS before and had been wishing it worked on Linux, specifically for its flexibility in using size-mismatched drives with data redundancy.

Test Setup

Before I converted any critical data to ZFS, I wanted to make sure it was reliable, usable, and fast "enough" for my needs. I scrounged up some parts, and threw together a ZFS test system.

Test Equipment Specs

  • A 733MHz Pentium III Gateway
  • 128MB RAM
  • A random DVD-RW drive I found lying around
  • 3 random hard drives: a Western Digital 10GB, a Western Digital 20GB, and a Maxtor 40GB (all parallel ATA)

Test Software

I installed Debian Etch from a nightly build dated 02/20/07, simply because that's the Etch install CD I had lying around on the floor, and I was too lazy to download and burn a current one at the time.

Test installation

This was a pretty straightforward installation of Debian Etch. The only "special" stuff I did was during partitioning:

  1. I put a 100MB swap partition on each drive
  2. I created a 100MB /boot partition on hda (the 10GB drive)
  3. I created a 5GB / partition on hda
  4. I left the remaining space on each drive unpartitioned, with the intent of forming a ZFS pool later

I also deselected all the default tasks when the installer asked what software I wanted to install. I can install that later (and on a system I will actually use).

After the system booted for the first time, I installed SSH and some other basic tools I like to use, ran a dist-upgrade to bring my software up to date, then logged in remotely over the network.

First try with ZFS

ZFS Installation

To install the ZFS FUSE drivers for Debian, I used Bryan Donlan's instructions, with my own tweaks.

Add the following line to /etc/apt/sources.list:

deb-src http://www.fushizen.net/zfs-fuse ./

Then update the package lists, install the build dependencies, build the package from source, and install it:

$ sudo apt-get update
$ sudo apt-get install devscripts build-essential zlib1g-dev libfuse-dev scons debhelper dpatch xsltproc docbook-xsl fakeroot fuse-utils ucf
$ apt-get source zfs-fuse
$ cd zfs-fuse-0.4.0~beta1.hg20070509.2/
$ dpkg-buildpackage -rfakeroot -b -us -uc
$ cd ..
$ sudo dpkg -i zfs-fuse_0.4.0~beta1.hg20070509.2-1_i386.deb

ZFS Configuration

This part is amazingly simple. There are many options available; refer to the zpool and zfs man pages. My primary goals were ease of expansion and data redundancy, so here's what I did:

# mv /home /home2                                       # Move /home out of the way
# zpool create zfs /dev/hda4 /dev/hdc2 /dev/hdd2        # Create the pool
# zfs create -o copies=2 -o mountpoint=/home zfs/home   # Create the new /home
# mv /home2/* /home                                     # Move everything in the old /home to its new location

Testing ZFS

Performance Tests

ZFS across all 3 drives. copies=2, compression=off, atime=on

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  5125  39  7852   9  2532   3  3988  29  5783   5 108.4   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1482   7  3470   8  1577   5   997   5  3063  10   658   2
zfstest,300M,5125,39,7852,9,2532,3,3988,29,5783,5,108.4,1,16,1482,7,3470,8,1577,5,997,5,3063,10,658,2

ZFS across all 3 drives. copies=2, compression=on, atime=on

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  6614  52 12524  13  5152   8  6550  49 14145  12 251.5   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1567   8  3137   9  1849   6  1605   7  3158   9   733   2
zfstest,300M,6614,52,12524,13,5152,8,6550,49,14145,12,251.5,3,16,1567,8,3137,9,1849,6,1605,7,3158,9,733,2

ZFS across all 3 drives. copies=1, compression=on, atime=off

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  6770  53 13875  15  6450  10  6195  46 13906  11 286.1   4
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1604   8  4103  10   781   3  1486   6  3759   8  2003   7
zfstest,300M,6770,53,13875,15,6450,10,6195,46,13906,11,286.1,4,16,1604,8,4103,10,781,3,1486,6,3759,8,2003,7

ZFS across all 3 drives. copies=1, compression=off, atime=off

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  6549  53 12666  13  7548  11  6778  50 15444  14 306.1   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1501   8  2729   5  1607   4  1473   6  4025  10  1834   5
zfstest,300M,6549,53,12666,13,7548,11,6778,50,15444,14,306.1,3,16,1501,8,2729,5,1607,4,1473,6,4025,10,1834,5

The 10GB drive alone

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M 11775  96 22857  30  6232   7 11321  83 14318  10 146.5   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16736  98 +++++ +++ 26092 100 17558  99 +++++ +++ 25727  99
zfstest,300M,11775,96,22857,30,6232,7,11321,83,14318,10,146.5,1,16,16736,98,+++++,+++,26092,100,17558,99,+++++,+++,25727,99
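The output above appears to be from bonnie++ 1.03, whose last line for each run is a comma-separated summary of the same numbers. Comparing runs is easier with just a few columns side by side. The sketch below pulls them out with awk, assuming the bonnie++ 1.03 CSV field order (field 5 = block write, 7 = rewrite, 11 = block read, all in K/sec; 13 = random seeks/sec). The summarize_bonnie function name and the "label|csv" input format are my own inventions for this sketch, not part of bonnie++.

```shell
#!/bin/sh
# Sketch: tabulate selected columns from bonnie++ 1.03 CSV summary lines.
# Input format (hypothetical, made up for this sketch): "<label>|<bonnie++ CSV line>"
summarize_bonnie() {
    awk -F'|' '{
        n = split($2, f, ",")
        if (n < 14) next    # skip lines that are not complete bonnie++ CSV records
        # f[5] = block write, f[7] = rewrite, f[11] = block read (K/sec); f[13] = seeks/sec
        printf "%-28s %7s %7s %7s %7s\n", $1, f[5], f[7], f[11], f[13]
    }'
}

# Header row, then the five runs from this page in the order they appear
printf '%-28s %7s %7s %7s %7s\n' run write rewrite read seeks
summarize_bonnie <<'EOF'
copies=2 comp=off atime=on|zfstest,300M,5125,39,7852,9,2532,3,3988,29,5783,5,108.4,1,16,1482,7,3470,8,1577,5,997,5,3063,10,658,2
copies=2 comp=on atime=on|zfstest,300M,6614,52,12524,13,5152,8,6550,49,14145,12,251.5,3,16,1567,8,3137,9,1849,6,1605,7,3158,9,733,2
copies=1 comp=on atime=off|zfstest,300M,6770,53,13875,15,6450,10,6195,46,13906,11,286.1,4,16,1604,8,4103,10,781,3,1486,6,3759,8,2003,7
copies=1 comp=off atime=off|zfstest,300M,6549,53,12666,13,7548,11,6778,50,15444,14,306.1,3,16,1501,8,2729,5,1607,4,1473,6,4025,10,1834,5
10GB drive alone|zfstest,300M,11775,96,22857,30,6232,7,11321,83,14318,10,146.5,1,16,16736,98,+++++,+++,26092,100,17558,99,+++++,+++,25727,99
EOF
```

Laid out this way, the effect of compression on the copies=2 runs stands out: block write goes from 7852 to 12524 K/sec and block read from 5783 to 14145 K/sec.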

Fault Tolerance Tests

After copying, deleting, and moving a bunch of data to my new ZFS filesystem, I decided to test its fault tolerance.

For most of my data copying I had 'copies=2' set, so I figured I should be able to remove any single drive from my system and still have a working copy of my data. The first drive I removed was the 40GB drive, leaving the 20GB drive and the last half of the 10GB drive. After booting, I ran zpool status:

  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          hda4      ONLINE       0     0     0
          hdc2      ONLINE       0     0     0
          hdd2      UNAVAIL      0     0     0  cannot open

So I reattached the 40GB drive and removed the 20GB drive instead. This time:

  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          hda4      ONLINE       0     0     0
          hdc2      UNAVAIL      0     0     0  cannot open
          hdd2      ONLINE       0     0     0

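Apparently, ZFS doesn't like it when one of its underlying block devices goes away. I can't really blame it, but I thought that was one of the things ZFS was supposed to be good at coping with. The catch is that I created the pool as a plain stripe (no mirror or raidz vdevs), so each partition is a separate top-level device and the pool can't run with any of them missing; 'copies=2' guards against damaged blocks, not against the loss of an entire device.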

Next, I tried zeroing the roughly 5GB ZFS partition on the 10GB drive (I couldn't remove that drive from the system, since the OS boots from it). I ran dd for a few seconds, then hit Ctrl-C:

# dd if=/dev/zero of=/dev/hda4
286661+0 records in
286661+0 records out
146770432 bytes (147 MB) copied, 5.25475 seconds, 27.9 MB/s
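dd uses 512-byte records by default, so the 286661 records above work out to 286661 × 512 = 146,770,432 bytes, matching the byte count dd reported. Because the amount written depended on when I hit Ctrl-C, a repeatable version of this test would give dd an explicit bs= and count=. The sketch below tries that on a scratch file rather than a real partition; the 4 MiB file and 1 MiB of damage are arbitrary sizes chosen for illustration, not what I actually ran. conv=notrunc stops dd from truncating the rest of the file, so only the targeted region changes, much as it would on a block device.

```shell
#!/bin/sh
# Sketch: zero a fixed-size region at the start of a scratch file with dd,
# standing in for the Ctrl-C'd run against /dev/hda4. Sizes are arbitrary.
img=$(mktemp)
trap 'rm -f "$img"' EXIT   # remove the scratch file when the script exits

# Fill 4 MiB (4194304 bytes) with non-zero bytes ("y\n" repeated).
yes | head -c 4194304 > "$img"

# Overwrite the first 1 MiB: 2048 records of 512 bytes (dd's default record size).
# conv=notrunc leaves the remaining 3 MiB of the file in place.
dd if=/dev/zero of="$img" bs=512 count=2048 conv=notrunc 2>/dev/null

# Count the zero bytes now in the file (should be 2048 * 512 = 1048576).
zeroed=$(( $(wc -c < "$img") - $(tr -d '\000' < "$img" | wc -c) ))
echo "file size: $(wc -c < "$img") bytes, zeroed: $zeroed bytes"
```

Adding seek=N to the dd command would damage a region N records into the target instead of the very beginning.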

I then tried creating a tarball of some of the files on my ZFS filesystem, since that is one way to make sure every file gets read.

When I tried to copy data that had been written with 'copies=1' set, I got a number of errors similar to:

Read error at byte 0, while reading 7680 bytes: Input/output error

I then ran zpool status again:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0    70
          hda4      ONLINE       0     0    70
          hdc2      ONLINE       0     0     0
          hdd2      ONLINE       0     0     0

errors: 15 data errors, use '-v' for a list

Then I tried again with some data that had been copied with the 'copies=2' setting in place. I got no errors at the application level, but zpool status showed:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0   581
          hda4      ONLINE       0     0   581
          hdc2      ONLINE       0     0     0
          hdd2      ONLINE       0     0     0

errors: 15 data errors, use '-v' for a list

The number of data errors stayed at 15, but the CKSUM count for hda4 went up from 70 to 581. It looks like ZFS found and corrected the problems while reading, using the second copy of each damaged block.
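Rather than eyeballing the config table after every test, the device lines in saved zpool status output can be filtered for anything that is not ONLINE or has non-zero READ, WRITE, or CKSUM counters. The sketch below does that with awk. It assumes the NAME/STATE/READ/WRITE/CKSUM layout shown in the outputs above, and the function name check_zpool_status is hypothetical. Later ZFS versions may abbreviate large counts (for example 1.2K), which this simple numeric check would skip.

```shell
#!/bin/sh
# Sketch: report pool/device lines from saved `zpool status` text that are not
# ONLINE or that have non-zero READ, WRITE or CKSUM counters.
# Assumes the NAME STATE READ WRITE CKSUM table layout shown above.
check_zpool_status() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; seen = 0; next }  # table header
        in_config && NF == 0 && seen { in_config = 0 }                   # blank line ends the table
        in_config && NF >= 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            seen = 1
            if ($2 != "ONLINE" || $3 + $4 + $5 > 0)
                printf "%s: state=%s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }'
}

# Example: the status output from the copies=2 read test above
check_zpool_status <<'EOF'
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0   581
          hda4      ONLINE       0     0   581
          hdc2      ONLINE       0     0     0
          hdd2      ONLINE       0     0     0

errors: 15 data errors, use '-v' for a list
EOF
```

Fed that output, it should print only the tank and hda4 lines, each with cksum=581; fed the earlier UNAVAIL outputs, it would flag the pool and the missing device.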
