ZFS on Linux
From FlimzyWiki
On June 19, 2007, I read an article on Slashdot entitled "ZFS On Linux - It's Alive!". I had heard about ZFS before, and had been wishing it worked on Linux--specifically for its flexibility to use size-mismatched drives with data redundancy.
Test Setup
Before I converted any critical data to ZFS, I wanted to make sure it was reliable, usable, and fast "enough" for my needs. I scrounged up some parts, and threw together a ZFS test system.
Test Equipment Specs
- A Pentium III, 733 MHz Gateway
- 128 MB RAM
- A random DVD-RW drive I found lying around
- 3 random hard drives: a Western Digital 10 GB, a Western Digital 20 GB, and a Maxtor 40 GB (all parallel ATA)
Test Software
I installed Debian Etch, from a nightly build dated 02/20/07--simply because that's the Etch install CD I had lying around on the floor, and I was too lazy to download and burn a current one at the time.
Test installation
This was a pretty straightforward installation of Debian Etch. The only "special" stuff I did was during the partitioning.
- I put a 100 MB swap partition on each drive
- I created a 100 MB /boot partition on hda (the 10 GB drive)
- I created a 5 GB / partition on hda
- I left the remaining space unpartitioned, with the intent of forming a ZFS pool later
I also deselected all the default tasks when the installer asked what software I wanted to install. I can install that later (and on a system I will actually use).
After the system booted for the first time, I installed SSH and some other basic tools I like to use, ran a dist-upgrade to bring my software up to date, then logged in remotely over the network.
First try with ZFS
ZFS Installation
To install the ZFS FUSE drivers for Debian, I used Bryan Donlan's instructions, with my own tweaks.
Add the following line to /etc/apt/sources.list, then run apt-get update:
deb-src http://www.fushizen.net/zfs-fuse ./
$ sudo apt-get install devscripts build-essential zlib1g-dev libfuse-dev scons debhelper dpatch xsltproc docbook-xsl fakeroot fuse-utils ucf
$ apt-get source zfs-fuse
$ cd zfs-fuse-0.4.0~beta1.hg20070509.2/
$ dpkg-buildpackage -rfakeroot -b -us -uc
$ cd ..
$ sudo dpkg -i zfs-fuse_0.4.0~beta1.hg20070509.2-1_i386.deb
ZFS Configuration
This part is amazingly simple. Many options are available; refer to the zpool and zfs man pages for details. My primary goals were ease of expansion and data redundancy, so here's what I did:
# mv /home /home2                                        # Move /home out of the way
# zpool create zfs /dev/hda4 /dev/hdc2 /dev/hdd2         # Create the pool
# zfs create -o copies=2 -o mountpoint=/home zfs/home    # Create the new '/home' (each property needs its own -o)
# mv /home2/* /home                                      # Move everything from the old /home to its new location
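As a rough sanity check on how much space this layout should give, the arithmetic can be sketched in shell. The partition sizes below are assumptions worked out from the drive sizes and partitioning described earlier (decimal megabytes, ignoring ZFS metadata overhead), and the variable names are only illustrative.

```shell
#!/bin/sh
# Approximate partition sizes in MB, assumed from the partitioning above.
hda4=$((10000 - 100 - 100 - 5000))   # 10 GB drive minus swap, /boot and /
hdc2=$((20000 - 100))                # 20 GB drive minus swap
hdd2=$((40000 - 100))                # 40 GB drive minus swap

raw=$((hda4 + hdc2 + hdd2))          # a plain striped pool adds the sizes up
usable=$((raw / 2))                  # copies=2 stores every block twice

echo "raw pool size: ${raw} MB"
echo "usable with copies=2: about ${usable} MB"
```

Since 'copies' is a per-filesystem property, the halving only applies to data written to zfs/home while copies=2 is set.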
Testing ZFS
Performance Tests
ZFS across all 3 drives. copies=2, compression=off, atime=on
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  5125  39  7852   9  2532   3  3988  29  5783   5 108.4   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1482   7  3470   8  1577   5   997   5  3063  10   658   2
zfstest,300M,5125,39,7852,9,2532,3,3988,29,5783,5,108.4,1,16,1482,7,3470,8,1577,5,997,5,3063,10,658,2
ZFS across all 3 drives. copies=2, compression=on, atime=on
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  6614  52 12524  13  5152   8  6550  49 14145  12 251.5   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1567   8  3137   9  1849   6  1605   7  3158   9   733   2
zfstest,300M,6614,52,12524,13,5152,8,6550,49,14145,12,251.5,3,16,1567,8,3137,9,1849,6,1605,7,3158,9,733,2
ZFS across all 3 drives. copies=1, compression=on, atime=off
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  6770  53 13875  15  6450  10  6195  46 13906  11 286.1   4
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1604   8  4103  10   781   3  1486   6  3759   8  2003   7
zfstest,300M,6770,53,13875,15,6450,10,6195,46,13906,11,286.1,4,16,1604,8,4103,10,781,3,1486,6,3759,8,2003,7
ZFS across all 3 drives. copies=1, compression=off, atime=off
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M  6549  53 12666  13  7548  11  6778  50 15444  14 306.1   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1501   8  2729   5  1607   4  1473   6  4025  10  1834   5
zfstest,300M,6549,53,12666,13,7548,11,6778,50,15444,14,306.1,3,16,1501,8,2729,5,1607,4,1473,6,4025,10,1834,5
The 10 GB drive alone
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfstest        300M 11775  96 22857  30  6232   7 11321  83 14318  10 146.5   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16736  98 +++++ +++ 26092 100 17558  99 +++++ +++ 25727  99
zfstest,300M,11775,96,22857,30,6232,7,11321,83,14318,10,146.5,1,16,16736,98,+++++,+++,26092,100,17558,99,+++++,+++,25727,99
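The comma-separated line that bonnie++ prints after each table is handy for comparing runs side by side. The following is a minimal sketch that pulls out the block write, block read and seek figures. The field positions (5, 11 and 13) are read off the tables above, and the 'summarize' function name and the labels are made up for this example.

```shell
#!/bin/sh
# summarize LABEL CSV_LINE
# Print block write and block read (K/sec) and random seeks (/sec)
# from one bonnie++ CSV result line.
summarize() {
    printf '%s\n' "$2" | awk -F, -v label="$1" \
        '{ printf "%s: write=%s read=%s seeks=%s\n", label, $5, $11, $13 }'
}

summarize "ZFS, copies=2, no compression" \
    "zfstest,300M,5125,39,7852,9,2532,3,3988,29,5783,5,108.4,1,16,1482,7,3470,8,1577,5,997,5,3063,10,658,2"
summarize "10 GB drive alone" \
    "zfstest,300M,11775,96,22857,30,6232,7,11321,83,14318,10,146.5,1,16,16736,98,+++++,+++,26092,100,17558,99,+++++,+++,25727,99"
```

The remaining result lines can be passed through the same function to line up all five runs.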
Fault Tolerance Tests
After copying, deleting, and moving a bunch of data to my new ZFS filesystem, I decided to test its fault-tolerance.
During most of my data copying, I had 'copies=2' set, so I figured I should be able to remove any single drive from my system and still have a working copy of my data. The first drive I removed was the 40 GB drive, leaving the 20 GB drive and the last half of the 10 GB drive. After booting, I ran zpool status:
  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          hda4      ONLINE       0     0     0
          hdc2      ONLINE       0     0     0
          hdd2      UNAVAIL      0     0     0  cannot open
So I re-attached the 40 GB drive and removed the 20 GB drive. This time:
  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          hda4      ONLINE       0     0     0
          hdc2      UNAVAIL      0     0     0  cannot open
          hdd2      ONLINE       0     0     0
Apparently, ZFS doesn't like it when one of its underlying block devices goes away. I can't really blame it, but I thought that was one of the things ZFS was supposed to be good at coping with.
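One likely explanation is that zpool create with a bare list of devices builds a plain striped pool: 'copies=2' stores each block twice, but the pool itself still cannot be opened when a whole top-level device is missing. Surviving the loss of a drive needs a redundant layout such as a mirror or raidz. The following is a sketch only, reusing the pool and device names from the status output above and assuming the installed zfs-fuse build supports raidz. Note that raidz limits every member to the size of the smallest device, which works against the goal of using mismatched drives. The 'run' helper is hypothetical; it only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# run: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Destroying the striped pool erases its data, so back it up first.
run zpool destroy tank
# Recreate the pool with single-parity raidz across the same partitions.
run zpool create tank raidz /dev/hda4 /dev/hdc2 /dev/hdd2
run zpool status tank
```

With a layout like this, pulling one drive should leave the pool in a DEGRADED state rather than UNAVAIL.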
Next, I tried zeroing the 5 GB partition on the 10 GB drive (I couldn't remove it from the system, since the OS booted off of the same drive). I ran 'dd' for a couple of seconds, then hit Ctrl-C:
# dd if=/dev/zero of=/dev/hda4
286661+0 records in
286661+0 records out
146770432 bytes (147 MB) copied, 5.25475 seconds, 27.9 MB/s
I then tried creating a tarball of some of the files on my ZFS filesystem (knowing that would be one way to read them all).
When I tried to copy data that had been written with 'copies=1' set, I got a number of errors similar to:
Read error at byte 0, while reading 7680 bytes: Input/output error
I then ran zpool status again:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0    70
          hda4      ONLINE       0     0    70
          hdc2      ONLINE       0     0     0
          hdd2      ONLINE       0     0     0

errors: 15 data errors, use '-v' for a list
Then I tried again on some data that had been copied with the 'copies=2' setting in place. I got no errors on the application level, but zpool status showed:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0   581
          hda4      ONLINE       0     0   581
          hdc2      ONLINE       0     0     0
          hdd2      ONLINE       0     0     0

errors: 15 data errors, use '-v' for a list
Notice that the number of data errors is the same, but the checksum count for hda4 went up. It looks like ZFS found and corrected the problems while reading.
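To keep an eye on this, the checksum counters can be pulled out of the status output with a small filter. This is a minimal sketch: the 'cksum_errors' function name is made up here, and it assumes the five-column device table shown above (NAME, STATE, READ, WRITE, CKSUM).

```shell
#!/bin/sh
# cksum_errors: read 'zpool status' output on stdin and print every
# pool or device row whose CKSUM counter (column 5) is non-zero.
cksum_errors() {
    awk '$2 ~ /^(ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE)$/ && $5 + 0 > 0 {
        print $1 ": " $5
    }'
}

# Sample rows taken from the status output above.
cksum_errors <<'EOF'
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 581
hda4 ONLINE 0 0 581
hdc2 ONLINE 0 0 0
hdd2 ONLINE 0 0 0
EOF
```

On a live system the same filter would be fed with zpool status tank | cksum_errors. Running zpool scrub tank makes ZFS read and verify every block in the pool, and zpool status -v lists the files affected by the remaining data errors.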

