ZFS (Zettabyte File System) is a file system created by Sun, which later passed to Oracle when Oracle bought Sun out. Oracle had initially championed BTRFS until it acquired ZFS. It is still funding BTRFS development, though; BTRFS aims for a feature set similar to ZFS but is years behind it, because slow development has kept it from reaching a stable release.
So what is so great about the ZFS file system? Here are the pros and cons.
Pros:
1. Scrubbing: bit rot protection
3. Enterprise ready in terms of size and performance (not really required with home servers)
Cons:
1. Stuck with Solaris or FreeBSD
2. Needs loads of RAM to run decently (only true if you plan on using deduplication; if not, this is a non-issue)
3. No ability to dynamically grow a pool (not strictly true, though it's more restrictive than mdadm. You can replace the disks with larger ones, and once every disk in the RAID-Z set has been upgraded you get more space; alternatively, you can add another RAID-Z set of the same level to the existing pool.)
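To make that last point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the usual approximation that a RAID-Z set gives you about (number of disks minus parity disks) times the size of its smallest disk, and it ignores metadata and padding overhead. The function name and the disk sizes are made up for illustration.

```python
def raidz_usable_tb(disk_sizes_tb, parity=1):
    """Approximate usable space of one RAID-Z set, in TB.

    Assumption: every disk contributes only as much as the smallest disk,
    and `parity` disks' worth of space goes to parity. Real pools lose a
    little more to metadata and padding.
    """
    if len(disk_sizes_tb) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)

# Four 2 TB disks in single-parity RAID-Z.
print(raidz_usable_tb([2, 2, 2, 2]))   # 6
# Replacing only some of the disks gains nothing: the smallest disk rules.
print(raidz_usable_tb([4, 4, 4, 2]))   # 6
# Once every disk has been replaced, the extra space appears.
print(raidz_usable_tb([4, 4, 4, 4]))   # 12
# The other route: add a second RAID-Z set to the pool; capacities add up.
print(raidz_usable_tb([2, 2, 2, 2]) + raidz_usable_tb([2, 2, 2, 2]))  # 12
```

This is why growing a RAID-Z set one disk at a time only pays off after the last disk has been swapped.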
Here is another summary of what is so great about ZFS, explaining its benefits in finer detail:
- ZFS eliminates the need for partitioning, fdisk, and volume managers like the Windows disk management console. It accomplishes this by collecting the system's storage devices into a pool of heterogeneous storage (a zpool). ZFS then creates file systems on the zpool which, like partitions, can be sized to fit a user's needs. ZFS' pooling mechanism grants tremendous flexibility and management simplicity to the file system.
- ZFS supports storage capacities and file sizes so large that we’ll have to break the laws of physics to reach them. No fooling.
- ZFS' block-level checksumming means that every block of every file on the system can be checked for integrity and, where the pool has redundancy, rebuilt from a mirror copy or from parity.
- ZFS uses atomic transactions, which means multiple file operations are grouped, and the group is only committed if all of it can complete. This ensures data won't get butchered if a file operation fails in the middle of the process. Transactional operations also make for low write overhead.
- ZFS uses the copy-on-write model for writing data. This means that ZFS writes new files or file modifications to free space on disk and then changes the pointers in the file system to refer to the new location of the data. This ensures that data will never be corrupted by an in-place overwrite of active data that fails partway through.
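The last two points (atomic transactions and copy-on-write) can be sketched with a toy model in Python. This is only an illustration of the idea, not how ZFS is implemented; the `CowStore` class and its methods are hypothetical names invented for this example.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}     # block address -> bytes (stands in for the disk)
        self.next_addr = 0   # next free address
        self.root = {}       # live pointer table: file name -> block address

    def _write_block(self, data):
        addr = self.next_addr
        self.blocks[addr] = data
        self.next_addr += 1
        return addr

    def commit(self, changes, fail_midway=False):
        """Apply a group of {name: data} changes as one transaction."""
        new_root = dict(self.root)                    # copy of the pointers
        for i, (name, data) in enumerate(changes.items()):
            if fail_midway and i == 1:
                raise OSError("simulated crash during write")
            new_root[name] = self._write_block(data)  # goes to free space
        self.root = new_root                          # one pointer swap = commit

    def read(self, name):
        return self.blocks[self.root[name]]


store = CowStore()
store.commit({"a.txt": b"old a", "b.txt": b"old b"})

try:
    store.commit({"a.txt": b"new a", "b.txt": b"new b"}, fail_midway=True)
except OSError:
    pass
print(store.read("a.txt"))   # b'old a' (the failed group changed nothing)

store.commit({"a.txt": b"new a", "b.txt": b"new b"})
print(store.read("a.txt"))   # b'new a'
```

Because the old blocks are left untouched and the live pointers only change in the final step, a crash in the middle leaves the previous consistent state in place.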
These features only scratch the surface of what ZFS provides. Built-in RAID levels, self-healing algorithms, dynamic block sizes and file system snapshots all make for tremendous robustness and reliability. None of this is to say that a desktop implementation of ZFS would be without its issues, but it's certainly the most forward-thinking file system to crop up in quite some time.
ZFS includes software RAID called RAID-Z. Regular RAID-Z writes one parity block like RAID-5, while RAID-Z2 writes two parity blocks like RAID-6. Both are available in the FreeNAS Volumes screen when creating a new volume (assuming there are enough disks available). More complex parity arrangements are also possible.
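For anyone wondering how a single parity block lets you survive a dead disk, here is a minimal Python sketch of XOR parity, the idea behind RAID-5 style single parity. It is a simplification: real RAID-Z uses variable-width stripes, and the second parity of RAID-Z2 needs different math that is not shown here. The block contents are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"disk", b"one!", b"two?"]   # three data blocks on three disks
parity = xor_blocks(data)            # stored on a fourth disk

# Lose the second disk, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)   # b'one!'
```

XOR-ing the surviving blocks with the parity cancels everything except the missing block, which is why one parity block covers exactly one lost disk.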
- If you already know Linux then stick with mdadm.
- If you already know FreeBSD then go for ZFS.
- If you want to handle craploads of hardware support issues go Solaris with ZFS.
For full coverage of ZFS's features and abilities, you can read about it here on Wikipedia.
Wondering if ZFS has been ported to Linux yet? Luckily, a third party is doing just that. The big problem, however, is the licensing conflict which, unless relaxed, prevents it from being included in the mainline Linux kernel.
It's very unlikely the licensing issue will ever go away, as ZFS was released under Sun's CDDL. Sun no longer exists, and frankly, nobody sees Oracle saying "sure, we will release ZFS from the license it is under…". So unless somebody is serious about paying the royalty fee to license ZFS, this will always be a problem for hardware vendors like QNAP, Synology, Netgear etc. when it comes to adding support for this file system to their NAS (network attached storage) devices.
Anyway, below is an interesting conversation I picked up from the HardOCP forum, where more knowledgeable IT experts discussed ZFS and the alternatives BTRFS and ReFS.
ZFS is now 10 years old, and STILL they find bugs in it. There are sysadmins today who think ZFS is too new and immature, ten years after the v1.0 release.
When btrfs is released as v1.0 it will take ten years before it gets stable enough to be used in production. When will v1.0 come?
I believe MS Windows ReFS will come to market before btrfs, with more functionality. ReFS has checksums for metadata, but not for the data itself, alas.
As for ReFS, from what I've read (since I haven't used it) it sounds like a great step in the right direction, but for RAID (not a mirror type setup) it seems to be missing critical features. The two big ones are data file integrity via checksum (it's there, just not "turned on") and multiple parity drives (you want to run a 10 disk array with 1 parity drive?)
That doesn't mean it's never going to get there. MS has a tremendous amount of expertise and skill in programming and I'm sure will turn something out, if management steers them in that direction.
That doesn’t change the fact that – as it stands now – from a “i don’t want to lose or corrupt my data” standpoint it’s significantly worse than ZFS, and worse than raid-6 cards as well – if you aren’t running a mirrored or raid-10 type setup. Which means for most of the people here running home type servers it’s a non-starter. At least until the next point release.
Regarding ReFS in Windows 8: it only supports checksums on metadata, but the data itself might be corrupt:
“Metadata integrity with checksums
Integrity streams providing optional user data integrity”
“As mentioned previously, one of our design goals was to detect and correct corruption. This not only ensures data integrity, but also improves system availability and online operation. Thus, all ReFS metadata is check-summed at the level of a B+ tree page, and the checksum is stored independently from the page itself. This allows us to detect all forms of disk corruption, including lost and misdirected writes and bit rot”
Thus, you must turn on checksums for data yourself; they are not on by default. This is very strange. The ReFS developers talk a lot about data integrity and safety, and still ReFS is not checksumming everything. Apparently the MS developers are aware of data corruption and bit rot, but do not turn on checksumming everywhere. Why is that?
My guess is that it is difficult to make a filesystem safe. It does not suffice to just checksum everything, it might still be unsafe. CERN did a study on data corruption and concluded on page 23:
“check summing? → not necessarily enough
- end-to-end check summing (ZFS has a point)”
Thus, it is not easy to build a safe storage solution. It is like building a safe cryptosystem: you cannot just use a cipher and assume it is safe. No, you must design everything in the filesystem with safety in mind, and that is not easy to do. I believe that is the reason there are no checksums everywhere in ReFS. Of course, MS can put checksums everywhere, but is it safe?
At least there is research on ZFS and data corruption, and the researchers conclude that ZFS detects all the artificially injected errors. In a similar study, researchers concluded that NTFS, ext, XFS, JFS, etc. are not safe when artificial errors are injected.
Thus, I want to see research on ReFS and data corruption before I deem it safe. Putting checksums everywhere does not make it safe. For instance, hard drives have lots of ECC checksums everywhere on the surface, and still we see unrecoverable errors on disk (according to spec sheet). Thus, checksums are not enough.
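The "end-to-end" idea from the CERN quote can be sketched in a few lines of Python. This is a toy model, not ZFS's actual code: it assumes a two-way mirror, uses SHA-256 purely for illustration, and the `MirroredBlock` class is a hypothetical name. The key point is that the expected checksum is kept by the parent pointer, away from the data it describes, so a disk that silently returns bad data gets caught on read.

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

class MirroredBlock:
    """One logical block kept on two disks; its checksum lives in the parent."""

    def __init__(self, data):
        self.copies = [bytearray(data), bytearray(data)]  # two "disks"
        self.expected = checksum(data)   # stored in the pointer, not with the data

    def read(self):
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                # Self-heal: rewrite any copy that no longer matches.
                for i, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[i] = bytearray(good)
                return good
        raise OSError("all copies fail their checksum")

block = MirroredBlock(b"holiday photo")
block.copies[0][0] ^= 0x01        # flip one bit on the first disk (bit rot)
print(block.read())               # b'holiday photo'
print(bytes(block.copies[0]))     # b'holiday photo' (repaired from the mirror)
```

Without a second copy the bad read would still be detected, but there would be nothing to repair it from.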
Regarding block pointer rewrite in ZFS: ZFS was created by Jeff Bonwick and Matt Ahrens in 2001, and both have since quit Oracle. Matt Ahrens is at Delphix, which is heavily involved in Illumos, the OpenSolaris kernel that is completely open. There is a lot of development for ZFS in the Illumos camp.
Illumos has added new ZFS features that Oracle does not have. Illumos is also adding functionality so that ZFS versions are backwards compatible and do not break compatibility. Illumos has attracted several of the top Sun kernel developers, such as Bryan Cantrill (father of DTrace) and other famous developers. It seems that ZFS is most heavily developed in the Illumos camp, more so than at Oracle.
OpenIndiana is built on Illumos and will utilize the latest ZFS version. Nexenta is also built on Illumos and uses the latest ZFS version. So the die-hard ZFS fans should maybe look at Nexenta and OpenIndiana (OI)?
I think it was Matt (?) who said they might announce something on block pointer rewrite in some time; the Illumos team is working on it. They have the best understanding of ZFS, and are capable of implementing it.
Another interesting thing is ZFS finally making its way onto Macs.
Apple dropped ZFS support amid development and legal problems, but Don Brady, who was heading up the file system transition team at Apple, left to pursue it independently. And now he's releasing a piece of software, Zevo, which finally adds ZFS support to any Intel Mac running 10.6.6 or later.
I take the digital storage of my files (pictures, videos, music etc.) seriously, so ZFS's end-to-end checksumming, which guards against bit rot and ensures data integrity, is important to me. Yes, I already have my data backed up in 2 locations, but that doesn't solve the bit rot issue, as both locations where the data is stored are still susceptible to it.
For my anime collection I have CRC tags appended to the end of the file names, so I can run a CRC check to know whether a file is corrupt or not. With ZFS I would not have to worry about this. The only time I would need to check the CRC then is straight after downloading unlicensed anime.
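For the curious, that manual CRC check can be sketched in Python along these lines. It assumes the common convention of an 8-digit hexadecimal CRC32 in square brackets near the end of the file name, such as `[1A2B3C4D]`; the function names, the demo file name and its contents are made up for illustration.

```python
import os
import re
import tempfile
import zlib

# Assumption: the tag is 8 hex digits in square brackets, e.g. "[1A2B3C4D]".
CRC_TAG = re.compile(r"\[([0-9A-Fa-f]{8})\]")

def crc32_of_file(path, chunk_size=1 << 20):
    """Compute the CRC32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"

def check_file(path):
    """True if the file matches its tag, False if not, None if there is no tag."""
    tags = CRC_TAG.findall(os.path.basename(path))
    if not tags:
        return None
    return crc32_of_file(path) == tags[-1].upper()

# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    payload = b"episode data"
    tag = f"{zlib.crc32(payload) & 0xFFFFFFFF:08X}"
    path = os.path.join(folder, f"Show - 01 [{tag}].mkv")
    with open(path, "wb") as f:
        f.write(payload)
    ok_before = check_file(path)
    with open(path, "r+b") as f:
        f.write(b"E")              # flip one bit in the first byte
    ok_after = check_file(path)

print(ok_before, ok_after)   # True False
```

The catch is that this only tells me a file has gone bad when I remember to run it, and it cannot repair anything, which is exactly the gap ZFS scrubbing fills.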
This is why I'm a fan of ZFS, and why I hope an alternative like BTRFS makes progress towards becoming a mainstream filesystem, so that big brand names like QNAP can finally add support for this much-needed technology to their NAS devices, and I and others can benefit from the features and flexibility it brings to digital storage.