Sun's Comparison of VxFS and ZFS Scalability is Flawed
Some engineers at Sun promoting ZFS have been publishing comparisons between VxFS and ZFS that are rather unflattering to VxFS. You can read the most recent white papers they've published comparing ZFS with VxFS, ext3, and Windows NTFS, as well as some blog entries comparing the performance of VxFS and ZFS.

The comparisons with VxFS appear to be objective, but in fact the performance comparisons are chosen quite selectively. In addition, the most recent white paper contains a few significant errors.

Going through the most recent white paper from beginning to end, the first things to strike me were some significant errors in the discussion of file system scalability. These include the claims that "The maximum size of a Veritas File System is 32TB", and that "Solaris ZFS uses 128-bit block addresses".

Maximum Supported File System Size

As far as VxFS is concerned, the author confuses the maximum supported file system size of VxFS with the theoretical scalability of the disk layout. While it's true that the maximum supported file system size of VxFS 4.1 on Solaris is 32 Tbyte, the author goes on to claim that this is the maximum scalability of VxFS. That's obviously incorrect since the maximum supported file system size of VxFS 5.0 on Solaris is 32 Gblock (256 Tbyte with an 8 Kbyte block size). It's also a bit strange that the author chose to cite VxFS 4.1 since VxFS 5.0 was released July 11, 2006, almost a year before the paper was finished.
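The VxFS 5.0 figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes binary units (1 Gblock = 2^30 blocks, 1 Kbyte = 2^10 bytes, 1 Tbyte = 2^40 bytes), and the variable names are made up for the illustration.

```python
# Assumption: binary units throughout (Gblock = 2**30 blocks, Kbyte = 2**10 bytes).
GBLOCK = 2**30
KBYTE = 2**10
TBYTE = 2**40

max_blocks = 32 * GBLOCK        # supported limit of VxFS 5.0 on Solaris, in blocks
block_size = 8 * KBYTE          # largest VxFS block size, in bytes

max_bytes = max_blocks * block_size
max_tbytes = max_bytes // TBYTE

print(f"{max_tbytes} Tbyte")    # 256 Tbyte
```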

The maximum supported file system size of VxFS in a given release represents the largest file system that we're confident will work in a customer environment based on our testing and the scalability of algorithms used in VxFS. It grows over time as CPUs increase in speed, memory becomes cheaper, algorithms are improved, and as customer requirements dictate. It has grown over time, and will continue to grow in the future, assuring customers the continued ability to maintain all of their data on VxFS file systems. The maximum supported file system size does not represent, in any sense, a theoretical maximum file system size.

Nevertheless, the white paper compares the maximum supported file system size for VxFS 4.1 with the theoretical scalability of the ZFS disk layout.

Although I have not contacted Sun's support organization to check, it's unlikely that the maximum supported file system size of ZFS is 2^128 blocks since it's highly improbable that Sun has actually tested a ZFS file system to anywhere near 2^128 blocks in size. If Sun hasn't tested a file system size anywhere near this large, how can they claim to support it?

Theoretical Maximum File System Size

So let's do a real comparison of "theoretical maximum file system size". The theoretical maximum file system size of VxFS, with the current version 7 disk layout, is 2^85 bytes (32 yottabytes or 32,768 zettabytes). With some knowledge of the VxFS disk layout, this is easy to calculate. In a multi-device file system, VxFS reserves 16 bits for the device number, 56 bits for the file system block number, and has a maximum file system block size of 8192 bytes (2^13 bytes). Putting these together we get a theoretical maximum file system size of 2^16 devices * 2^56 blocks/device * 2^13 bytes/block == 2^85 bytes.
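The same calculation as a short sketch, using only the field widths given above. The constant names are invented for the illustration, and the yottabyte and zettabyte figures are taken in the binary sense (2^80 and 2^70 bytes).

```python
# Field widths as described for the version 7 disk layout.
DEVICE_BITS = 16            # bits reserved for the device number
BLOCK_BITS = 56             # bits reserved for the file system block number
BLOCK_SIZE_BYTES = 8192     # maximum file system block size

# 8192 is a power of two, so its exponent is bit_length() - 1 == 13.
block_size_bits = BLOCK_SIZE_BYTES.bit_length() - 1

vxfs_exponent = DEVICE_BITS + BLOCK_BITS + block_size_bits
vxfs_max_bytes = 2**vxfs_exponent

# Assumption: binary units (2**80 bytes per yottabyte, 2**70 per zettabyte).
yottabytes = vxfs_max_bytes // 2**80
zettabytes = vxfs_max_bytes // 2**70

print(f"2^{vxfs_exponent} bytes = {yottabytes} yottabytes = {zettabytes:,} zettabytes")
```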

(Note that going past 2^63 bytes in a file or 2^64 blocks in a file system will probably require a change to the Operating System APIs, which currently use 64 bit fields to hold these numbers).

Of course that's just with the current disk layout. VxFS has gone through 6 revisions of its disk layout since 1989 and each time we've provided an online upgrade from the previous version(s) of the disk layout. Further, we support older versions of the disk layout for several years after we introduce a newer version, so upgrading is relatively painless for our customers.

This year has seen the first shipment of 1 Tbyte disk drives. If current trends continue (which seems unlikely), we'll see 512 Ebyte (exabyte) disk drives in about 21 years, around which time a new version of the VxFS disk layout would be required. Of course we'll probably have revised the disk layout to offer other new features before then, but this should give some feeling for the scalability of the current layout.
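To see where the 512 Ebyte figure comes from, here is a rough sketch. It assumes binary units (1 Tbyte = 2^40 bytes, 1 Ebyte = 2^60 bytes) and simply derives the growth rate implied by the numbers in this post; it is not an independent forecast.

```python
# Assumption: binary units.
start_bytes = 2**40               # a 1 Tbyte drive today
target_bytes = 512 * 2**60        # a 512 Ebyte drive

# Largest single device the current layout can address:
# 2^56 blocks of 2^13 bytes each.
per_device_limit = 2**56 * 2**13

# Number of capacity doublings needed to get from 1 Tbyte to 512 Ebyte.
doublings = (target_bytes // start_bytes).bit_length() - 1

# Doubling period implied by the "about 21 years" estimate.
months_per_doubling = 21 * 12 / doublings

print(per_device_limit == target_bytes)   # True: 512 Ebyte is the per-device limit
print(doublings)                          # 29
print(round(months_per_doubling, 1))      # 8.7
```

In other words, 512 Ebyte is exactly the size at which a single disk would exhaust the 56-bit block number at the largest block size.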

ZFS is Not Quite 128 Bits

Sun's paper contains another error when it claims that the scalability of ZFS is "128 bits". I've already discussed the mistake of comparing the maximum supported file system size of a VxFS file system to the theoretical maximum scalability of ZFS, but it also appears that the maximum size of a ZFS file system is a good deal less than the 2^128 blocks claimed. Based on this description of the ZFS disk layout, a block pointer consists of a 32 bit device number, a 63 bit block offset (number), and some other information (see the description of block pointers at the start of Chapter 2 or look at the definition of blkptr_t in /usr/src/uts/common/fs/zfs/sys/spa.h in the OpenSolaris source code).

Since ZFS block offsets are always in units of 512 bytes, this means the maximum size of a ZFS file system is 2^32 devices * 2^63 blocks * 2^9 bytes/block == 2^104 bytes. This is not exactly the 2^128 blocks claimed in the white paper.
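The ZFS figure can be worked out the same way as the VxFS one. The helper below is a hypothetical illustration, not anything from the ZFS source; it just adds up the exponents from the block pointer fields described above and compares the result with the 2^85 bytes calculated earlier for VxFS.

```python
def max_fs_exponent(device_bits: int, offset_bits: int, unit_bytes: int) -> int:
    """Return n such that the addressable size is 2**n bytes.

    unit_bytes must be a power of two (the size of one addressable block).
    """
    unit_bits = unit_bytes.bit_length() - 1
    assert 2**unit_bits == unit_bytes, "unit size must be a power of two"
    return device_bits + offset_bits + unit_bits

# ZFS: 32 bit device number, 63 bit offset, offsets in 512-byte units.
zfs_exponent = max_fs_exponent(32, 63, 512)

# VxFS version 7 layout: 16 bit device number, 56 bit block number, 8192-byte blocks.
vxfs_exponent = max_fs_exponent(16, 56, 8192)

print(f"ZFS:  2^{zfs_exponent} bytes")                       # ZFS:  2^104 bytes
print(f"VxFS: 2^{vxfs_exponent} bytes")                      # VxFS: 2^85 bytes
print(f"ratio: 2^{zfs_exponent - vxfs_exponent}")            # ratio: 2^19
```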

Now, 2^104 bytes for ZFS is still a lot more than the 2^85 bytes for VxFS, but for all practical purposes they're the same -- larger than anything required for the foreseeable future. And, frankly, I think allowing ZFS 2^104 bytes is overly generous since that includes an assumption of 4 billion devices. I have a difficult time imagining more than 100,000 disk devices in a data center. If we limit ZFS to 131,072 devices (2^17 devices), then the maximum file system size drops to 2^17 devices * 2^63 blocks * 2^9 bytes/block == 2^89 bytes, which is pretty darn close to VxFS.


Sun has made a number of comparisons between ZFS and VxFS. In the area of scalability, Sun has considerably exaggerated any differences that might exist between the two file systems. In subsequent blog entries I'll look at some of the other issues Sun raises, particularly performance.

Much like benchmark results, claims of scalability need to be examined carefully and treated with skepticism.

Looking forward to the next installments. The data that Sun is putting out there looks pretty, but it's scant on details. It also doesn't seem to cover the administration, cross-platform capabilities, array support libraries and such. Plus the 'cool' of ZFS kind of falls apart when you start talking about smaller file systems and HA configurations.
Thanks! The data that Sun is putting out does indeed look pretty. ;-)

I think the one place that ZFS really shines is the simplified administration, though it, like other parts of the ZFS design, seems more oriented toward desktop systems than to enterprise data centers. Most of the other things that you mention will be an issue as Sun attempts to scale ZFS up to data center environments.
Thanks for this article. I'm with Sun right now, but I used to work for a Veritas partner,
and I was really upset when this new "you-don't-need-Veritas-anymore" war was started.
I found ZFS very "memory-greedy" under heavy random I/O (RDBMS) on large volumes, maybe because of some memory leaks.
So I'm looking forward to the next article.
PS: I've got a question. I've heard some rumors about an SFS simple admin, and that a beta has been released, but I can't find any relevant information about it.
In any case, there is a price difference in favour of ZFS.
I'm pretty sad about what Sun has been saying about Veritas products as well. But it's the baselessness of the flawed claims they make for ZFS that really ticks me off and makes my comments a bit sharper than they might otherwise be.

Your results with an RDBMS are what I'd expect. Random writes, particularly synchronous ones like those performed by databases, are a nightmare workload for ZFS.

The high memory consumption allows ZFS to get better performance than it would otherwise, but it's still poor (less than half that of VxFS with ODM, even when ZFS is tuned according to Sun's guidelines). I wrote a blog entry about Symantec's results.

Look at Figure 6 in Symantec's results running a TPC-C like benchmark. It shows ZFS reading four times as much data from disk while delivering less than half the transactions per second compared to VxFS.  I believe that without all that memory consumed for caching, the results would be even worse. The ZFS folks say they'll improve the memory consumption without impacting performance, but I think the need to recalculate checksums all the way up the bmap tree of the file for each write will make that difficult. Of course they may yet prove me wrong ...

Message Edited by charmer on 11-19-2007 05:00 PM

Message Edited by charmer on 11-20-2007 11:02 PM
As far as your question about the Simple Admin utility for Storage Foundation goes, you're right that we released a beta version that supports an administrative model for VxFS and VxVM that's similar to the one offered by ZFS. (Those who want to experiment can download it here.)

However, we haven't released a production version yet. We need to make some changes to VxFS to get better performance when multiple file systems share the same storage pool and there may also be some work required to make it robust against system crashes while storage reconfigurations are occurring.

I think there was supposed to be something available by now. I'm not sure what the status is. I'll try to collect some information and get back to you.

Message Edited by charmer on 11-19-2007 05:02 PM
Actually, the price difference between ZFS and VxFS is not as large as you think. Symantec has been offering a free version of Storage Foundation (free as in "free beer" ;-) for several months now called Storage Foundation Basic. It's limited to a maximum of 4 VxVM volumes, 4 VxFS file systems, and the system must have only 1 or 2 processor sockets, but if you meet those requirements the price is right.

Storage Foundation Basic is available on Red Hat Linux, SUSE Linux, IBM AIX, Sun's Solaris and even on Windows (but there's no VxFS on Windows). You can find it here or go look at our product list for Storage Foundation Basic.

Storage Foundation is expensive on the larger machines (I wish it was cheaper). However, customers who need that class of hardware to support their business tend to be more interested in RAS (Reliability, Availability, and Scalability) and Performance than they are in saving some money in software. Particularly since the money saved on the software will probably need to be spent on additional memory and disk bandwidth to get the performance of ZFS up to the same level as VxFS.

If you have that kind of hardware and you don't want to spend the money on Storage Foundations, I think you'd be better off using ext3 on Linux or Logging UFS on Solaris than you would using ZFS (assuming that filesystem performance is an important consideration for you).

About the only downside I see to using the free Storage Foundation is that it's not Open Source (sorry), and it doesn't come bundled with the OS (installing additional software is a bit of a pain).

Message Edited by charmer on 11-19-2007 04:55 PM