Forum Discussion

Gordon_Fecyk's avatar
9 years ago

Deduplication stream size vs disk cluster size

In a few older applications that rely on file system-based databases, such as old Usenet news spool servers, it was a good idea to match the storage volume's cluster size to the smallest file size if possible. For that old Usenet server, 1 KB clusters were a good fit for the sub-1 KB posts. This made the best use of the space, though it made the file system hellacious to manage when things went wrong.

Now fast-forward to today. I have a new 20 TB volume to put a dedupe store on, and NTFS on Server 2008 R2 has a minimum cluster size of 8 KB for a volume that size, though I can make the clusters larger. Is there any benefit to matching the dedupe store volume's cluster size to the data stream size, to maximize deduplication and minimize storage waste? For instance, choosing a cluster size of 32 KB and a data stream size of 32 KB?
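To put a rough number on the "storage waste" part: slack is just the unused tail of each file's last cluster. Here's a quick sketch (hypothetical file sizes, not measured from any real dedupe store) showing that files sized in multiples of the cluster size waste nothing, while arbitrary sizes can waste up to one cluster each:

```python
import math

def slack_bytes(file_size, cluster_size):
    # Space lost to the last, partially filled cluster of a file.
    # (Ignores NTFS resident files, which live inside the MFT record.)
    if file_size == 0:
        return 0
    return math.ceil(file_size / cluster_size) * cluster_size - file_size

KB = 1024
CLUSTER = 64 * KB

# Files whose sizes are exact multiples of a 64 KB stream size:
# with 64 KB clusters there is no slack at all.
aligned = [64 * KB, 128 * KB, 1024 * KB]
print(sum(slack_bytes(s, CLUSTER) for s in aligned))    # 0

# Arbitrary file sizes each waste up to (cluster size - 1) bytes.
arbitrary = [10 * KB, 100 * KB, 70 * KB]
print(sum(slack_bytes(s, CLUSTER) for s in arbitrary))  # 143360 (140 KB)
```

So if the dedupe store writes its data in fixed-size streams, matching the cluster size to (or an even divisor of) the stream size should keep slack near zero; the open question is whether it helps dedupe ratios too.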

(By the way, there's no OS selection for Server 2008 or 2008 R2; it just says "2008 (not applicable).")

--

    • Gordon_Fecyk's avatar
      Gordon_Fecyk
      Level 3

I ended up using a 64 KB cluster size and a 64 KB data stream size on the new store. The space wasted isn't awful yet: a loss of 768 MB out of 3 TB, or roughly 0.025% waste.

      Thanks for the config links.

      --

    • Gordon_Fecyk's avatar
      Gordon_Fecyk
      Level 3

The volume is on an FC SAN, so stripe size isn't something I can control. I can change the cluster size, though, when I format the volume.

The tech note didn't mention data stream size, however. If the cluster size is going to be 64 KB, would it make sense to change the stream size to match?

I've always wondered how large a deduplicated block is, or whether that's configurable at all.