Describe the feature you would like to see added to OpenZFS

Inspired by discussion on issue #13392, I would like to see a "lightweight" deduplication optimised for copying operations.

In most cases full deduplication is overkill, requiring a large amount of RAM to enable. However, for many use cases the primary source of duplication is copying within or between datasets, along with typical read/write activity where files are written out with only some parts changed.

The basic idea is that instead of attempting to build a full dedup table for one or more entire datasets (which can require a lot of RAM), the "lightweight" table would only track blocks that have been recently read, with some options for tuning the amount of dedup data held in memory.

Depending upon how well this "lightweight" dedup table is tuned, it should be possible for ZFS to eliminate large amounts of duplication resulting from copying within the pool, as any data that is written out shortly after being read (i.e. copied) should be detectable using this more limited dedup table.

By focusing dedup on internal copying, it should be possible to use dedup with a much smaller memory impact, and for use cases where internal copying is the main source of duplication this ought to achieve much the same benefits as full deduplication, namely fast (metadata-only) copying/moving of files within the same or related (same encryption root, similar settings) datasets, and reduced impact on capacity.

As with full deduplication, this should also benefit partial copies, i.e. where only some of a file's records are copied while others are discarded, replaced or added.

The main limitation of this "lightweight" deduplication is that data that has left the smaller deduplication table cannot be deduplicated, i.e. files that are opened but not written out until some time later, or files that are imported from an external source. However, if these are minority cases for a pool, there will still be a benefit from enabling the feature.

Additional context

Firstly, there's almost certainly a better name for what I'm describing, but "lightweight" deduplication is the best I've come up with so far, since the idea is deduplication of the most recently read files, i.e. deduplication of short-term copying/editing. I also quite like "temporal deduplication" because it sounds extra fancy.
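The recently-read table described above could be thought of as a bounded, LRU-ordered map from block checksum to block address: reads populate it, writes consult it, and old entries are evicted once the memory budget is reached. The following is a minimal Python sketch of that idea only; it is not OpenZFS code, and the class and method names (`RecentReadDedupTable`, `on_read`, `on_write`, the `capacity` knob) are all hypothetical.

```python
from collections import OrderedDict
import hashlib

class RecentReadDedupTable:
    """Sketch of a "lightweight" dedup table that only remembers
    the checksums of recently read blocks (hypothetical design,
    not actual OpenZFS code)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity    # tuning knob: memory use vs. hit rate
        self.table = OrderedDict()  # checksum -> block address, in LRU order

    def on_read(self, address, data):
        """Record the checksum of every block as it is read,
        evicting the least recently read entry once full."""
        csum = hashlib.sha256(data).hexdigest()
        self.table[csum] = address
        self.table.move_to_end(csum)
        while len(self.table) > self.capacity:
            self.table.popitem(last=False)

    def on_write(self, data, allocate):
        """On write, check whether an identical block was read recently;
        if so, reference the existing block instead of allocating a new
        one. Returns (address, deduplicated?)."""
        csum = hashlib.sha256(data).hexdigest()
        if csum in self.table:
            self.table.move_to_end(csum)
            return self.table[csum], True   # metadata-only "copy"
        return allocate(data), False        # fell out of (or never in) the table
```

A file copy within the pool then becomes a sequence of `on_read` calls followed shortly by `on_write` calls with the same data, so every block hits the table; data imported from outside, or rewritten long after it was read, misses it and is written normally, matching the limitation described above.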