Explaining ZFS LOG and L2ARC Cache: Do You Need One and How Do They Work?
Video Statistics and Information
Channel: Lawrence Systems
Views: 78,219
Keywords: LawrenceSystems, zfs file system, l2arc, l2arc truenas, l2arc vs slog, l2arc hit ratio, l2arc vs zil, l2arc tuning, l2arc metadata, zfs l2arc, zfs slog, zfs slog vs l2arc, zfs slog ssd, zfs slog drive, zfs slog optane, zfs slog device, zfs slog nvme, zfs file system explained, ZFS Cache, ZFS Write cache, ZFS Read Cache
Id: M4DLChRXJog
Length: 25min 8sec (1508 seconds)
Published: Wed Mar 16 2022
Thanks for sharing this.
Hi, this is good but there are a few points it might help to add:
Sync writes are written out to ZIL blocks, but any outstanding async writes within the same sync domain are also written out when that sync write comes through. So, if you have async writes to a file or zvol that have not yet been committed, and then a sync write comes through on the same file or zvol, all those async writes will be made durable. This is central to providing a consistency guarantee.
It’s also one of the leading ways that people get ZVOLs “wrong”. A zvol is a single sync domain, so to let async data accrue in memory, separable sources of cache flushes (like filesystem journals) should go to a separate zvol; otherwise performance will suffer badly.
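To make the sync-domain point concrete, here is a toy Python model. It is only a sketch: the `ToyPool` class, the zvol names, and the idea of tracking writes in two lists are all made up for illustration and say nothing about how ZFS is actually implemented. It just shows a sync write dragging along every outstanding async write in the same domain.

```python
from collections import defaultdict


class ToyPool:
    """Toy model: each file or zvol is one sync domain (an assumption of this sketch)."""

    def __init__(self):
        self.pending = defaultdict(list)  # async writes still only in RAM
        self.durable = defaultdict(list)  # writes that have been made durable

    def write(self, domain, data, sync=False):
        self.pending[domain].append(data)
        if sync:
            # A sync write makes every outstanding write in its domain durable.
            self.durable[domain].extend(self.pending.pop(domain))


pool = ToyPool()

# Shared zvol: data and the filesystem journal land in one sync domain.
pool.write("zvol-shared", "data-1")
pool.write("zvol-shared", "data-2")
pool.write("zvol-shared", "journal-flush", sync=True)

# Split layout: the journal has its own zvol, so data keeps accruing in RAM.
pool.write("zvol-data", "data-1")
pool.write("zvol-data", "data-2")
pool.write("zvol-journal", "journal-flush", sync=True)

print(pool.durable["zvol-shared"])  # ['data-1', 'data-2', 'journal-flush']
print(pool.pending["zvol-data"])    # ['data-1', 'data-2']
```

In the shared case the journal flush forces both data writes out with it; in the split case the data writes stay pending until the next transaction group.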
One of the biggest consequences of using a SLOG is that all sync writes can go via “direct sync”: they are written literally to ZIL blocks. Without a SLOG, large writes go through the “indirect sync” path, which causes read-modify-write (RMW), compression, and checksumming to happen inline with the sync write request. Inline RMW can destroy sync write performance and amplify I/O. This effect is often greater than the benefit of just being able to move the ZIL writes to another device.
In addition, blocks written by indirect sync consume an extra metadata block, which is fragmented from the data block. Reading them back later can double the read IOPS.
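The direct vs. indirect choice can be sketched roughly as below. This is a simplified model, not the real OpenZFS code: the function name is hypothetical, and the 32 KiB threshold (my understanding of the default for the `zfs_immediate_write_sz` tunable) and the effect of `logbias=throughput` are assumptions worth checking against your OpenZFS version.

```python
# Assumed default of the OpenZFS tunable zfs_immediate_write_sz (32 KiB).
IMMEDIATE_WRITE_SZ = 32 * 1024


def zil_write_path(size_bytes, has_slog, logbias="latency"):
    """Simplified model of which path a sync write takes (hypothetical helper)."""
    if logbias == "throughput":
        # Assumption: logbias=throughput forces the indirect path.
        return "indirect"
    if not has_slog and size_bytes >= IMMEDIATE_WRITE_SZ:
        # Large write, no SLOG: data goes to its final location inline.
        return "indirect"
    # Otherwise the data itself is copied into ZIL blocks.
    return "direct"


print(zil_write_path(128 * 1024, has_slog=False))  # indirect
print(zil_write_path(128 * 1024, has_slog=True))   # direct
print(zil_write_path(4 * 1024, has_slog=False))    # direct
```

Under these assumptions, adding a SLOG changes the path for large sync writes as well as where the ZIL lives.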
While a TxG commit happens every 5s by default, that doesn’t mean you can just use that as a yardstick for sizing. The transaction group then has to pass through both the quiescing and synchronization phases, which can take additional time. In addition, small ZIL writes can take double the space, since each one comes with a metadata block “header”. It’s much safer to assume you need half of the ARC size, as this is twice the max dirty data that is normally held in RAM.
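As a quick worked example of that rule of thumb, here is the arithmetic in Python. The function names and the 64 GiB ARC size are hypothetical; the only inputs taken from the comment are "half of the ARC" and "twice the max dirty data".

```python
GIB = 1024 ** 3


def slog_size_estimate(arc_bytes):
    """Rule of thumb from the comment: provision about half the ARC size."""
    return arc_bytes // 2


def implied_max_dirty(arc_bytes):
    """The rule treats ARC/2 as twice the max dirty data, so dirty data is ARC/4."""
    return slog_size_estimate(arc_bytes) // 2


arc = 64 * GIB  # hypothetical ARC size
print(slog_size_estimate(arc) / GIB)  # 32.0
print(implied_max_dirty(arc) / GIB)   # 16.0
```

So a machine with a 64 GiB ARC would, by this estimate, want roughly 32 GiB of usable SLOG, leaving headroom for the doubled footprint of small ZIL writes.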
SLOG devices should be on their own namespace if on NVMe, and should be overprovisioned.