Conversation

@zatricky commented Aug 25, 2024

Maintenance tasks may starve other, more urgent workloads of system IO. This PR enables cgroup IO resource limits when the maintenance tasks are run via systemd timers.

Notes:

  • This needs more testing, especially on other distributions. I've tested it with balance operations on Fedora 40 (systemd 255.10-3.fc40).
  • This works for systemd, but I'm not sure how best to achieve the same for cron. Wrapping the commands in systemd-run seems redundant, since that temporarily creates a service for each run anyway.
  • I am not sure whether the defaults I have suggested are good. I based them on the limits I feel would be appropriate for spindles, where I expect high IO demand from non-maintenance services and don't mind if the maintenance tasks take a very long time to complete.
  • I am not sure if there is a good way to configure different limits for different disk classes. For example, if you have a RAID1 OS filesystem on SSDs and a large RAID5 backup filesystem on spindles, it would be useful to apply different sets of IO limits to each.
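
For context, the kind of per-device IO properties involved could be applied to one of the maintenance services (for example btrfs-balance.service) with a drop-in roughly like the following; the path, device, and values below are illustrative only, not the defaults proposed by this PR:

# /etc/systemd/system/btrfs-balance.service.d/iolimits.conf (example drop-in)
[Service]
IOAccounting=yes
IOReadBandwidthMax=/dev/dm-0 10M
IOWriteBandwidthMax=/dev/dm-0 10M
IOReadIOPSMax=/dev/dm-0 60
IOWriteIOPSMax=/dev/dm-0 60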

To view these cgroup limits in action outside of a regular service, you can wrap a command with systemd-run. For example, the following is a balance with -musage=30 on a two-disk filesystem on /dev/dm-0 and /dev/dm-1, with per-device IOPS limits of 60 and bandwidth limits of 10 MB/s:

$ systemd-run \
    --property="IOReadBandwidthMax=/dev/dm-0 10M" --property="IOWriteBandwidthMax=/dev/dm-0 10M" \
    --property="IOReadIOPSMax=/dev/dm-0 60" --property="IOWriteIOPSMax=/dev/dm-0 60" \
    --property="IOReadBandwidthMax=/dev/dm-1 10M" --property="IOWriteBandwidthMax=/dev/dm-1 10M" \
    --property="IOReadIOPSMax=/dev/dm-1 60" --property="IOWriteIOPSMax=/dev/dm-1 60" \
    btrfs balance start -musage=30 /
Running as unit: run-r0fa03384626b4245b07857fc38089744.service; invocation ID: 7869d843bd814cb7bc1e5db9d35bc46f
$ cat /sys/fs/cgroup/system.slice/run-r0fa03384626b4245b07857fc38089744.service/io.max
252:0 rbps=10000000 wbps=10000000 riops=60 wiops=max
252:128 rbps=10000000 wbps=10000000 riops=60 wiops=max
$ cat /sys/fs/cgroup/system.slice/run-r0fa03384626b4245b07857fc38089744.service/io.stat
253:6 rbytes=1114112 wbytes=15321464832 rios=68 wios=117555 dbytes=0 dios=0
.... many similar lines here in my system
$ journalctl -u run-r0fa03384626b4245b07857fc38089744.service
Aug 25 15:55:25 <hostname> systemd[1]: Started run-r0fa03384626b4245b07857fc38089744.service - /usr/sbin/btrfs balance start -musage=30 /.
Aug 25 15:55:52 <hostname> btrfs[86830]: Done, had to relocate 3 out of 12787 chunks
Aug 25 15:55:52 <hostname> systemd[1]: run-r0fa03384626b4245b07857fc38089744.service: Deactivated successfully.
Aug 25 15:55:52 <hostname> systemd[1]: run-r0fa03384626b4245b07857fc38089744.service: Consumed 17.403s CPU time.

Further to the above example, lsblk shows the corresponding device numbers 252:0 and 252:128 for dm-0 and dm-1 respectively.
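
These major:minor numbers can be cross-checked against the io.max entries with lsblk's KNAME and MAJ:MIN columns, e.g.:

$ lsblk --nodeps -o KNAME,MAJ:MIN /dev/dm-0 /dev/dm-1
KNAME MAJ:MIN
dm-0  252:0
dm-1  252:128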

@zatricky force-pushed the systemd-cgroup-iolimits branch from b5e4284 to 682684a on August 25, 2024 at 17:27
@zatricky force-pushed the systemd-cgroup-iolimits branch from 682684a to 87a1c72 on August 25, 2024 at 18:55
@zatricky (Author) commented Aug 26, 2024

Limits tested and working for btrfs-scrub as well.

@zatricky (Author) commented Sep 9, 2024

I have tested and confirmed that this also works for btrfs-defrag. I'm not sure whether it applies to btrfs-trim, so perhaps the insertion of the IO limit configs should specifically skip the trim service.

@kdave I'd appreciate any comments on this PR, especially regarding testing and what else should be done to get it ready to merge.

@kdave (Owner) commented Aug 18, 2025

I think the difficult part here is how to do the configuration. Systemd needs the raw device paths; it may be cumbersome for the user to extract them, and the devices could change over time (via device add/remove). Ideally there's a helper that lists the devices of a given mount point before running the command.
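
One possible shape for such a helper, sketched here as a one-liner rather than anything in this PR, is to resolve a mount point to its member devices via btrfs filesystem show (output shown for the two-device filesystem from the example above):

$ btrfs filesystem show / | awk '/ path /{print $NF}'
/dev/dm-0
/dev/dm-1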

Next, where to store the configuration for each filesystem. Generated unit files with the IO limits make more sense, as the sysconfig file does not seem suitable for that beyond a global on/off switch for whether to apply the limits if configured.

For the disk classes I think this needs some user interaction; the class can be guessed from sysfs but should still be confirmed, as there's more to it than HDD/SSD/NVMe. There could be a helper tool to gather the information and create the timer config overrides.
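
For the sysfs guess, the rotational flag is one readily available signal, though it only separates spinning disks from everything else (the device name here is just an example):

$ cat /sys/block/sda/queue/rotational
1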

Regarding cron, I'm not sure it is still in use; in the beginning it was meant to be temporary, as systemd was not available everywhere.

@zatricky (Author) commented Sep 8, 2025

I've been putting some thought into this for a while, but I don't yet have an answer I'm sure of. Below are my current "good enough for now" thoughts:

You are right that the ideal config is not catered for, due to the complexity. My initial thought for making that complexity intuitive would be to put the config into a .json, .toml, or .yaml file. We could specify only the mountpoints and the wanted limits; a refresh script could then figure out all the block device information automatically (a rough sketch follows). Alternatively we could specify only "disk-type" limits that match on rotational vs non-rotational, or perhaps on disk id/model/serial/etc.
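
Purely to illustrate the shape such a config could take (hypothetical file name and keys, nothing implemented in this PR), a TOML variant might look like:

# /etc/btrfsmaintenance/iolimits.toml (hypothetical)
[[filesystem]]
mountpoint = "/"
read_bandwidth_max = "10M"
write_bandwidth_max = "10M"
read_iops_max = 60
write_iops_max = 60
# a "disk-type" variant could instead match on rotational vs non-rotational here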

Perhaps the refresh timer could also trigger at boot, since that is another well-known point at which disk paths change.

The above could work well once the systemd rules are overridden, but then, as you noted, any disks dynamically added or removed will have the wrong limits applied until the refresh runs. I'd consider this an acceptable caveat as long as it is documented.

Please let me know if you like this path or if you have suggestions. :-)
