I’m using Linux Mint 21.3, and a process (Brave, aka Chrome) sometimes leaks memory and eats all the RAM. Linux then goes into a swap death loop where everything freezes (sometimes the mouse cursor still moves) and nothing can be done; I can only watch the HDD LED blink and hit reset. Is there a way to make the system automatically detect a swap death loop and kill the process using the most RAM, and so on?
Just turn off swap? You don’t really need it, and the kernel will just OOM kill without it.
This doesn’t work to avoid thrashing. The kernel may invoke the OOM killer slightly quicker if you have no swap, so I guess that can sort of help, but it doesn’t properly solve the problem.
On Linux, there’s a thing called the page cache (aka disk cache): Every time (part of) a file gets read from or written to, that (part of) the file gets copied to RAM. The file is then kept there unless that RAM is needed for something more important. It is cached in RAM. But since it is also on disk, the kernel can drop the file from RAM anytime it wants.
If you’re low on RAM, the kernel therefore evicts the disk cache, because it can: those pages can be reloaded from disk if needed. That includes the binary code of the programs you’re running. So any program you’re running is constantly interrupted, because its code is not in RAM.
So it runs a couple of instructions, but oh no! Call to function foo() from glibc, but guess what? That’s on disk. Cue waiting for the kernel to load that. Oh, now it wants function bar() from zlib, shit! Need to load that too. Since loading stuff from disk is about as slow as running like a gazillion instructions, all your programs are like 1000x slower now.
This happens even with zero swap.
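To make the "detect it automatically" part concrete: newer kernels expose pressure stall information (PSI) in /proc/pressure/memory, which reports what share of time tasks were stalled waiting on memory, with or without swap. Here is a minimal Python sketch that parses that file. Assumptions: your kernel has PSI enabled, and the helper names (`parse_psi`, `read_memory_pressure`, `is_thrashing`) and the 20% limit on `full avg10` are made up for illustration, not tuned values.

```python
from pathlib import Path

# Sample contents in the format the kernel uses for /proc/pressure/memory.
SAMPLE = """some avg10=45.12 avg60=20.01 avg300=5.50 total=123456789
full avg10=38.70 avg60=15.33 avg300=4.02 total=98765432
"""


def parse_psi(text):
    """Parse the text of a /proc/pressure/* file into {kind: {field: float}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Return the file's text, or None if PSI is missing or disabled."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def is_thrashing(psi, full_avg10_limit=20.0):
    """True if all non-idle tasks were stalled on memory for at least
    full_avg10_limit percent of the last 10 seconds.
    The 20% default is an arbitrary assumption, not a recommended value."""
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


if __name__ == "__main__":
    text = read_memory_pressure() or SAMPLE  # fall back to the sample data
    print("thrashing:", is_thrashing(parse_psi(text)))
```

Run it in a loop (or from a timer) and you have the detection half; what to do when it fires is a separate policy question.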
The correct advice is the one from @[email protected]: install/enable systemd-oomd or earlyoom.
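If you’re curious what a userspace OOM daemon does in principle, here is a rough Python sketch of the idea: watch MemAvailable and SwapFree in /proc/meminfo, and when both run low, pick the process with the highest /proc/<pid>/oom_score. Assumptions: the 10% thresholds, the "both must be low" rule and the function names are mine for illustration; this is not the actual logic of earlyoom or systemd-oomd, and it deliberately only selects a victim without killing anything.

```python
import os


def parse_meminfo(text):
    """Turn /proc/meminfo text into {field: value} (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def low_on_memory(info, min_avail_pct=10.0, min_swap_free_pct=10.0):
    """True when available RAM and free swap are both at or below the
    thresholds. The thresholds are illustrative assumptions."""
    avail_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # With no swap configured, treat swap as already exhausted.
    swap_free_pct = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return avail_pct <= min_avail_pct and swap_free_pct <= min_swap_free_pct


def pick_victim(proc_root="/proc"):
    """Return (pid, oom_score) for the highest oom_score, or None."""
    best = None
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                score = int(f.read().strip())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or file not readable
        if best is None or score > best[1]:
            best = (int(entry), score)
    return best


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    if low_on_memory(info):
        print("would act on:", pick_victim())
    else:
        print("memory looks fine")
```

In practice you’d install one of the existing daemons rather than run something like this; the sketch is only meant to show that the check is cheap and happens before the box is already frozen.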
Well that’s technically correct, but if you’re so dependent on disk cache for system performance that you can’t live without it then you really need to look at doing an upgrade.
When a box goes into swap death, it usually struggles to fill swap far enough for the kernel to OOM kill anything at all. Generally the massive performance impact of swapping just slows the app down to the point of being useless, along with the entire rest of the box. Disk cache should not be a concern during these abnormal events.
I’ll make an appeal to authority (kernel developer working on memory management):
And then he goes on to say what I said, that it can make the OOM killer quicker to react.
https://chrisdown.name/2018/01/02/in-defence-of-swap.html
Interesting, thanks for the link!
I wouldn’t recommend disabling swap completely; you do need it:
https://chrisdown.name/2018/01/02/in-defence-of-swap.html
Unfortunately, there is no guarantee that the leaking process will be the next process to try to allocate memory after you run out. It might actually be your window manager, for example.
The OOM killer is a last-ditch attempt by the OS to keep running, but it is very likely to leave your system in an unstable state.