I use Zip Bombs to Protect my Server

some_guy@lemmy.sdf.org · 2 months ago

I use Zip Bombs to Protect my Server

Aatube@kbin.melroy.org · 2 months ago

macOS compresses its memory. Does this mean we’ll see bots running on macOS now?

DreamButt@lemmy.world · 2 months ago

No, but that’s an interesting question. Ultimately it probably comes down to hardware specs or, depending on the particular bot and its environment, the specs of the container it’s running in.

Even with macOS’s style of compressing inactive memory pages, you’ll still have a hard cap that can be reached with the same technique (just with a larger uncompressed file).

4am@lemm.ee · 2 months ago

How long would it take for a page to be considered inactive? Do OOM conditions immediately trigger compression, or would the process die first?

DreamButt@lemmy.world · 2 months ago

So I’m not an expert but my understanding is the flow is roughly:

Available memory gets low
Compress based on LRU rules
Use swap
OOM

So it’s more meant to be preventative afaik

UnbrokenTaco@lemm.ee · 2 months ago

Is it immune to zip bombs?

Aatube@kbin.melroy.org · 2 months ago

All I know is it compresses memory. The mechanism mentioned here for ZIP bombs to crash bots is to fill up memory fast with repeating zeroes.

Guidy@lemmy.world · 2 months ago

I thought it was to fill all available storage. Maybe it’s both?

ivn@jlai.lu · edit-2 2 months ago

Linux and Windows have compressed memory too, for 10 years or more. And that’s not how you avoid zip bombs anyway: you just limit how much you decompress and abort once the output goes over that limit.
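For illustration, here is a minimal Python sketch of the "limit how much you uncompress and abort" idea from a scraper's point of view. The function name `bounded_gunzip`, the exception class, and the 1 MB cap are assumptions made up for this example, not anything from the article. It relies on the `max_length` argument of zlib's `decompress()`, which caps how much output a single call may produce.

```python
import gzip
import zlib
from typing import Iterable


class DecompressionLimitExceeded(Exception):
    """Raised when the decompressed output would exceed the configured cap."""


def bounded_gunzip(chunks: Iterable[bytes], max_output: int) -> bytes:
    """Decompress a gzip stream chunk by chunk, aborting past max_output bytes."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = bytearray()
    for chunk in chunks:
        # Allow one byte more than the remaining budget so overflow is detectable
        # without ever inflating the whole payload.
        out += decomp.decompress(chunk, max_output - len(out) + 1)
        if len(out) > max_output:
            raise DecompressionLimitExceeded(f"output exceeds {max_output} bytes")
    return bytes(out)


# Demo: 5 MB of zeroes shrinks to a few KB, yet is refused at a 1 MB cap.
bomb = gzip.compress(bytes(5_000_000))
pieces = [bomb[i:i + 4096] for i in range(0, len(bomb), 4096)]
try:
    bounded_gunzip(pieces, max_output=1_000_000)
except DecompressionLimitExceeded as exc:
    print("aborted:", exc)
```

A client written this way never holds more than the cap plus one byte of decompressed data, so it would simply treat the bomb as a bad response.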

timetraveller@lemmy.world · 2 months ago

I was going to say the same thing.

melroy@kbin.melroy.org · 2 months ago

let me try…

melroy@kbin.melroy.org · 2 months ago

Looks fine to me. I think only one CPU core was at 100%.

10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 28,0695 s, 383 MB/s

melroy@kbin.melroy.org · 2 months ago

Oh… now the idea is to unzip it, right?

nice idea:

if (ipIsBlackListed() || isMalicious()) {
    header("Content-Encoding: deflate, gzip");
    // PHP concatenates strings with ".", not "+"
    header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G)); // 10 MB
    readfile(ZIP_BOMB_FILE_10G);
    exit;
}

mbirth@lemmy.ml · 2 months ago

Might need some

if (ob_get_level()) ob_end_clean();

before the readfile. 😉

billwashere@lemmy.world · 2 months ago

I want to know how they built that visualization

mbirth@lemmy.ml · 2 months ago

And if you want some customisation, e.g. some repeating string over and over, you can use something like this:

yes "b0M" | tr -d '\n' | head -c 10G | gzip -c > 10GB.gz

yes repeats the given string (followed by a line feed) indefinitely - originally meant to type “yes” + ENTER into prompts. tr then removes the line breaks again and head makes sure to only take 10GB and not have it run indefinitely.

If you want to be really fancy, you can even put an HTML header and footer into files named header and footer and then run it like this:

yes "b0M" | tr -d '\n' | head -c 10G | cat header - footer | gzip -c > 10GB.gz
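For anyone who would rather script this than chain shell tools, the same pipeline can be sketched in Python. This is only a rough equivalent under stated assumptions: the function name `write_repeated_gzip` and its parameters are invented for this example, and it streams in chunks so the uncompressed data never sits in memory all at once.

```python
import gzip
import io
from typing import BinaryIO


def write_repeated_gzip(dest: BinaryIO, pattern: bytes, total_bytes: int,
                        header: bytes = b"", footer: bytes = b"",
                        chunk_size: int = 1 << 20) -> None:
    """Gzip header + total_bytes of repeated pattern + footer into dest."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    # Each chunk is a whole number of repetitions, so consecutive chunks
    # continue the pattern seamlessly (like `yes ... | tr -d '\n'`).
    chunk = pattern * max(1, chunk_size // len(pattern))
    with gzip.GzipFile(fileobj=dest, mode="wb") as gz:
        gz.write(header)
        remaining = total_bytes
        while remaining > 0:
            # The final piece is truncated, like `head -c`.
            piece = chunk[:remaining]
            gz.write(piece)
            remaining -= len(piece)
        gz.write(footer)


# Small demo: 3 MB of "b0M" wrapped in a hypothetical HTML header and footer.
buf = io.BytesIO()
write_repeated_gzip(buf, b"b0M", 3_000_000, header=b"<html>", footer=b"</html>")
print(len(buf.getvalue()), "compressed bytes")
```

To write a real file, pass an object opened with `open(path, "wb")` as `dest` and raise `total_bytes` accordingly.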

cy_narrator@discuss.tchncs.de · 2 months ago

First off, be very careful with bs=1G, as dd buffers a whole block in memory and it may overload the RAM. You will want to set count accordingly.

sugar_in_your_tea@sh.itjust.works · 2 months ago

Yup, use something sensible like 10M or so.

cy_narrator@discuss.tchncs.de · edit-2 2 months ago

I would normally go much lower,

bs=4K count=262144 which creates 1G with 4K block size
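The arithmetic behind those numbers is just total size divided by block size. A small sketch, assuming the binary meaning of the K, M and G suffixes that GNU dd uses (the helper names `parse_size` and `dd_count` are made up for this example):

```python
# Binary suffixes as GNU dd interprets them: K=1024, M=1024^2, G=1024^3.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Turn a size such as '4K' or '10G' into a number of bytes."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def dd_count(total: str, bs: str) -> int:
    """Return the count= value that makes bs * count equal the total size."""
    total_bytes, block_bytes = parse_size(total), parse_size(bs)
    if total_bytes % block_bytes:
        raise ValueError("total is not a whole number of blocks")
    return total_bytes // block_bytes


print(dd_count("1G", "4K"))    # 262144, matching the comment above
print(dd_count("10G", "10M"))  # 1024
```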

Treczoks@lemmy.world · 2 months ago

Have you ever heard of sparse files, and how Linux and Windows deal with zips of them? You’ll love this.

palordrolap@fedia.io · 2 months ago

The article writer kind of complains that they’re having to serve a 10MB file, which is the result of the gzip compression. If that’s a problem, they could switch to bzip2. It’s available pretty much everywhere that gzip is available and it packs the 10GB down to 7506 bytes.

That’s not a typo. bzip2 is way better with highly redundant data.
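A quick way to see the gap for yourself is a scaled-down comparison in Python. This sketch uses 10 MiB of zeroes rather than 10 GB so it runs in a moment, so the absolute numbers will not match the figures quoted above; the function name `compressed_sizes` is invented for this example.

```python
import bz2
import gzip


def compressed_sizes(n_bytes: int) -> dict:
    """Compress n_bytes of zeroes with gzip and bzip2 and report the sizes."""
    data = bytes(n_bytes)  # all zero bytes, like reading from /dev/zero
    return {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


sizes = compressed_sizes(10 * 1024 * 1024)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

On data this redundant, bzip2's output should come out far smaller than gzip's, which is the effect described above.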

some_guy@lemmy.sdf.org · 2 months ago

TIL why I’m gonna start learning more about bzip2. Thanks!

sugar_in_your_tea@sh.itjust.works · 2 months ago

Brotli gets it to 8.3K, and is supported in most browsers, so there’s a chance scrapers also support it.

Aceticon@lemmy.dbzer0.com · 2 months ago

Gzip encoding has been part of the HTTP protocol for a long time and every server-side HTTP library out there supports it, and phishing/scraper bots will be done with server-side libraries, not using browser engines.

Further, judging by the example in his article, he’s not using gzip with maximum compression when generating the zip bomb files: adding -9 to the gzip command line should give the best compression (though it will be slower). (I tested this and it made no difference at all.)

sugar_in_your_tea@sh.itjust.works · edit-2 2 months ago

You can make multiple files with different encodings and select based on the Accept-Encoding header.
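A rough sketch of that selection logic in Python, assuming two pre-built payloads: `10GB.gz` as in the commands earlier in the thread, and a hypothetical `10GB.br`. The function names are made up, and the header parsing is deliberately simplified (it honours `q=0` but ignores the `*` wildcard).

```python
from typing import Optional, Set, Tuple

# Hypothetical pre-built payloads, listed smallest first.
BOMB_FILES = [("br", "10GB.br"), ("gzip", "10GB.gz")]


def accepted_encodings(header: str) -> Set[str]:
    """Return the encodings in an Accept-Encoding value that are not q=0."""
    accepted = set()
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            accepted.add(name)
    return accepted


def pick_bomb(header: str) -> Optional[Tuple[str, str]]:
    """Pick the smallest payload whose encoding the client says it accepts."""
    accepted = accepted_encodings(header)
    for encoding, path in BOMB_FILES:
        if encoding in accepted:
            return encoding, path
    return None  # the client advertised nothing we have a file for


print(pick_bomb("gzip, deflate, br"))
print(pick_bomb("gzip, deflate"))
```

The returned encoding name would go into the Content-Encoding response header, and the path is the file to stream back.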

Aceticon@lemmy.dbzer0.com · 2 months ago

Yeah, good point.

I forgot about that.

Xanza@lemm.ee · edit-2 2 months ago

zstd is a significantly better option than anything else available unless you need something specific for a specific reason: https://github.com/facebook/zstd?tab=readme-ov-file#benchmarks

LZ4 is likely better than zstd, but it doesn’t have wide usability yet.

palordrolap@fedia.io · 2 months ago

You might be thinking of lzip rather than lz4. Both compress, but the former is meant for high compression whereas the latter is meant for speed. Neither is particularly good at dealing with highly redundant data though, if my testing is anything to go by.

Either way, none of those are installed as standard in my distro. xz (which is lzma based) is installed as standard but, like lzip, is slow, and zstd is still pretty new to some distros, so the recipient could conceivably not have that installed either.

bzip2 is ancient and almost always available at this point, which is why I figured it would be the best option to stand in for gzip.

As it turns out, the question was one of data streams not files, and as at least one other person pointed out, brotli is often available for streams where bzip2 isn’t. That’s also not installed by default as a command line tool, but it may well be that the recipient, while attempting to emulate a browser, might have actually installed it.

Xanza@lemm.ee · 2 months ago

No. https://github.com/lz4/lz4

LZ4 already has a caddy layer which interprets and compresses data streams for caddy: https://github.com/mholt/caddy-l4

It’s also very impressive.

just_another_person@lemmy.world · edit-2 2 months ago

I believe he’s returning a gzip HTTP response stream, not just a file payload that the requester then downloads and decompresses.

Bzip isn’t used in HTTP compression.

sugar_in_your_tea@sh.itjust.works · edit-2 2 months ago

Brotli is an option, and it’s comparable to Bzip. Brotli works in most browsers, so hopefully these bots would support it.

I just tested it, and a 10G file full of zeroes is only 8.3K compressed. That’s pretty good, though a little bigger than BZip.

bss03@infosec.pub · 2 months ago

For scrapers that are not just implementing HTTP, but are trying to extract zip files, you can possibly drive them insane with zip quines: https://github.com/ruvmello/zip-quine-generator or otherwise compressed files that contain themselves at some level of nesting, possibly with other data so that they recursively expand to an unbounded (“infinite”) size.

lemmylommy@lemmy.world · 2 months ago

Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device.

LOL. Destroy your device, kill the cat, what else?

archonet@lemy.lol · 2 months ago

destroy your device by… having to reboot it. the horror! The pain! The financial loss of downtime!

Albbi@lemmy.ca · 2 months ago

It’ll email your grandmother all of your porn!

Bobby Turkalino@lemmy.yachts · 2 months ago

Haven’t thought about that Weird Al song in a while

CrazyLikeGollum@lemmy.world · 2 months ago

Ah yes, the infamous “stinky cheese” email virus. Who knew zip bombs could be so destructive. It erased all of the easter eggs off of my DVDs.

Exec@pawb.social · 2 months ago

outstanding reference

Dizzy Devil Ducky@lemm.ee · 2 months ago

The horrors of having your TV record Gigli!

arc@lemm.ee · 2 months ago

Probably only works for dumb bots and I’m guessing the big ones are resilient to this sort of thing.

Judging from recent stories the big threat is bots scraping for AIs and I wonder if there is a way to poison content so any AI ingesting it becomes dumber. e.g. text which is nonsensical or filled with counter information, trap phrases that reveal any AIs that ingested it, garbage pictures that purport to show something they don’t etc.

mostlikelyaperson@lemmy.world · 2 months ago

There have been some attempts in that regard, I don’t remember the names of the projects, but there were one or two that’d basically generate a crapton of nonsense to do just that. No idea how well that works.

lagoon8622@sh.itjust.works · 2 months ago

https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/

https://xeiaso.net/blog/2025/anubis/

https://github.com/jrwren/nepenthes

frezik@midwest.social · 2 months ago

When it comes to attacks on the Internet, doing simple things to get rid of the stupid bots means kicking 90% of attacks out. No, it won’t work against a determined foe, but it does something useful.

Same goes for setting SSH to a random port. Logs are so much cleaner after doing that.

airgapped@piefed.social · 2 months ago

Setting a random SSH port and limiting it to 3/min saw failed login attempts fall by 99% and jailed IPs fall to 0.

WFloyd@lemmy.world · 2 months ago

I’ve found great success using a hardened ssh config with a limited set of supported Ciphers/MACs/KexAlgorithms. Nothing ever gets far enough to even trigger fail2ban. Then of course it’s key only login from there.

Schlemmy@lemmy.ml · 2 months ago

https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/

delusion@lemmy.myserv.one · 2 months ago

https://zadzmo.org/code/nepenthes/

Echo Dot@feddit.uk · edit-2 2 months ago

I don’t know about poisoning AI, but one thing that I used to do was to redirect any suspicious bots, or ones that were hitting the server too much, to a simple HTML page with no JS or CSS or forward links. Then they used to go away.

fmstrat@lemmy.nowsci.com · 2 months ago

I’ve been thinking about making an nginx plugin that randomizes words on a page to poison AI scrapers.

owsei@programming.dev · 2 months ago

There are “AI mazes” that do that.

I remember reading an article about this but haven’t found it yet

corsicanguppy@lemmy.ca · edit-2 2 months ago

The one below, named Anubis, is the one I heard about. Come back to the thread and check the link.

some_guy@lemmy.sdf.org · 2 months ago

If you have the time, I think it’s a great idea.

delusion@lemmy.myserv.one · 2 months ago

https://zadzmo.org/code/nepenthes/

fmstrat@lemmy.nowsci.com · 2 months ago

That is a very interesting git repo. Is this just a web view into the actual git folder?

dwt@feddit.org · 2 months ago

Sadly about the only thing that reliably helps against malicious crawlers is Anubis

https://anubis.techaro.lol/

LainTrain@lemmy.dbzer0.com · 2 months ago

Neat

alehel@lemmy.zip · 2 months ago

That URL is telling me “Invalid response”. Am I a bot?

doorknob88@lemmy.world · 2 months ago

I’m sorry you had to find out this way.

L_Acacia@lemmy.ml · 2 months ago

https://anubis.techaro.lol/docs/user/known-broken-extensions

If you have JShelter installed, it breaks the proof of work from Anubis

xavier666@lemm.ee · 2 months ago

Now you know why your mom spent so much time with the Amiga

sugar_in_your_tea@sh.itjust.works · 2 months ago

Probably.

MonkderVierte@lemmy.ml · 2 months ago

You’re using a VPN, right?

Squizzy@lemmy.world · 2 months ago

I’m not, and it gave an invalid response. I am just chilling on my home wifi.

alehel@lemmy.zip · 2 months ago

Nope. Just using Vivaldi on my Android device.

spicehoarder@lemm.ee · 2 months ago

I don’t really like this approach, not just because I was flagged as a bot, but because I don’t really like captchas. I swear I’m not a bot guys!

dwt@feddit.org · 2 months ago

That’s the reason I say “sadly”. It’s definitely not good. But since everything else fails, this is what currently remains.

frozenpopsicle@lemmy.dbzer0.com · 2 months ago

❤️

UnbrokenTaco@lemm.ee · 2 months ago

Interesting. I wonder how long it takes until most bots adapt to this type of “reverse DoS”.

sugar_in_your_tea@sh.itjust.works · 2 months ago

Then we’ll just be more clever as well. It’s an arms race after all.

moopet@sh.itjust.works · 2 months ago

I’d be amazed if this works, since these sorts of tricks have been around since dinosaurs ruled the Earth. Most bots will use pretty modern zip libraries, which will just return “nope” or throw an exception, and that will be treated exactly the same way as any corrupt file. A site saying it’s serving a zip file when the contents are a generic 404 HTML page, for example, is not uncommon.

Also, be careful because you could destroy your own device? What the hell? No. Unless you’re using dd backwards and as root, you can’t do anything bad, and even then it’s the drive contents you overwrite, not the device you “destroy”.

Lucien [he/him]@mander.xyz · 2 months ago

Yeah, this article came across as if written by a complete beginner. They mention having their WordPress hacked, but failed to admit it was because they didn’t upgrade the install.

namingthingsiseasy@programming.dev · 2 months ago

On the other hand, there are lots of bots scraping Wikipedia even though it’s easy to download the entire website as a single archive.

So they’re not really that smart…

fmstrat@lemmy.nowsci.com · 2 months ago

This is why I use things like Docusaurus to generate static sites. Vulnerability injections are pretty hard when there’s no code to inject into.