The one-liner:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.
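
For reference, you can sanity-check the ratio with gzip -l (note that gzip stores the uncompressed size modulo 2^32, so it under-reports for anything over 4 GiB):

gzip -l 10GB.gz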

    • DreamButt@lemmy.world · 20 days ago

      No, but that’s an interesting question. Ultimately it probably comes down to hardware specs, or, depending on the particular bot and its environment, the specs of the container it’s running in.

      Even with macOS’s style of compressing inactive memory pages, you’ll still have a hard cap that can be reached with the same technique (just with a larger uncompressed file).

      • 4am@lemm.ee · 19 days ago

        How long would it take to be considered an inactive memory page? Do OOM conditions immediately trigger compression, or would the process die first?

        • DreamButt@lemmy.world · 19 days ago

          So I’m not an expert, but my understanding is that the flow is roughly:

          1. Available memory gets low
          2. Compress based on LRU rules
          3. Use swap
          4. OOM

          So it’s more meant to be preventative afaik
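
          If you want to watch the compressor at work on macOS, vm_stat reports its counters; a minimal check, assuming a stock macOS install:

          # Pages currently held by the compressor, plus lifetime (de)compression counts
          vm_stat | grep -Ei 'compress'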

      • Aatube@kbin.melroy.org · 20 days ago

        All I know is that it compresses memory. The mechanism mentioned here for zip bombs crashing bots is filling up memory fast with repeating zeroes.

    • ivn@jlai.lu · 20 days ago

      Linux and Windows have compressed memory too, for 10 years or more. And that’s not how you avoid zip bombs anyway: you just limit how much you decompress and abort if the output goes over that limit.
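
      A minimal sketch of that defence in shell (the 100 MiB cap is arbitrary): head exits once the limit is reached, gzip dies on SIGPIPE, and the bomb never fully inflates.

      gzip -dc 10GB.gz | head -c 104857600 > extracted.bin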

    • melroy@kbin.melroy.org · 20 days ago

      Looks fine to me. Only one CPU core was at 100%, I think.

      10+0 records in
      10+0 records out
      10737418240 bytes (11 GB, 10 GiB) copied, 28,0695 s, 383 MB/s
      
      • melroy@kbin.melroy.org · 20 days ago

        Oh… now the idea is that they unzip it, right?

        nice idea:

        if (ipIsBlackListed() || isMalicious()) {
            // A gzip file gets the single "gzip" content coding, not "deflate, gzip"
            header("Content-Encoding: gzip");
            header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G)); // compressed size, ~10 MB
            readfile(ZIP_BOMB_FILE_10G);
            exit;
        }
        
        • mbirth@lemmy.ml · 20 days ago

          Might need some

          if (ob_get_level()) ob_end_clean();
          

          before the readfile. 😉

  • mbirth@lemmy.ml · 20 days ago

    And if you want some customisation, e.g. some repeating string over and over, you can use something like this:

    yes "b0M" | tr -d '\n' | head -c 10G | gzip -c > 10GB.gz
    

    yes repeats the given string (followed by a line feed) indefinitely; it was originally meant to type “yes” + ENTER into prompts. tr then removes the line breaks again, and head makes sure to take only 10GB rather than letting it run forever.

    If you want to be really fancy, you can even put an HTML header and footer into files (named, say, header and footer) and then run it like this:

    yes "b0M" | tr -d '\n' | head -c 10G | cat header - footer | gzip -c > 10GB.gz
    
  • Treczoks@lemmy.world · 18 days ago

    Have you ever heard of sparse files, and how Linux and Windows deal with zips of them? You’ll love this.
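
    (For anyone who hasn’t: a sparse file claims a huge size while occupying almost no disk blocks. A minimal demonstration, assuming GNU coreutils:)

    truncate -s 10G sparse.img            # allocates metadata only, no data blocks
    du -h --apparent-size sparse.img      # reports 10G
    du -h sparse.img                      # actual on-disk usage: ~0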

  • palordrolap@fedia.io · 20 days ago

    The article writer kind of complains that they’re having to serve a 10MB file, which is the result of the gzip compression. If that’s a problem, they could switch to bzip2. It’s available pretty much everywhere that gzip is available and it packs the 10GB down to 7506 bytes.

    That’s not a typo. bzip2 is way better with highly redundant data.
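
    The bzip2 equivalent of the one-liner would presumably be:

    dd if=/dev/zero bs=1G count=10 | bzip2 -c > 10GB.bz2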

      • Aceticon@lemmy.dbzer0.com · 19 days ago

        Gzip encoding has been part of the HTTP protocol for a long time and every server-side HTTP library out there supports it, and phishing/scraper bots will be built with server-side libraries, not browser engines.

        Further, judging by the example in his article, he’s not using gzip with maximum compression when generating the zip bomb files: he’d need to add -9 to the gzip command line to get the best compression (though it will be slower). (I tested this and it made no difference at all.)
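
        For reference, the maximum-compression variant of the earlier one-liner would look like this (as noted above, it may not help on an all-zero stream):

        dd if=/dev/zero bs=1G count=10 | gzip -9c > 10GB.gz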

      • palordrolap@fedia.io · 18 days ago

        You might be thinking of lzip rather than lz4. Both compress, but the former is meant for high compression whereas the latter is meant for speed. Neither is particularly good at dealing with highly redundant data, though, if my testing is anything to go by.

        Either way, none of those are installed as standard in my distro. xz (which is lzma based) is installed as standard but, like lzip, is slow, and zstd is still pretty new to some distros, so the recipient could conceivably not have that installed either.

        bzip2 is ancient and almost always available at this point, which is why I figured it would be the best option to stand in for gzip.

        As it turns out, the question was one of data streams not files, and as at least one other person pointed out, brotli is often available for streams where bzip2 isn’t. That’s also not installed by default as a command line tool, but it may well be that the recipient, while attempting to emulate a browser, might have actually installed it.
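
        A quick way to reproduce that kind of comparison yourself, assuming the tools are installed (1 GiB of zeroes here to keep it fast):

        for c in gzip bzip2 xz zstd; do
            printf '%-6s ' "$c"
            dd if=/dev/zero bs=1M count=1024 2>/dev/null | "$c" -c | wc -c
        done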

    • just_another_person@lemmy.world · 19 days ago

      I believe he’s returning a gzip HTTP response stream, not just a file payload that the requester then downloads and decompresses.

      Bzip isn’t used in HTTP compression.
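
      One way to check this against a live server (example.com is a placeholder; note that some servers skip compression for HEAD requests): the client advertises the codings it accepts, and the response names the one actually used.

      curl -sI -H 'Accept-Encoding: gzip, br' https://example.com | grep -i '^content-encoding'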

      • sugar_in_your_tea@sh.itjust.works · 19 days ago

        Brotli is an option, and it’s comparable to Bzip. Brotli works in most browsers, so hopefully these bots would support it.

        I just tested it, and a 10G file full of zeroes is only 8.3K compressed. That’s pretty good, though a little bigger than BZip.
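
        Presumably something along these lines with the brotli CLI (it compresses at maximum quality by default):

        dd if=/dev/zero bs=1G count=10 | brotli -c > 10GB.br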

      • bss03@infosec.pub · 18 days ago

        For scrapers that are not just speaking HTTP but are actually trying to extract zip files, you can possibly drive them insane with zip quines (https://github.com/ruvmello/zip-quine-generator), or otherwise compressed files that contain themselves at some level of nesting, possibly with other data, so that they recursively expand to an unbounded (“infinite”) size.

  • lemmylommy@lemmy.world · 20 days ago

    Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device.

    LOL. Destroy your device, kill the cat, what else?

  • arc@lemm.ee · 19 days ago
    19 days ago

    Probably only works for dumb bots and I’m guessing the big ones are resilient to this sort of thing.

    Judging from recent stories, the big threat is bots scraping for AIs, and I wonder if there is a way to poison content so that any AI ingesting it becomes dumber: e.g. text which is nonsensical or filled with counter-information, trap phrases that reveal any AIs that ingested it, garbage pictures that purport to show something they don’t, etc.

  • fmstrat@lemmy.nowsci.com · 19 days ago

    I’ve been thinking about making an nginx plugin that randomizes words on a page to poison AI scrapers.

  • UnbrokenTaco@lemm.ee · 20 days ago

    Interesting. I wonder how long it takes until most bots adapt to this type of “reverse DoS”.

  • moopet@sh.itjust.works · 19 days ago

    I’d be amazed if this works, since these sorts of tricks have been around since dinosaurs ruled the Earth. Most bots use pretty modern zip libraries which will just return “nope” or throw an exception, and that gets treated exactly like any other corrupt file, for example a site saying it’s serving a zip file when the contents are actually a generic 404 HTML page, which is not uncommon.

    Also, be careful because you could destroy your own device? What the hell? No. Unless you’re using dd backwards and as root, you can’t do anything bad, and even then it’s the drive contents you overwrite, not the device you “destroy”.

  • fmstrat@lemmy.nowsci.com · 19 days ago

    This is why I use things like Docusaurus to generate static sites. Vulnerability injections are pretty hard when there’s no code to inject into.