So, I’m self-hosting Immich. The issue is that we tend to take a lot of pictures of the same scene or thing so we can pick the best one later, and we can end up with 5 to 10 photos that are basically duplicates, but not quite.
Some duplicate-finding programs rate those images at 95% similarity or more.

I’m wondering if there’s any way, probably at the file system level, for similar images to be compressed together.
Maybe deduplication?
Have any of you guys handled a similar situation?
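On the deduplication idea: filesystem deduplication (for example ZFS dedup) generally only saves space for blocks that are byte-for-byte identical. Two similar-but-different photos are usually encoded as quite different byte streams, so I’d expect little sharing, but it’s easy to measure. A minimal Python sketch, assuming fixed 4 KiB blocks (a hypothetical choice; real filesystems use their own block or record sizes) and hypothetical helper names:

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096) -> list[str]:
    """Split data into fixed-size blocks and hash each one with SHA-256."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_block_fraction(a: bytes, b: bytes, block_size: int = 4096) -> float:
    """Fraction of b's blocks that already appear somewhere in a.

    Rough estimate of what fixed-block dedup could save when storing b
    after a; it ignores compression and variable-size chunking.
    """
    seen = set(block_hashes(a, block_size))
    blocks_b = block_hashes(b, block_size)
    if not blocks_b:
        return 0.0
    return sum(h in seen for h in blocks_b) / len(blocks_b)


if __name__ == "__main__":
    import os

    original = os.urandom(64 * 1024)
    # An identical copy: every block is shared.
    print("copy:", shared_block_fraction(original, original))
    # Insert one byte at the front: every later block is misaligned,
    # so (for random data) no block matches anymore.
    print("shifted:", shared_block_fraction(original, b"\x00" + original))
```

Pointing `shared_block_fraction` at the raw bytes of two shots from the same burst (e.g. `open(path, "rb").read()`) gives a rough estimate of what fixed-block dedup could save for that pair.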

  • smpl@discuss.tchncs.de
    4 months ago

    I was not talking about classification. What I meant was a simple probe of how well a collage of similar images compresses compared to the same images compressed individually. The hypothesis is that a compression codec would compress images with similar color distributions better in a sprite sheet than if it encoded each image individually. I don’t know, the savings might be negligible, but I’d assume there is something to gain, at least for some compression codecs. I doubt doing deduplication after compression has much to gain.
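    A minimal sketch of that probe, assuming the images are available as raw, uncompressed byte buffers. It uses Python’s standard-library LZMA as a stand-in for an image codec and simply concatenates the buffers as a crude “sprite sheet”; both are my assumptions, and real image codecs (JPEG, WebP, AVIF) will behave differently, so treat this as a way to set up the measurement rather than a result. The sample data is random and hypothetical.

```python
import lzma
import random


def compressed_size(data: bytes) -> int:
    """Size in bytes after LZMA (xz) compression with default settings."""
    return len(lzma.compress(data))


def compare_joint_vs_individual(images: list[bytes]) -> tuple[int, int]:
    """Return (sum of individually compressed sizes, size of one joint stream).

    Concatenating raw buffers is a crude stand-in for a sprite sheet:
    LZMA's dictionary lets later images reference bytes of earlier ones.
    """
    individual = sum(compressed_size(img) for img in images)
    joint = compressed_size(b"".join(images))
    return individual, joint


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "photo": 64 KiB of random raw pixel bytes.
    base = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
    # Five near-duplicates: copies of base with ~1% of bytes perturbed.
    burst = []
    for _ in range(5):
        img = bytearray(base)
        for _ in range(len(img) // 100):
            img[rng.randrange(len(img))] = rng.getrandbits(8)
        burst.append(bytes(img))
    individual, joint = compare_joint_vs_individual(burst)
    print(f"individual: {individual} bytes, joint: {joint} bytes")
```

    The dictionary size matters here: DEFLATE (zlib) can only look back 32 KiB, and LZMA’s default preset (6) uses an 8 MiB dictionary, so for full-resolution raw buffers you would need a larger one (e.g. `preset=9`, which uses 64 MiB) for matches across images to be found at all.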

    I think you’re overthinking the classification task. These images are very similar, and I think comparing their color distributions would be adequate. It would of course be interesting to compare the different methods :)
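    A minimal sketch of a color-distribution comparison in pure Python, assuming each image has already been decoded into a list of (R, G, B) tuples (decoding is left out). The coarse 4×4×4 binning and histogram intersection are my choices, not something from the thread, and any cutoff for “same scene” would need tuning on real bursts.

```python
Pixel = tuple[int, int, int]


def color_histogram(pixels: list[Pixel], bins_per_channel: int = 4) -> list[float]:
    """Normalized RGB histogram with bins_per_channel**3 buckets."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin in 0..n-1.
        idx = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[idx] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]


def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """1.0 for identical color distributions, 0.0 for fully disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    red = [(250, 10, 10)] * 100
    mostly_red = [(240, 20, 5)] * 90 + [(10, 10, 250)] * 10
    blue = [(10, 10, 250)] * 100
    print(histogram_intersection(color_histogram(red), color_histogram(mostly_red)))
    print(histogram_intersection(color_histogram(red), color_histogram(blue)))
```

    Histograms ignore where colors are in the frame, so two different shots of similarly colored scenes can still score high; for bursts of the same scene that is probably acceptable.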

    • smpl@discuss.tchncs.de
      4 months ago

      Wait… this is exactly the problem a video codec solves. Scoot and give me some sample data!
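      To make the video-codec idea concrete: inter-frame coding stores one frame in full and later frames as differences from earlier ones. Real codecs (H.264, HEVC, AV1) add motion-compensated block prediction and transform coding; the Python sketch below is a deliberately simplified toy with no motion search, using LZMA to compress byte-wise residuals. Frame data and helper names are hypothetical.

```python
import lzma
import random


def make_residuals(frames: list[bytes]) -> list[bytes]:
    """Byte-wise difference (mod 256) between each frame and the previous one."""
    for prev, cur in zip(frames, frames[1:]):
        if len(prev) != len(cur):
            raise ValueError("all frames must have the same size")
    return [
        bytes((c - p) % 256 for p, c in zip(prev, cur))
        for prev, cur in zip(frames, frames[1:])
    ]


def reconstruct(first: bytes, residuals: list[bytes]) -> list[bytes]:
    """Undo make_residuals: rebuild every frame starting from the first one."""
    frames = [first]
    for res in residuals:
        frames.append(bytes((p + r) % 256 for p, r in zip(frames[-1], res)))
    return frames


def independent_size(frames: list[bytes]) -> int:
    """Total LZMA size when every frame is compressed on its own."""
    return sum(len(lzma.compress(f)) for f in frames)


def delta_size(frames: list[bytes]) -> int:
    """Total LZMA size of the first frame plus all compressed residuals."""
    if not frames:
        return 0
    return len(lzma.compress(frames[0])) + sum(
        len(lzma.compress(r)) for r in make_residuals(frames)
    )


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical burst: a random 32 KiB "frame" plus four slightly edited copies.
    base = bytearray(rng.getrandbits(8) for _ in range(32 * 1024))
    frames = [bytes(base)]
    for _ in range(4):
        for _ in range(200):
            base[rng.randrange(len(base))] = rng.getrandbits(8)
        frames.append(bytes(base))
    assert reconstruct(frames[0], make_residuals(frames)) == frames
    print("independent:", independent_size(frames), "delta:", delta_size(frames))
```

      A plain byte-wise difference only works when the shots line up pixel for pixel; if the camera moved slightly between shots, the residual stops being mostly zeros, which is the case motion compensation is designed to handle.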

      • simplymath@lemmy.world
        4 months ago

        Yeah. That’s what the video codec inside an MP4 (e.g. H.264) does, but I was just saying that first you have to figure out which images are “close enough” to encode this way.
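        One simple way to decide what is “close enough” is to walk the photos in capture order and start a new group whenever a shot is no longer similar enough to the first shot of the current group. A minimal sketch, assuming equal-size 8-bit grayscale buffers and a hypothetical 0.95 threshold on a mean-absolute-difference score (not the same metric the duplicate finders mentioned earlier use). Any similarity function, such as a color-histogram comparison, can be passed in instead.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pixel_similarity(a: bytes, b: bytes) -> float:
    """1 - mean absolute difference of two equal-size 8-bit grayscale buffers."""
    if len(a) != len(b) or not a:
        return 0.0
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - diff / (255 * len(a))


def group_bursts(
    items: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float = 0.95,  # hypothetical cutoff, tune on real photos
) -> list[list[int]]:
    """Group consecutive items whose similarity to the group's first item
    is at least threshold. Returns groups as lists of indices.

    Assumes items are in capture order, so a burst of shots of the same
    scene shows up as a contiguous run.
    """
    groups: list[list[int]] = []
    for i, item in enumerate(items):
        if groups and similarity(items[groups[-1][0]], item) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    dark = bytes([100]) * 1000
    dark_again = bytes([102]) * 1000  # nearly identical shot
    bright = bytes([200]) * 1000      # a different scene
    print(group_bursts([dark, dark_again, bright, bright], pixel_similarity))
```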