• mac@lemm.ee
    2 months ago

    Is it not relatively trivial to pre-vet content before they train on it? At least with AI-generated text it should be.

    • General_Effort@lemmy.world
      2 months ago

      It depends on what you are looking for. Identifying AI-generated data is generally hard, though it can be done in specific cases. There is no mathematical difference between the 1s and 0s that encode AI-generated data and those that encode any other data, which is why these model collapse ideas are just fantasy. There is nothing magical about any data that makes it “poisonous” to AI. The kernel of truth behind these ideas is not likely to matter in practice.

    • RvTV95XBeo@sh.itjust.works
      2 months ago

      The problem is that these AI companies currently run on a business model of not paying for information, and that generally includes not wanting to pay content curators.

      Google is probably the only one in a position to outsource the work, by making everyone solve a “does this hand look normal to you” CAPTCHA.

      They can try and train AI to detect AI, but that’s also difficult.

      • FMT99@lemmy.world
        2 months ago

        So it’s not a problem with AI. It’s just a problem for some mayfly companies that try to profit from the latest trend?

        • Honytawk@lemmy.zip
          2 months ago

          As always.

          The model isn’t dying; it’s the way these parasites want it to work that is dying.