If it were possible to run LLMs without a significant investment in GPU hardware, this problem wouldn’t be very relevant. However, the bigger FOSS LLMs require a lot of compute to run.
Is there any automated technique (scripts, lookups, etc.) that can warn a user about opsec mistakes before the content is posted online? I’m asking this specifically for textual content.
Thanks
I didn’t mention what I wanted clearly enough, so here goes:
I am mostly looking to scan my own posts/comments for stylometric statistics, but PII detection would be nice too. I’ll deal with the user agent, cookies, IP, etc. myself.
My threat model is people who might want to link my real identity to my online persona. Obviously, the government is excluded, since they can just get my IP from the Lemmy mods and find me. This is someone who is interested in my identity and will use FOSS or proprietary tools to link my identities.
Edit: it seems there are packages available for Python and R that parse text and try to infer identity from stylometric data. I’ll have to look into that, but it seems doable at a basic level.
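For anyone curious what "a basic level" could look like, here is a minimal sketch in plain Python (standard library only, not one of the packages mentioned above). It computes a few common stylometric signals for a draft and compares the function-word usage of two texts. The function names, the short function-word list, and the choice of cosine similarity are all assumptions made for illustration; real stylometry tools use far richer features.

```python
import math
import re
from collections import Counter

# Illustrative function-word list (assumption: real tools use hundreds of words).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "i", "it", "for", "but")


def stylometry_profile(text):
    """Return a few basic style statistics for a piece of text."""
    # Normalize curly apostrophes so contractions stay in one token.
    normalized = text.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    profile = {
        "avg_word_length": sum(len(w) for w in words) / total,
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "type_token_ratio": len(counts) / total,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        profile["freq_" + fw] = counts[fw] / total
    return profile


def profile_similarity(a, b):
    """Cosine similarity of the function-word frequencies of two profiles."""
    keys = [k for k in a if k.startswith("freq_")]
    dot = sum(a[k] * b[k] for k in keys)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in keys))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in keys))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

One way to use it: build a profile from text written under your real identity, build another from a draft comment, and treat a high `profile_similarity` as a hint to rephrase before posting. It is only a rough signal and says nothing about what a more capable tool would find.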
What sort of opsec mistakes do you have in mind? Something having to do with the content of the post like PII, credentials, credit card numbers, etc? Stylometry data points? Something about how they/you are posting like whether their user agent indicates they’re using an outdated browser?
Also, whose posts are you hoping to scan? Your own? Are you a Lemmy instance runner who wants to warn your users or something?
What’s your threat model? Who are you trying to guard against and what are you trying to keep them from getting from these posts?
Thank you, I should have mentioned my threat model and needs more clearly.
I am mostly looking to scan my own posts/comments for stylometric statistics, but PII detection would be nice too. I’ll deal with the user agent, cookies, IP, etc. myself.
My threat model is people who might want to link my real identity to my online persona. Obviously, the government is excluded, since they can just get my IP from the Lemmy mods and find me. This is someone who is interested in my identity and will use FOSS or proprietary tools to link my identities.
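For the PII part, a pre-post check could start as a simple pattern scan over the draft. The sketch below is an assumption-heavy starting point, not a complete detector: the regexes, the `scan_draft` name, and the decision to cover only email addresses and card-like numbers are illustrative choices. Card candidates are filtered with the Luhn checksum to cut down on false positives.

```python
import re

# Deliberately simple patterns (assumption: good enough for a first warning pass).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_draft(text):
    """Return a list of (kind, matched_text) warnings found in a draft."""
    findings = []
    for m in EMAIL_RE.finditer(text):
        findings.append(("email", m.group()))
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            findings.append(("possible card number", m.group()))
    return findings
```

Running `scan_draft` on a comment before submitting it and refusing to post when the list is non-empty would give a basic warning step. Names, addresses, and credentials would need additional patterns or a dedicated tool.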
I believe Umbrella app kinda has something you want
Do you want to scan content before it is posted, or do you want to scan existing content that you have already posted online?
You could self-host LanguageTool for its paraphrasing capability, which should reduce stylometric correlations.
Thank you, I’ll take a look! That’s a great idea!