I work for a different sort of company that hosts some publicly available user-generated content. And honestly, the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.
I can see how Reddit users might have different expectations. But I just wanted to offer a perspective. (I’m not saying it’s the right or best path.)
Can you use something like the DDoS filter to block automated AI scraping (too many requests per second)?
I’m not a tech person, so I probably don’t even know what I’m talking about.
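For the rate question: a "too many requests per second" filter is usually some form of rate limiter. Below is a minimal Python sketch of a per-client token bucket, assuming clients are identified by IP address. `RateLimiter`, `rate` and `burst` are hypothetical names for illustration, not anything Reddit or a DDoS-protection service actually exposes. It only throttles a single source, so a scraper that spreads its requests across many IPs can stay under the limit.

```python
from __future__ import annotations

import time


class RateLimiter:
    """Per-client token bucket: allows bursts of up to `burst` requests,
    refilled at `rate` requests per second."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        # client key (assumed here to be an IP address) -> (tokens left, time of last check)
        self.buckets: dict[str, tuple[float, float]] = {}

    def allow(self, client: str, now: float | None = None) -> bool:
        """Return True if this request is within the limit, False if it should be rejected."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(client, (float(self.burst), now))
        # Refill in proportion to the time elapsed, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[client] = (tokens - 1.0, now)
            return True
        self.buckets[client] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = RateLimiter(rate=1.0, burst=2)
    # Three requests in the same instant from one IP: the third is refused.
    print([limiter.allow("203.0.113.5", now=0.0) for _ in range(3)])  # [True, True, False]
```

Here each IP can burst 2 requests and then earns 1 more per second; `now` can be passed explicitly only so the behaviour is easy to check.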
We have a variety of tactics and are always adding more.
Blocking bots is hard, because with some work they can be made to look like users, down to simulating curved mouse movements from one button to the next if you are really ambitious.
So you’re saying Reddit’s activity analytics can’t necessarily tell the difference between human activity and bot activity?
So the actual number of people using Reddit vs. bots isn’t very clear. Someone should tell Reddit’s shareholders that there’s no way to tell if the advertisements are actually being viewed by people, and there’s no way to tell how much the activity reports have been inflated by bots. I bet they wouldn’t like that very much.
Always has been. Technically, the server sees no difference between what a browser does and what a bot does: downloading files and submitting requests.
I worked with a company that used product data from competitors (you can debate the morals of it, but everyone is doing it). Their crawlers were set up so that each new line of requests came from a new IP… I don’t recall the name of the service, and there weren’t that many unique IPs, but it did let their crawlers run unhindered…
They didn’t do IP banning for the same reason, but they did notice that one of their competitors did not change its IP when scraping them. If they’d had malicious intent, they could have altered the data for that IP only, e.g. raising or lowering the prices so the competitor got bad data…
I’d imagine companies like OpenAI have many times as many IPs, and they’d be able to do something similar… meaning if you try to ban IPs, you might hit real users as well… which would be unfortunate.
Let the two of them die together.
One can only hope, but that’s not likely until people learn that they can use other browsers and other search engines (I’m talking about the Google side, of course; Reddit might be affected by this in the long run).
with threads too
Blocking other search engines will hurt Reddit, all else held equal. But not by that much. Google is seriously dominant in the search engine market.
kagis
Yeah.
https://gs.statcounter.com/search-engine-market-share
According to this, Google has 91.06% of the search engine market. So for Reddit, they’re talking about cutting themselves off from a little under 9% of people searching out there. Which…I mean, it isn’t insignificant, but it isn’t likely gonna hurt them all that badly.
It’s also worth noting that the 9% they cut off was probably the group more inclined to already be using alternatives to Reddit anyways.
I would actually think that the 9% they cut off would be more likely than the 91% to be using Reddit.
You underestimate the number of average Joes who use stuff like DuckDuckGo.
Seconding this. I work in IT, and the number of tech-illiterate people using DuckDuckGo as their default search engine is astounding. It’s got to be about 10% of our users (none of whom are in tech roles).
Yeah I thought the same so it’s good to see the numbers. I don’t think people realize that to support a search engine means letting them crawl your pages which means serving all your pages to them, which costs server resources. A lot of sites get more crawler load than load from actual users viewing pages. It’s a real cost.
Still, you’d think they could manage to support DuckDuckGo at least. Or a small set of search giants to give some appearance of supporting competition.
They’re also blocking posts by users who haven’t been banned or even gotten a warning. It appears to the user as though it’s been posted, but it hasn’t.
Shadowbanning? Do you have more info on this?
I didn’t know there was a name for it. I don’t have any more info on it, but I can show examples of it happening.
They’ve done this for a long time. It’s supposedly only meant to be used on bots, but it definitely isn’t in practice.
It definitely is in practice 100%
Shadowbanning is a totally different issue that’s existed for a long time, though.
Hi, I’m new here because of the bullshit with Reddit. Greetings, fellow Lemmy people.
And me, hello!
Hi!
👋👋 :)
Uppies for all of you!
And my vuvuzela?
Welcome!
Welcome to our shithole.
Thanks. :)
Federated shithole(s)
More akin to a rabbit-hole, due to that.
But who said rabbits don’t shit in their holes? Oh! And the soil is transparent.
Welcome new lemmings!
Thanks
Welcome aboard. It’s not much, but she’s got it where it counts.
In the wubba-wubba
Thank you very much. I’m liking it.
Welcome! Genuine advice for a newcomer: look around, figure out what instances you like, and shift away from lemmy.world to an instance that requires a sign-up request and which comports with your values. There is an account migration feature to make this as easy as possible.
It’s different to what people are used to, but in my experience a huge number of the worst people migrating from reddit went straight to one of the open instances. A lot of them were banned over there for quite legitimate reasons.
They know that they can’t operate their own asshole instances for long because they’ll get defederated, and they don’t want to deal with being known to an admin who has actual principles, so open sign up is their thing, and those instances are filling up with them.
Honestly I would like to see a feature that flags if a user’s instance has open sign up.
It’s getting to the point that if someone is still on an open instance, they’re a little sus to me. It’s easier to trust people who come from instances whose policies I agree with.
I mean, I joined lemmy.world in the migration from Reddit and haven’t really seen any problems with being here. I tried joining one of the instances that needed a sign-up request when I first switched to Lemmy, but I didn’t want to have to wait to use Lemmy. Personally, I don’t even look at what instances people are from. I just treat it like Reddit; we’re all using Lemmy at the end of the day.
Well, maybe you don’t get into the kinds of discussions I do, or our values are different. Particularly when I say anything advocating for minorities, it seems to attract a slew of reactionaries who are persistent and impossible to reason with, and two of the places I’ve noticed they tend to come from are lemmy.world and sh.itjust.works, both largeish instances with open sign-up. I haven’t noticed any particularly reactionary instances apart from the tankie ones.
This is probably the reason a number of instances have defedded from .world, so you are probably getting more of those kinds of people, and less of the people who would object to that kind of hatred, whether you’ve noticed it or not.
And I’ve noticed this problem seems worse here than it was on Reddit, but I’ve realised it makes sense: the more vocal people are the ones more likely to leave or get booted from Reddit, so of course we get them here, and of course the ones with covert ideologies tend to go for open sign-up.
Personally I prefer to be on an instance that I know is roughly aligned with my values so I know I won’t have to make the case to my admins that hate is bad and should be moderated out.
Maybe what I said should’ve been more neutrally stated, but it is just my opinion.
I mean, I notice people like that, but I never really pay attention to what instance they’re from. They’re usually a minority in any post I see anyway, and they get downvoted a bunch. Most of the time I see lots of fairly progressive people, or at worst people who were supporting Biden unconditionally and calling anyone who pointed out problems with him a bot. Maybe it’s because I’m still using Lemmy mostly like I used Reddit, treating it as one unified platform instead of a bunch of smaller connected ones. But personally I’d prefer being on an instance that lets me connect to as many other instances as possible, because I don’t want some admin team telling me who I can and can’t interact with. I’d much rather pick communities I like, whatever instance they’re on, and trust those communities with the moderation. I’d rather handle blocking instances and communities myself than leave it in the hands of admins who could power trip. Again, maybe that’s just my mindset from Reddit; personally I’ve enjoyed Lemmy so far and haven’t really noticed any problems on .world.
Right well the only issue with world in that case is that other instances defed from it, so you do have admins telling you what you’re not allowed to see, but you’re unaware of it because you’re cut off from them.
Like I said, I tend to attract that sort of person just by saying things they don’t like and I’ve noticed a pattern. Some other instances have noticed that pattern too which is why they defedded.
You know people can just lie though, right? It’s not like that’s the one magical thing that would “fix Lemmy” or something lol.
People won’t usually go to that effort just to troll when there are open instances available, and anyone with closed sign up will be quicker to ban someone who turns out to have lied about the kind of person they are, rather than these giant open instances that don’t seem to give a shit.
And yes, I know it won’t ‘“fix Lemmy” or something lol’, I never said it would. I said it was a feature I would like to see.
Bro… What?!? I’ve only been here a day and I have no clue what any of that means lol
Lemmy isn’t one service like Reddit. It’s a piece of software where anybody can run their own lemmy instance. Lemmy.world is the most popular, but there are many others. And those choosing to run an instance can “federate” with other instances, which means as a user you can see posts and comments from the other instance even though you are logged into the one you have an account on.
So the commenter is recommending you look at posts or comments from users on other instances that have more stringent sign up policies, and migrate your account there. Since your account is new, you likely don’t need to spend the effort on migrating your account and instead can just set up an account on another instance/server.
But it’s also fine to stay on lemmy.world. Just be respectful, voice your opinions like you would in person with other humans, and you’ll be fine. And if you’re just here for the memes, that’s ok too! Enjoy them! And welcome to lemmy.
Hey, thanks for the detailed explanation! That certainly helps, but it’ll probably take me a while to fully get it. I signed up using Voyager and it didn’t tell me anything like that. I’m sure it’ll make more sense as I get used to it. So can I not see all posts from other instances?
It’s a bit like email, if that helps you understand it. If you use Gmail but a friend uses Yahoo, you can still both email each other.
You can see posts from every instance that your instance has federated with. For example, I, on toast.ooo, can see posts on lemmy.world, lemmy.dbzer0.com, and sh.itjust.works because my instance federates with them.
You can’t see posts from instances that your instance has defederated with, though, nor can you see posts from instances that have defederated with yours. Think of it like cutting one of those thick undersea cables that connect the internet across continents.
There’s a lot to consider when picking an instance; lemmy.world is a good default, which is probably why Voyager directed you to it, but don’t be afraid to switch to another instance if you think it’ll serve you better!
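To make the visibility rule concrete, here is a toy model in Python. It assumes each instance simply keeps a blocklist of instances it has defederated from, and a post is visible unless either side has blocked the other. The instance names are placeholders, `can_see` is a hypothetical helper, and real Lemmy federation has more moving parts than this (it ignores, for example, how posts actually get fetched).

```python
from __future__ import annotations


def can_see(home: str, origin: str, blocklists: dict[str, set[str]]) -> bool:
    """Return True if a user on `home` can see posts that live on `origin`.

    Toy model: visibility fails if either instance has defederated from the other.
    """
    if home == origin:
        return True
    home_blocks = blocklists.get(home, set())
    origin_blocks = blocklists.get(origin, set())
    return origin not in home_blocks and home not in origin_blocks


# Placeholder instances: a.example blocks a spam instance, c.example blocks a.example.
blocklists = {"a.example": {"spam.example"}, "c.example": {"a.example"}}
print(can_see("a.example", "b.example", blocklists))  # True
print(can_see("a.example", "c.example", blocklists))  # False: c.example defederated from a.example
```

The check is deliberately symmetric, matching the point that you lose an instance whether your instance cut the cable or theirs did.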
Thanks! You all have been very helpful in my understanding of the Lemmy world!
Yeah, there are many instances, and many that have purposefully been defederated by lemmy.world. Often for good reason (CSAM, an abundance of spam accounts, violent or hateful rhetoric, etc). But generally lemmy.world and its federated instances are pretty great.
Ok, sounds like I’ll just stick with .world for a while until I get my “sea legs”
As an option, I’d recommend lemmy.today
It’s an instance that stays away from all the drama and just federates with everyone unless this becomes a moderation problem.
Wanna look at general stuff on lemmy.world? Sure! Check out tankies on lemmy.ml or lemmygrad or hexbear? Alright, who are we to stop you? Want porn? lemmynsfw has you covered! And literally anything else on the Lemmyverse is open to you.
If you ever decide you want to branch out, you can try the instance / community browser at https://lemmyverse.net to check out what’s out there.
I will say that although it technically doesn’t matter which instance you sign up with, sometimes the descriptions aren’t very descriptive at all. Definitely browse an instance to get a feel for the overall vibe before you sign up.
You can check to see if an instance has been defederated from / by other instances, by entering said instance address at https://defed.xyz/
I think this guy is going to end up on dbzer0 once he gets his sea legs. Of note, the piracy communities over there are some of the few things you can’t access from .world.
Ignore them. Enjoy yourself. If you’re interested in moving to a different instance later once you learn more about what that means then go for it. There are tools to help you and there’s “no karma” so there’s no reason to not. But there’s no rush to do so.
The others gave you a decent rundown.
I’m certainly not meaning to imply that you did anything wrong signing up with .world; it’s just something to be aware of. This is actually the first time I’ve made this suggestion, and I honestly don’t know how most people feel about it, so maybe it was a bit much to dump on a newcomer. If so, I apologise.
One thing I forgot about was that being on .world means you do miss out on a lot of piracy related stuff if you’re into that.
Also though, you can read about a given instance and its policies and values when you visit it. That often says a lot about the kinds of people you’ll meet there.
Don’t worry about it. It’s like Linux enthusiasts talking about distros.
If you get to the point where it matters to you, you’ll look into it then. I’ve been here for more than a year and still haven’t bothered to hop servers.
Thanks for the info. I’ll stay here for a while and see how everything goes.
I don’t mind assholes as I think that’s just a part of freedom of speech. And I’d rather not get too much moderated content as I think it creates too much of a filter bubble.
I got tired of the censorship and blatant disrespect for the end user. Also justiceserved and the constant spam messages from the mods there; I’ve never been a member of that community and I just wanted them to stop harassing me. They called me a Nazi and some other stuff for participating in the Mandela Effect subreddit. Lots of quacks there, but really, a Nazi?
Edit: I mean, it’s deeper than that, but they were very hateful, and Reddit muted me for 3 days over… nothing? They even actively sought out my username on social media and attacked me there through private messages and fake accounts, and when I brought this to Reddit’s attention, they muted me.
Honestly? I’d be happy to not see their trash in any search engine I use.
https://addons.mozilla.org/en-US/firefox/addon/g-search-filter/
Install this and exclude it from all search results.
This one works better: https://addons.mozilla.org/en-US/firefox/addon/hohser/ - more supported sites, and it doesn’t break as often.
Thanks! Will give it a try.
Why not change your search engine and set up a SearX instance? You can find all instances here: https://searx.space. For example, I have set it up like this: https://search.inetol.net/search?q=%s&category_general=1&language=en&time_range=&safesearch=0&theme=simple, and it works wonders. Results are still mostly from Google, or you can configure it to be whatever you want.
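The `%s` in that URL is a placeholder: when you add it as a custom search engine, the browser swaps in your URL-encoded query. A minimal Python sketch of that substitution, reusing the instance URL above; `search_url` is a hypothetical helper, and `quote_plus` is just one reasonable encoding (spaces become `+`), not necessarily exactly what every browser does.

```python
from urllib.parse import quote_plus

# The custom search URL from the comment above; "%s" marks where the query goes.
TEMPLATE = (
    "https://search.inetol.net/search?q=%s&category_general=1&language=en"
    "&time_range=&safesearch=0&theme=simple"
)


def search_url(query: str, template: str = TEMPLATE) -> str:
    """Fill the %s placeholder with the URL-encoded query (spaces become '+')."""
    return template.replace("%s", quote_plus(query))


if __name__ == "__main__":
    print(search_url("lemmy instances"))
```

Encoding matters for queries containing characters like `+` or `&`, which would otherwise be read as part of the URL rather than the search terms.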
Mainly because it’s easier to set up a browser extension. Does SearXNG let you hide sites and rank sites higher in the results?
Also you’d really want to use SearXNG… The original SearX is dead.
Those are SearXNG instances; I think they’re aware of that. You can set it up to exclude specific sites or search only specific sites, and you can also have multiple profiles, which is useful when searching for different things. It also combines multiple web search engines… I haven’t dived into the configuration that much, because what matters most to me is that it doesn’t show ads. I assume it’s easier to install the extension, although you don’t set this up every day.
Am gonna exclude Reddit.
How many times is this going to be posted? I’ve seen this several times now over the past few days.
Sorry, I haven’t seen it. If it’s been posted here before, send me the link to the previous post, and I’ll take this one down. Even better, you can report the post, and the mods will investigate it.
Thank you!
Since you asked, here are the other four times it was posted.
- https://lemmy.world/post/17906460
- https://lemmy.world/post/17913261
- https://lemmy.world/post/17930528
- https://lemmy.world/post/17949956
There was a fifth one, but that one has since been removed.
Thanks, this looks like different reporting on the same story. That happens with major news, but I can understand why it may seem like excess if it’s not a story you’re interested in.
Sure, some of those links are different. But you have to admit, even if you are interested in this story, 5 times is a bit excessive.
This is just going to cause indexers to ignore robots.txt.
“We always obey the robots.txt”
- A bunch of corporations that have no accountability and plenty of incentive to just ignore it and have all been caught training AI on off-limits data.
Rate limiting could “fix” that unfortunately.
They’re likely blocking user agents too, which I think also isn’t legally enforceable (as in, DuckDuckGo could just identify itself as “Google” unless told otherwise).
LinkedIn tried blocking scraping that way, but as long as the scraping isn’t burdensome it’s basically legal, though you can still be bound by the TOS and face civil claims.
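To show why robots.txt and user-agent blocking are more of a polite request than a lock, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt below is made up for illustration (it is not Reddit's actual file) and admits only Googlebot. The answer depends entirely on the user-agent string the crawler *claims*, and nothing forces a crawler to check the file at all.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt (not any real site's file): Googlebot allowed, everyone else asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/r/somesubreddit/"
# can_fetch only reports what the file asks for, keyed on a self-reported user agent;
# it verifies nothing about who is actually making the request.
allowed = {agent: parser.can_fetch(agent, url) for agent in ("Googlebot", "DuckDuckBot", "Bingbot")}
print(allowed)  # {'Googlebot': True, 'DuckDuckBot': False, 'Bingbot': False}
```

A crawler that sends "Googlebot" as its user agent gets the same "allowed" answer as Google does, which is exactly the loophole described above.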
https://natlawreview.com/article/hiq-and-linkedin-reach-proposed-settlement-landmark-scraping-case
Ok so they are earning on our data
You just described every company
Google is just enshittifying even harder. Reddit results in Google searches are often old and anemic these days.
I used to want Reddit threads to show up in search results. Now I avoid them because they are so often a waste of time. More reason to use Duck Duck Go.
I saw Reddit results in a search last night using DDG. It just said something like “It’s here on Reddit, but we’re not allowed to show you.” I wasn’t planning on using Reddit (never again), but that just irritated me.
I wish we had a government that functioned. This shit is 100% antitrust. How is this shit allowed to fly?
Antitrust would be the opposite.
Is there a downside? I’m confused.
“Would you like to expand your search to include human-created content? Upgrade to Google Advanced* to unlock the power of the human web!”
I’m excited for this to start triggering anti-trust legislation
It obviously should, but it won’t, because the US is a capitalist dictatorship masquerading as a democracy. The oligarchy own the government, and the regulators.
But other search engines like Bing are also American capitalist corporations and they don’t want this I’m sure.
letthemfight.gif
“sorry bro, I can’t search that website—it’s not covered by my subscription package”
Google already signaled they want to charge for their trash AI search.
Bing it is, then. I hate Microsoft with the intensity of a thousand suns, but Bing is now my jam as long as this lasts.
I’ve started a Kagi subscription for my new search engine. Basically $6 USD per month but because it’s a user-pay model they have a really good privacy policy and don’t sell/analyze your data.
It’s currently better than Google (which I still use for searching reviews in Maps).
Thanks, I couldn’t remember the name of this.
This website shows the SearXNG public instances. It is updated every 24 hours, except the response times which are updated every 3 hours. It requires Javascript until the issue #9 is fixed.
Try duckduckgo
Bing by any other name is still Bing.
Edit: Awww, some people either don’t know or don’t like that Bing is what DuckDuckGo is. https://www.tomshardware.com/software/search-engines/microsoft-suffering-from-outage-bing-copilot-and-duckduckgo-inaccessible-for-several-hours
At best this is as intelligent as saying Google Maps is YouTube by another name because they’re both on Google servers. Even that would be smarter to say actually, because Google Maps and YouTube are owned by the same company.
When Bing goes down, so does DuckDuckGo, but somehow your apples-to-oranges comparison seems apt to you.
They share hosting servers, that doesn’t make them the same service. When the power goes out do you think you and your neighbors live in the same house?
Just keep sucking down the hype. They don’t share the same hosting for the frontend, but they both use the same backend. The backend is, of course, owned by Microsoft. DuckDuckGo uses Bing’s backend, and somehow you have convinced yourself, against all evidence to the contrary, that it isn’t Bing with a different wrapper.
When you can’t play Stardew Valley (because Steam is down), you also can’t play Elden Ring. They must use the same backend, and Elden Ring is just Stardew Valley by another name.
You’re going to need a better source than “they go down at the same time”.
DuckDuckGo also uses Bing under the hood.
Yes, duckduckgo uses other search engines to provide its results. Your point?
I don’t care where duckduckgo gets the links from, I care how relevant the top links are and that they aren’t being crowded out by ads.
No need to be defensive. DDG uses Bing, which means it is part of the big five under the hood. That will always have certain ramifications in the long run.
I also use it, but I am looking for decentralised alternatives in the meantime, not because DDG is bad but because sooner or later it will get worse.
Also, why are you so aggressive anyway? It’s super weird and doesn’t fit Lemmy.
I’m seldom on reddit after the exodus, but when I am, I noscript the duck out of it.
You quack.
Actually, he doesn’t, since he’s removing the duck (and shipping it off to DuckDuckGo for reuse, no doubt).
It’s still possible to search with “site:reddit.com …”
Has it been implemented yet, or are they only blocking non-flagged searches? That seems odd.
You shouldn’t be getting any new results if you do that, older posts will/may remain indexed.
Aha. I was wondering about that possibility.