IMO, another good reason to not use Google!
Technology
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below; to ask if your bot can be added, please contact us.
- Check for duplicates before posting; duplicates may be removed.
Approved Bots
https://addons.mozilla.org/en-US/firefox/addon/g-search-filter/
Install this and exclude Reddit from all search results.
This one works better: https://addons.mozilla.org/en-US/firefox/addon/hohser/ - more supported sites, and it doesn't break as often.
Why not change your search engine and set up a SearX instance? You can find all instances here: https://searx.space. For example, I have set it up like this: https://search.inetol.net/search?q=%s&category_general=1&language=en&time_range=&safesearch=0&theme=simple, and it works wonders. Results are still mostly from Google, or you can configure it to be whatever you want.
Couldn't a search engine just aggregate the results from Google, filter out the Reddit responses, and then add those results to its own organic results?
Google is just enshittifying even harder. Reddit results in Google searches are often old and anemic these days.
I used to want Reddit threads to show up in search results. Now I avoid them because they are so often a waste of time. More reason to use Duck Duck Go.
I saw Reddit results in a search last night using DDG. It just said something like "It's here on Reddit, but we're not allowed to show you." I wasn't planning on using Reddit (never again), but that just irritated me.
Reddit responded: "Only Google pays us." The content is not yours. You built this off a naive user base that just wanted to share, and now these fuckers are treating it as their entitlement. As an early Reddit user - fuck that place, I'm still angry.
Legally speaking, the content is theirs.
No, I don't think so. Just because you put a clause in ToS doesn't make it legally binding and most precedent is in favor of the original copyright owner.
I'd love to see the precedent, if you don't mind.
Nonsense.
If someone posts a copyright violation on YouTube, YouTube is protected under the safe harbor provisions of the DMCA (in the US). YouTube just points a finger at the user and says "it's their fault," because the user owns (or claims to own) the content. YouTube is just hosting it.
I don't know of any reason to think it's not the same for written works. The user posts them, Reddit hosts them, the user still owns them. Like YouTube, the user grants the host a broad license for that content, so that they can technically copy and transmit it. But ultimately the user owns it. I assume that by the time Reddit made the AI deal, they had put in wording covering "selling a copy of the data" to get what they wanted in the ToS.
Now, determining whether the ToS holds up in court is of course trickier. And did they even make us click our permission away again after they added it, or did it just change something we had already clicked? I don't recall.
Usually any hosting platform has some kind of wording to the tune of "you give us permanent and unrestricted right to use your content however we want". Copyright is still yours, but you can't use it against the platform. Applies to social networks, YouTube, Flickr, anything I can think of.
Someone should fight it in court - it's not Reddit's content. It belongs to the people, not Steve fuck face.
OK, so they are earning off our data.
You just described every company
Honestly? I'd be happy to not see their trash in any search engine I use.
I work for a different sort of company that hosts some publicly available user generated content. And honestly the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.
I can see how reddit users might have different expectations. But I just wanted to offer a perspective. (I'm not saying it's the right or best path.)
Could you use something like a DDoS filter to prevent automated AI scraping (too many requests per second)?
I'm not a tech person, so I probably don't even know what I'm talking about.
I worked with a company that used product data from competitors (you can debate the morals of it, but everyone does it). Their crawlers were set up so that each new batch of requests came from a new IP. I don't recall the name of the service, and it wasn't that many unique IPs, but it did allow their crawlers to live unhindered.
They didn't do IP banning for the same reason, but they did notice that one of their competitors did not rotate their IP when scraping them. With malicious intent, they could have altered the data for that IP only, e.g. increasing or decreasing the prices so the competitor ended up with bad data.
I'd imagine companies like OpenAI have many times that number of IPs and could do something similar, meaning if you try to ban IPs, you might hit real users as well, which would be unfortunate.
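The "too many requests per second" idea discussed above is typically a per-IP rate limit. Here's a minimal sketch of one (a token bucket keyed by client IP); all the names and numbers are hypothetical, and real deployments do this at the proxy/CDN layer rather than in application code. It also illustrates why IP rotation defeats this approach: a fresh IP gets a fresh bucket.

```python
import time
from collections import defaultdict


class TokenBucket:
    """One client's budget: `capacity` burst tokens, refilled at `rate`/sec."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIPLimiter:
    """Hypothetical per-IP limiter: each new IP lazily gets its own bucket."""

    def __init__(self, rate: float = 5.0, capacity: int = 10):
        self.buckets = defaultdict(lambda: TokenBucket(rate, capacity))

    def allow(self, ip: str) -> bool:
        return self.buckets[ip].allow()


limiter = PerIPLimiter(rate=5.0, capacity=10)

# A scraper hammering from a single IP exhausts its burst quickly...
results = [limiter.allow("203.0.113.7") for _ in range(15)]

# ...but rotating to a new IP yields a brand-new bucket, which is exactly
# why IP-based limits alone don't stop crawlers that rotate addresses.
rotated = limiter.allow("198.51.100.9")
```

Blocking by request *fingerprint* (user agent, TLS signature, behavioral patterns) is the usual next step once rotation makes pure IP limits ineffective, though as the comment below notes, platforms tend to layer many tactics.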
We have a variety of tactics and are always adding more.
Oh well. Time to post more questions on lemmy
Hot take here.
I do believe in free information.
Instead of investing money in stopping crawlers, why not make the data they are trying to crawl available to everyone for free, so we can have a better world all together?