this post was submitted on 31 Jul 2024

Technology


Reddit says Microsoft’s Bing, Anthropic, and Perplexity have scraped its data without permission. “It has been a real pain in the ass to block these companies.”

(page 4) 42 comments
[–] BurnSquirrel@lemmy.world 0 points 3 months ago (2 children)

This must be how they plan to raise revenue, since people don't buy enough gold and badges to support the site

[–] tempest@lemmy.ca 0 points 3 months ago (1 children)

That isn't how it works for publicly traded companies. There is no such thing as enough, only more.

[–] MeaanBeaan@lemmy.world 0 points 3 months ago (1 children)

Plus they don't even need to be profitable. They just have to convince some rich losers that they might be profitable at some undetermined time in the future.

[–] Ragnarok314159@sopuli.xyz 0 points 3 months ago

A lot of times it’s more like the old Taxi medallions. You paid 400k for the medallion, operated it for a few years, and then sold it to the next guy for the going rate which is likely around 400k.

[–] Serinus@lemmy.world 0 points 3 months ago

Well, they did. I guess the definition of "support" has changed. Probably to include more yachts.

[–] umbraroze@lemmy.world 0 points 3 months ago (5 children)

Ok, now I'm miffed that Google caved to Reddit's demands and paid up.

Because this set a dangerous precedent.

Earlier, Google got a lot of demands from various publications to pay up for indexing the publicly available news sites. And they always responded with "Ok, guess you leave us no other choice than just exclude you from indexing altogether." Let the site simmer for a while until they went "oh shit, not being indexed by major search engines sucks. we didn't really mean it please come back"

It's especially jarring because Reddit doesn't even produce their own news content anyway. That search engine money isn't going to the content creators. News sites at least could say they need to pay for their content to be written by their employees.

[–] Ragnarok314159@sopuli.xyz 0 points 3 months ago

I am guessing Google paid for access to their internal archives on posts and comments. Will give them a unique dataset for all the stuff that was deleted during the many exodus runs over the years.

[–] kameecoding@lemmy.world 0 points 3 months ago

I don't see why you should be miffed at all. Google can bully publications by unindexing them, and it will work. Reddit, according to https://www.semrush.com/blog/most-visited-websites/, is the third most visited website after Google and YouTube, so they have a bit more power. Lots of people google with "site:reddit.com" because it still has some useful content, and I am going to go out on a limb and say that US visitors are the most important for selling ads for Google.

Microsoft will have to make its own value calculation about whether it's worth it, and they will likely pay up, although more and more of Reddit is just bots posting stupid shit.

[–] gh0stcassette@lemmy.blahaj.zone 0 points 3 months ago

Other search engines should continue indexing Reddit and take them to court if they issue a cease and desist imo.

[–] SirEDCaLot@lemmy.today 0 points 3 months ago

At this point I think Google needs Reddit more than Reddit needs Google. Google search kind of sucks these days. How often do you add site:reddit.com to the end of the query to get any sort of useful result for a specific question? For me it's pretty often. If Reddit cuts off Google, that goes away and Google search suffers significantly. And that might mean the one thing Google cannot abide: a situation where people in large numbers start actively seeking out other search engines.

Don't get me wrong, they're both being super shitty.
Google needs to quit obsessing over AI and a million different cloud products and fix the one product that people actually care about. Reddit needs to stop acting like they own everybody.

[–] AngryCommieKender@lemmy.world 0 points 3 months ago

Steve "Spaz" Huffman has been trying to milk money out of the site that Alexis Ohanian, Aaron Swartz, and pigboy Steven Spaz kinda created collaborating with each other. Aaron was shoved out first by The Spaz, though one could claim rightfully so in that case since Aaron was basically done with the site, and had moved on to his next project, essentially leaving Alexis and Spaz in the lurch as neither of them understood the code that Aaron had written to make the site functional.

In many ways, the users made this possible. Most of us aren't users in this case. The users that make up the vast majority of the population don't give one thought to their own personal privacy, after all they have "nothing to hide," not knowing that they really need to hide almost all of their data.

If the users were to be educated about how much money the various companies like Reddit, Facebook, Microsoft, Apple, and almost every single other "disruptive tech company" have stolen from them, the socialist revolution would have started in the 1980s.

[–] Gestrid@lemmy.ca 0 points 3 months ago (1 children)

This is like asking a website to respect robots.txt.

[–] Darkassassin07@lemmy.ca 0 points 3 months ago

https://reddit.com/robots.txt

They literally are, and attempting to block those that don't.
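
For anyone unsure what "respecting robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample rules, the bot names (`ExampleSearchBot`, `OtherBot`), and the `is_allowed` helper are made up for illustration. This is not Reddit's actual file, and nothing here forces a crawler to comply; the check is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT Reddit's actual file.
# It allows one named crawler and disallows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/technology/"
    for bot in ("ExampleSearchBot", "OtherBot"):
        print(bot, "allowed:", is_allowed(SAMPLE_ROBOTS_TXT, bot, page))
```

A well-behaved crawler would run a check like this before every fetch. A scraper that ignores the file never runs it at all, which is why sites end up blocking by other means.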

[–] Freefall@lemmy.world 0 points 3 months ago

I don't think the content on Reddit is theirs to sell... unless redditors are getting a cut. That site is a dumpster and needs to die already.

[–] Linkerbaan@lemmy.world 0 points 3 months ago

Does LLMmy have a robots.txt against scrapers?
