this post was submitted on 16 Jun 2023
Technology
Ultimately this is a problem that's never going away until we replace URLs. HTTP's approach of locating documents by URL, i.e. server/path, is fundamentally brittle. It doesn't matter how careful you are or how much best practice you follow, that URL will be dead in a few years. DNS makes the problem worse: domains cost money and expire, and the URLs built on them expire with them.
There are approaches like IPFS, which uses content-based addressing (i.e. fancy file hashes), but that's not enough either, as it provides no good way to update a resource.
The best™ solution would be some kind of global blockchain-like thing that keeps a record of what people publish, giving each document a unique id, a hash, and some way to update that resource non-destructively (i.e. the version history is preserved). Hosting itself would still need to be done by other parties, but a global log file that lists all the stuff humans have published would make it much easier and more reliable to mirror it.
The end result should be "Internet as globally distributed immutable data structure".
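To make the idea above concrete, here's a minimal sketch of what one entry in such a publish log could look like. This is purely hypothetical; the field names and structure are my own assumptions, not any existing protocol. The key points are that content is named by its hash and that a new version links back to the old one instead of overwriting it:

```python
import hashlib
from typing import Optional

def content_address(data: bytes) -> str:
    """Content address = hash of the bytes themselves,
    independent of where they happen to be hosted."""
    return hashlib.sha256(data).hexdigest()

def log_entry(data: bytes, prev: Optional[str]) -> dict:
    """One hypothetical append-only log entry: new versions point
    back at the previous one, so history is preserved."""
    return {
        "content": content_address(data),
        "prev": prev,  # None for the first version
    }

v1 = log_entry(b"hello world", None)
v2 = log_entry(b"hello world, revised", v1["content"])
```

Anyone can later walk the `prev` chain to recover the full version history, and anyone holding the bytes can prove they match an entry by re-hashing them.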
Bit frustrating that this whole problem isn't getting the attention it deserves.
No offense, but that solution sounds like a pipe dream that wouldn't work on a technical level. So you want to keep not just the item someone published, but previous versions of it, have mirrors of it, and tie it all up in some sort of blockchain thing. That sounds insanely more resource-heavy than just hosting the document itself on one instance somewhere. It would be much more reliable, sure, but currently even companies like Reddit struggle with all of the traffic, as do smaller open source projects like Lemmy instances or kbin, and your solution is to increase the amount of data?
It really isn't. Most content out there is already immutable: you don't see people uploading the same YouTube video five times with minor changes, or editing their images after upload. Most services don't even allow that; at best you can delete and upload a new video.
Furthermore, the blockchain would only contain metadata, not the actual data, so it would be thousands of times smaller, and correspondingly easier to store, than the data itself.
Mirroring that content is a completely separate and optional part of the problem. The important part is having content named in such a way that I can go to a mirror, ask "do you have XYZ?", and get an answer I can trust. With URLs that's impossible, as a server can return different content whenever it wants.
Also, this isn't exactly a new idea; it's how most software development already works these days. A Git repository stores a copy of every little change, and every clone retrieves that complete history. What's missing is some infrastructure on top that links all the different repositories together into one namespace (GitHub kind of does that internally, but that's of no help for repositories hosted elsewhere).
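Git's content addressing is simple enough to reproduce in a few lines: a blob's object id is the SHA-1 of a small header followed by the file's content, so the same bytes get the same id no matter which repository or server they live in:

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Compute the same object id that `git hash-object` gives
    this content: SHA-1 over a "blob <size>\\0" header + the bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The empty blob has a well-known id in every Git repository ever made:
empty_id = git_blob_id(b"")  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

That's why two unrelated Git hosts can deduplicate and verify the same file: the name is derived from the content, not from the server holding it.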
Ok, so what if this blockchain has a metadata link to a video, which is hosted somewhere, and I remove that video from that host? How is that different from just a URL pointing to that video, if the blockchain only holds metadata?
I don't understand what you are solving.
The issue is that URLs don't point to videos, they point to servers. What a server returns in response to a URL is arbitrary: it might be a video today, a different video tomorrow, or a completely different website altogether once the domain has switched owners. Almost all URLs break over the course of a couple of years.
By using content addressing (i.e. a Merkle tree, SHA-256, etc.) you can link to the video itself. It doesn't matter if the server changes owners; your link will still point to that exact video. This doesn't automatically let you download the video, of course, since the original server is still gone, but it lets you ask others if they have a copy, and it lets you verify that what they returned is the exact video you were looking for.
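The verification step is the whole trick, and it's tiny. A sketch (SHA-256 here as one possible choice of hash; the byte strings are stand-ins for real video data):

```python
import hashlib

def verify_copy(expected_address: str, data: bytes) -> bool:
    """A mirror can return anything it likes; re-hashing the bytes
    proves whether they are the exact content the link named."""
    return hashlib.sha256(data).hexdigest() == expected_address

video = b"...pretend these are video bytes..."
addr = hashlib.sha256(video).hexdigest()  # the content address in the link

ok = verify_copy(addr, video)        # a genuine copy checks out
bad = verify_copy(addr, b"tampered") # anything else is rejected
```

With a URL you have to trust whoever controls the domain today; with a content address, any stranger's copy is as trustworthy as the original, because you can check it yourself.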
The blockchain, or DHT, or whatever it might be in the end, would be used to organize the content addresses and let you ask others for that video automatically, or let them discover that new videos have been published. It would also provide some censorship resistance/transparency: at the moment deleted content often just silently disappears, without any hint that it ever existed, whereas a blockchain would keep a record of what was there and why it was deleted.
For a realistic example, see this thread: it's available at https://beehaw.org/post/575371, but also at https://feddit.de/post/854874. The Fediverse does the mirroring just fine, but the URLs give no indication that it's the same post. If Beehaw goes down tomorrow, how are you going to find post 575371? That's the kind of problem you wouldn't have with content addressing or other globally unique ids.