IPFS: The Good, the Bad, and the Ugly

IPFS is one of those cool things I really wanted to like. A peer-to-peer object store with everything identified by its hash. IPFS nodes will pin content, and your local IPFS daemon will let you fetch said content and verify the hashes are good and the data hasn't been tampered with. Kind of like BitTorrent, but everything is addressed by hash.
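
For the unfamiliar, the basic flow from the CLI looks roughly like this (the hash here is a made-up placeholder):

    # Add a file locally; IPFS prints the content hash it is now addressed by.
    $ ipfs add kitten.jpg
    added QmExampleHashOfKitten kitten.jpg

    # Anyone can fetch and verify it by that hash...
    $ ipfs cat QmExampleHashOfKitten > kitten.jpg

    # ...and pin it so their node keeps a copy around.
    $ ipfs pin add QmExampleHashOfKitten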

I decided to roll out IPFS for hosting content on this blog. I wrote Salt states to pin the content I wanted. Bigger stuff I host on a couple of large servers; the smaller stuff is already pinned. I then have nginx proxying requests to the IPFS daemon, but only allowing the hashes I choose.
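
The nginx side is just a whitelist in front of the daemon's local gateway, something along these lines (the hashes are placeholders and 8080 is the default gateway port, so adjust to taste):

    # Only proxy the specific hashes I have pinned; everything else 404s.
    location ~ "^/ipfs/(QmHashOne|QmHashTwo)$" {
        proxy_pass http://127.0.0.1:8080;
    }

    location /ipfs/ {
        return 404;
    }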

And it worked; here's an example.

One of the cool things is that if you have the IPFS browser extension, an IPFS daemon will send a header saying the object is IPFS (though the multihash at the end of the URL is a likely sign), and the extension will instead fetch that object over your local IPFS daemon, making sure it exactly matches the hash. And because of how the addressing works, you can add any object locally, and if it's the same object, the multihash will be the same. So two different universes with the exact same cat picture will get the same hash. This makes having a single source of truth and merging converged datasets a lot easier, since there's nothing to debate.
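
For example, adding the same bytes on two completely unrelated machines produces the same hash (a sketch; the hash is a placeholder):

    # On my laptop:
    $ echo "same cat picture" > cat.txt
    $ ipfs add -q cat.txt
    QmSamePlaceholderHash

    # On a server that has never talked to my laptop:
    $ echo "same cat picture" > cat.txt
    $ ipfs add -q cat.txt
    QmSamePlaceholderHash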

Now, not everything is so great.

My servers hosting this blog replace themselves every week. Each one just launches another SporeStack server and Salts it with salt-ssh. While my objects were already pinned on a steady server, there was no hope of the new servers fetching the objects with an ipfs pin. Every once in a while it would work, but rarely. It seems that IPFS just is not good at sharing which nodes have which hashes and knowing where to connect. It was very unreliable through a fair bit of testing. Most of the time, I could ipfs add on my laptop and not ipfs pin anywhere else. Or if it did work, it took minutes for small files. This was my experience with un-NATed IPv4/IPv6 clearnet servers and with my Torified servers.

So the next step was connecting to my own nodes for the hashes (which, frankly, almost defeats the purpose). The cool thing is that a node is identified by a hash, so as long as you had a good host-identifying hash to begin with, the connection shouldn't be able to be man-in-the-middled. And even if it were, the content would still have to match the hash anyway.
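
Connecting to one of my own nodes looks something like this (the peer ID and address are placeholders, and the exact multiaddr format varies a bit by version):

    # On the static node, find its peer ID and listen addresses:
    $ ipfs id

    # On the new server, connect directly to it:
    $ ipfs swarm connect /ip4/203.0.113.10/tcp/4001/ipfs/QmStaticNodePeerID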

I wrote Salt states to connect to my nodes. This worked, kind of. They would get disconnected all the time. It'd work and fetch for a while, but then if I came back to it, even minutes later, it wouldn't be connected to that node anymore (and I don't suspect network issues). I thought maybe those nodes were considered unimportant, so I switched to adding the nodes to the bootstrap list. That was even worse. It might work as soon as the daemon first came up, but not later. And if I tried to add another pin, it would effectively fail most of the time. So my final solution was to both add the nodes to the bootstrap and connect to them. And keep in mind, each time Salt runs it will check whether the node is connected and connect if not. I don't know why these connections are so short-lived, but they are.
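
What those Salt states boil down to on each run is roughly this (again, the peer ID and address are placeholders):

    PEER=QmStaticNodePeerID
    ADDR=/ip4/203.0.113.10/tcp/4001/ipfs/$PEER

    # Make sure the node is in the bootstrap list.
    ipfs bootstrap add "$ADDR"

    # Reconnect only if we're not already connected to it.
    ipfs swarm peers | grep -q "$PEER" || ipfs swarm connect "$ADDR"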

I finally got it working somewhat reliably. The daemon is fairly heavy. IPFS recommends 2GiB of memory at minimum, and that's pushing it, but it usually works. I had to bump my 1GiB servers up to 2GiB, even with a minimal server profile for IPFS. My main static nodes are not on the minimal profile.
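
go-ipfs ships a few built-in config profiles that trim things down; applying one looks something like this (I'm assuming the lowpower profile here, and availability depends on your version):

    # Apply at init time...
    $ ipfs init --profile lowpower

    # ...or to an existing repo (then restart the daemon).
    $ ipfs config profile apply lowpower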

All in all, IPFS is very cool technology but does not seem ready. And that's not even talking about IPNS, which I played with (non-DNS style) and had absolutely no luck with. I like that it's written in Go, and I put in a bug report and they stepped me through fixing it, so there are good people behind it. It's just, unfortunately, a bit overhyped. To add insult to injury, for the past couple of weeks IPFS's binary distribution has been abysmally slow. They dogfood their own IPFS gateway to serve you the IPFS archives, which I use for bootstrapping my servers. I'm not sure what's going on there, but it's no longer reliable for fetching and I had to host the archive myself. I would see 20+ minutes spent downloading part of the archive, and then it would go no further, the IPFS gateway not sending any more data.

I've finally replaced my IPFS usage with Decensor and am actually hosting the IPFS tarball with Decensor. For Decensor, I just rsync up what I want to my two "big" static servers, and new servers will randomly try one of them to rsync down from. It's not "ideal". It'd be cool if they could rsync/torrent from each other. But it's fast and extremely easy, and with Decensor I'm tracking the hashes just like IPFS would, only faster, more simply, and with no duplication of the same file (although I could have duplication between similar files). The files are also effectively organized and tagged for me, which would have had to be another abstraction layer on top of IPFS.
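
The bootstrap side of that is nothing fancy; new servers do roughly this (hostnames and paths are placeholders):

    # Pick one of the two "big" static servers at random...
    SOURCE=$(shuf -n 1 -e static1.example.com static2.example.com)

    # ...and pull the tracked files down from it.
    rsync -av "$SOURCE:/srv/decensor/" /srv/decensor/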

IPFS does have bug reports filed for a lot of this, but given the age of the project I'm not terribly impressed. If I could count on stable connections between nodes it would work a lot better, and I might even sever it from the rest of the IPFS network to hopefully bring down the load.

Keep in mind, with all of my comments, that I have not dove deeply into IPFS internals. I don't know exactly how it works. I'm not the most apt person to be writing a review of IPFS. But I do have a decent idea of what it's like to use in production, within my (unusual) environment.

I have stopped putting new things on IPFS and may either make an IPFS static file wrapper of sorts (hosting files locally under /ipfs/), migrate everything to Decensor, or both. We will see.
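
The "static file wrapper" idea is just materializing the pinned objects to disk so nginx can serve them without a daemon at all; something along these lines (the hash and paths are placeholders):

    # Dump a pinned object into the web root under its hash...
    ipfs get QmExampleHashOfKitten -o /var/www/ipfs/QmExampleHashOfKitten

    # ...then nginx serves /ipfs/ as plain static files from /var/www/ipfs/.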