Go Beyond

Only read if you don't mind being offended.


Self Hosting

This continues some of my thoughts in Online vs Offline Software.

When you live in an area where power outages are commonplace, you learn how to function without power. The resources that seem to never go down will, one day, fail you. And that day may be catastrophic, because people build up more and more trust in a giant, monolithic thing. Finally, it breaks, and everything else breaks with it.

In today's Cloud Computing model (let everyone else do the work for you), it's entirely possible to end up with chicken-and-egg scenarios in the high-level architecture, where a service that has gone down depends, for its recovery, on another service that it was itself hosting before the outage. Most such breaks can be fixed manually, but as the complexity of software increases, they become harder and harder to solve. If your stack has ten million lines of code, do even your best engineers really know what's going on?

No doubt, there are many benefits to outsourcing your metrics, your monitoring, your log processing, your network analysis, and your databases. Some of these are simple "value adds" with few side effects if they go down. Others, not so much.

Against my own point, for many it is the more reliable model. I at least have to admit that. Few teams have the manpower and talent to bring all of their architecture in-house the way things stand today.

However, iceberg complexity and reliability aren't my only concerns with Cloud Computing. There are privacy concerns when companies have access to your data. Some companies cannot read it if it's encrypted on your end, but most have the ability to. If you have any distrust of hackers, corporations, and/or the government, any one of those entities can ruin your day by going after the data you uploaded. And it's not just privacy: there are also security risks, since a hacked service's access and/or your data can be used to break into your own infrastructure.

I also dislike not being in control of notable pieces. I don't have the source code, and I don't know what the architecture looks like. Truly, the specialization of these companies has led to incredible products that are better for most people. I just think that the companies are now siloed: huge targets and hugely complicated. What if there was a way to apply that level of specialization while making it self-hostable? Where you had the final say over your data, no one else?

Self hosting

Most certainly this isn't just my idea. There is a push for self hosting, mostly fueled, I think, by distrust of Google, Apple, and the like.

My speculation on modern Cloud Computing

If I already sound far-fetched, just you wait. I recently read an article discussing the implications of GPL licensing (Decensor). I found it fascinating. A considerable amount of open source software uses the GPL license. The network effect of dependencies and existing code directly influences the profit model, and thus the business landscape, behind so many projects. Basically, GPL code cannot have proprietary versions derived from it. I wonder if this is one part of the movement away from desktop software toward cloud models, where there are no such restrictions (websites and API endpoints are program output, which is not covered). With GPL software, after all, consulting and hosted services are the two main ways to make money aside from donations.

One benefit of proprietary software is that it's perhaps more often self-hosted. I am certainly skeptical of it, but I wonder: if there were more proprietary competition in self-hosted software, would there be more open source competition in that domain as well?

Nonetheless, I also wonder if it's reasonably possible to make self-hosted software easier and better. Such a push would admittedly be a better match for proprietary software, which restricts licensing away from the GPL. I'm not particularly interested in proprietary code, but developers do need a profit model. Perhaps for some heavily sandboxed components, it is perfectly fine. Say, compressors or anything that works well in a pipeline, or in a sandbox inside a GUI (a video player without network access and with read-only access to disk, perhaps).

ApocalypseBSD

I don't know if I'd call it that. I am thinking up another distribution of sorts. There is a common trend I keep finding with all of the distributions I've tested: they require internet access to do any serious development. Many will not even ship with source code. FreeBSD's DVD installer is probably the best, offering source and a means to recompile. However, it does not include the kyua testing framework or the documentation.

How much more space would a distribution need to include source, documentation, and everything you'd actually want to hack on it? And yes, that should include tests.

So, for the worst-case scenario of being exiled from the internet, or the internet being exiled from you, with one ISO you would at least have a full system that you can evolve.

Ideally, perhaps the feature set would be kept somewhat simpler. Perhaps there's a sweet spot: not so simple that the layman would quickly ramp up to speed, but simple enough that a fairly dedicated hacker actually understood the whole system, at least as much as needed. What those compromises are, I'm not certain. But it seems likely to me that software could be balanced better in terms of complexity versus benefit.

So, in sum:

  • Source code, documentation, testing framework, and a reasonable toolchest all in the ISO. Enough to evolve the system even offline.
  • All patches, ports, code updates, and packages (if used) should be easily pointable to any repository. This should also be documented and possible with a stock system.

Now obviously, a stripped-down image for the majority of your server deployments makes sense. But for your laptop, I don't see the issue with bundling everything the system needs to rebuild itself.

This is mostly just a bare and simplified BSD at this point. We can go further.

My biggest issue with the self-hosting movement I see today is that so many of the applications give little consideration to the whole lifecycle, especially redundancy. Some, maybe not even much for backup. I find it tedious that personal-use notepads and calendars want MySQL databases, a job far better suited to SQLite. It seems like if you track a service's consumables and outputs, backups could be fairly straightforward. And perhaps extremely simple. Maybe a common format of /service/thing/{data, interface, log}. I'm not certain. At least if it were as portable as copying the whole thing in place, with some sandboxing/isolation considerations. I realize this is partly the Docker concept.
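
To make that concrete, here is a minimal sketch of what "backup is just copying the directory" could look like, assuming a hypothetical /service/notepad/{data, interface, log} layout with a single SQLite file for state. Every path and name below is invented for illustration, not taken from any existing tool.

    #!/usr/bin/env python3
    """Sketch: back up a service laid out as /service/<name>/{data,interface,log}.

    All paths and names here are hypothetical. The point is that when a service
    keeps its state in one directory (and in SQLite rather than an external
    MySQL server), a backup can be a verbatim copy of that directory.
    """
    import sqlite3
    import tarfile
    import time
    from pathlib import Path

    SERVICE_ROOT = Path("/service/notepad")   # hypothetical service directory
    BACKUP_ROOT = Path("/backup")             # hypothetical backup target


    def backup_service(root: Path, dest: Path) -> Path:
        """Snapshot the whole service directory into a timestamped tarball."""
        dest.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M%S")
        archive = dest / f"{root.name}-{stamp}.tar.gz"

        # If the state lives in SQLite, take a consistent snapshot first via
        # the online backup API; the snapshot lands in data/ and is swept up
        # with everything else below.
        db = root / "data" / "notes.db"       # hypothetical single-file database
        if db.exists():
            src = sqlite3.connect(db)
            snap = sqlite3.connect(db.with_suffix(".snapshot"))
            with snap:
                src.backup(snap)
            src.close()
            snap.close()

        # Everything else (interface config, logs) is plain files: copy it all.
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(root, arcname=root.name)
        return archive


    if __name__ == "__main__":
        print("wrote", backup_service(SERVICE_ROOT, BACKUP_ROOT))

Restoring, or moving the service to another machine, would be the same copy in reverse.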

Obviously it is counter to my "one person being able to know it all" ideal, but some included notion of services, with backup, upgrade, and even replicate functionality, could be useful. And if you wanted NASA-grade reliability (or really, paranoia), some API endpoints could be run on three systems, where the majority wins and the odd one out marks its hardware as suspect. That's more certain than ECC and RAID 1, and it encompasses many other problems. Of course not everything fits that bill exactly, and it's ridiculously overkill for most. But at the same time, majority-rule APIs may be a simpler option if you really want to trust the output of your machines.
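
As a rough illustration of the majority-rule idea, here is a sketch that asks three hypothetical replicas the same question, accepts whatever answer at least two agree on, and flags the dissenter. The hostnames and the endpoint are made up; nothing here refers to an existing service.

    #!/usr/bin/env python3
    """Sketch of a majority-rule (triple modular redundancy) client."""
    from collections import Counter
    from urllib.request import urlopen

    # Hypothetical replicas of the same self-hosted API endpoint.
    REPLICAS = [
        "http://node-a.internal:8080/calc",
        "http://node-b.internal:8080/calc",
        "http://node-c.internal:8080/calc",
    ]


    def majority_call(urls, timeout=5.0):
        """Return the majority answer; report nodes that disagree or fail."""
        answers = {}
        for url in urls:
            try:
                with urlopen(url, timeout=timeout) as resp:
                    answers[url] = resp.read().decode()
            except OSError:
                answers[url] = None  # treat an unreachable node as a dissent

        counts = Counter(v for v in answers.values() if v is not None)
        if not counts:
            raise RuntimeError("no replica answered")
        winner, votes = counts.most_common(1)[0]
        if votes < 2:
            raise RuntimeError("no majority among the replicas")

        for url, value in answers.items():
            if value != winner:
                print(f"suspect hardware: {url} answered {value!r}")
        return winner


    if __name__ == "__main__":
        print("accepted answer:", majority_call(REPLICAS))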

A further goal could be ad hoc networking and routing, if you wanted to piece a reasonable intranet together. I imagine each critical component would be designed to handle sneakernet delivery, especially for updates, ports, and so on, all with validation and signatures if need be.
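
For the sneakernet piece, the validation half might look something like this: a made-up MANIFEST of SHA-256 lines sitting next to the files on the removable media, checked before anything is applied. Signing the manifest itself (with signify, GPG, or similar) is left out of this sketch; only the checksum step is shown, and the mount point and filenames are assumptions.

    #!/usr/bin/env python3
    """Sketch: validate a sneakernet-delivered update set against a manifest."""
    import hashlib
    from pathlib import Path

    MEDIA = Path("/mnt/usb")          # hypothetical mount point for the media
    MANIFEST = MEDIA / "MANIFEST"     # hypothetical "sha256  filename" lines


    def sha256(path):
        """Hash a file in chunks so large update sets don't need much memory."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()


    def verify(manifest):
        """Check every manifest entry; report anything missing or altered."""
        ok = True
        for line in manifest.read_text().splitlines():
            if not line.strip():
                continue
            digest, name = line.split(None, 1)
            target = manifest.parent / name.strip()
            if not target.exists():
                print(f"missing: {name.strip()}")
                ok = False
            elif sha256(target) != digest:
                print(f"checksum mismatch: {name.strip()}")
                ok = False
        return ok


    if __name__ == "__main__":
        print("ok to apply" if verify(MANIFEST) else "DO NOT APPLY")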

Anyway, this is probably getting dull. I just have a lot in my head that I wanted to get out. Maybe it makes sense to some of you.