Go Beyond

Only read if you don't mind being offended.

onekick

Back in 2014 I moved to San Francisco and joined ThousandEyes looking for a more challenging role, along with a few other reasons. One of the problems we had was a very inconsistent and slow install process. The usual routine: insert a USB stick, try to remember the RAID config you want, and hope it's roughly consistent. With systems, homogeneity is hugely helpful. If you have a sharded MySQL setup and expect 2TiB of disk but made a typo two years ago and only ended up with 1.7TiB, you might be in trouble.

We were using Puppet, so a lot of stuff post-install was managed, and I added things like BIOS settings via Puppet, being very explicit about packages, having the Xen hosts in Puppet, etc. It's so critical to know you can push out a change to all of your infrastructure. One of the problems was how one of the Xen-related services (libvirt?) would suspend VMs at shutdown time. So if we were expecting to take a host offline for a few minutes to move it from one rack to the other or upgrade the kernel, it might take 15 minutes to go offline. Now imagine you think it's just stuck and pull the power on it. Xen may not be able to tell that the saved memory file is incomplete and corrupted. So you boot it back up, it resumes VMs, and you get at least one in some haywire state with most of its memory missing. It's the sort of situation that can cause massive corruption and obscure errors, and the whole thing was needlessly complex for what we wanted.

Anyway, Puppet handled those aspects well, but we still had 8 SSDs more often than not, in some software RAID 10. Anyone who's been through a Debian install knows there are a lot of prompts, and they don't come one right after the other. So say you start the install and it'd take 15 minutes if you sat there staring at the screen. Instead you walk away for too long, come back, go back and forth, and it's easily 40 minutes.

I had worked at Rackspace prior and had experience with iPXE, so I knew how I wanted to fix the problem. There were a number of other iPXE install setups, but most were wrapped tightly around full configuration management stacks. Or they had some kind of user interface to write the preseeds / boot lines in, but couldn't just have those managed in git and checked out like everything else.

I talked my boss into letting me write a custom program for it. It was very, very simple. You have profiles of preseeds and iPXE boot scripts to choose from. You boot the machine off the network and the DHCP server hands out this image to boot off of. From a browser you see the MAC address pop up, select the profile and the hostname you want, and it'd install it all. If you were at the console of the server, that same sort of profile / hostname menu would come up in iPXE and you could type in what you wanted right there.
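To make that flow concrete, here's a rough sketch of the idea in Python. This is not the onekick or Shoelaces code. Every name in it (BootServer, PROFILES, WAIT_SCRIPT, the boot.example URLs, the "debian-raid10" profile) is made up for illustration, and the poll URL format is an assumption. The machine polls with its MAC address, shows up as pending, and once someone assigns a profile and hostname the next poll hands back the real boot script.

```python
from dataclasses import dataclass, field
from string import Template

# Hypothetical profile: an iPXE script that boots the Debian installer with a
# preseed. The host name and paths under boot.example are placeholders.
PROFILES = {
    "debian-raid10": Template(
        "#!ipxe\n"
        "kernel http://boot.example/debian/linux auto=true "
        "url=http://boot.example/preseed/raid10.cfg hostname=${hostname}\n"
        "initrd http://boot.example/debian/initrd.gz\n"
        "boot\n"
    ),
}

# Script returned while nobody has picked a profile yet: wait, then poll again.
# The /poll/<mac> URL layout is an assumption made for this sketch.
WAIT_SCRIPT = (
    "#!ipxe\n"
    "sleep 5\n"
    "chain http://boot.example/poll/${net0/mac}\n"
)


@dataclass
class BootServer:
    pending: set = field(default_factory=set)     # MACs waiting for a choice
    assigned: dict = field(default_factory=dict)  # mac -> (profile, hostname)

    def poll(self, mac: str) -> str:
        """Called each time a netbooted machine asks what to do next."""
        mac = mac.lower()
        if mac in self.assigned:
            # One-shot: hand out the install script once, then forget it.
            profile, hostname = self.assigned.pop(mac)
            return PROFILES[profile].substitute(hostname=hostname)
        self.pending.add(mac)  # this is what "pops up" in the browser
        return WAIT_SCRIPT

    def assign(self, mac: str, profile: str, hostname: str) -> None:
        """Called when the operator picks a profile and hostname for a MAC."""
        if profile not in PROFILES:
            raise KeyError(f"unknown profile: {profile}")
        mac = mac.lower()
        self.pending.discard(mac)
        self.assigned[mac] = (profile, hostname)


if __name__ == "__main__":
    srv = BootServer()
    srv.poll("AA:BB:CC:00:11:22")           # machine shows up as pending
    srv.assign("aa:bb:cc:00:11:22", "debian-raid10", "db01")
    print(srv.poll("aa:bb:cc:00:11:22"))    # now it gets the install script
```

The nice part of this shape is that the profiles are just text files, so they can live in git and get checked out like everything else. A real version would sit behind an HTTP server and persist its state.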

My first prototype was in Golang, which apparently was not acceptable as everything else was in Python. So I redid it in web.py. It was one of my earlier Python projects and I admit that it was pretty ugly code. After leaving ThousandEyes, I was promised it'd be open sourced and I was eagerly awaiting its release. Almost three years later, it's finally here! And... it's written in Go, which I find hilarious. Templating was added but the general concept is basically identical, including the URL format used for polling. It's way better than what I wrote, so nothing was lost.

https://github.com/thousandeyes/shoelaces

It's called Shoelaces. What I wrote was called onekick.

Anyway, very happy to share the story and link to the code. It worked out really well for me there and apparently the concept has worked for them since. I did have a nice repository of Debian preseeds ready for open sourcing that didn't entirely make it out the door. If you've ever done RAID configuration in d-i, you know how much of a bear it is, and I got some interesting configurations down pretty well.

Thanks for reading.



