Finding a default language

I’ve been writing a lot of Ruby code at home lately, and the next Cyberdelia podcast (when I get around to editing and releasing the episode) is about Ruby too. Some of this is a matter of circumstances, and some of it comes from thinking more deeply about how I’ve been approaching my personal software projects.

It started with old MP3s

The pandemic has left me with a bunch of free time, so I’ve been doing a lot of home data de-duplication, going through old HDD/DVD/USB/CD/Zip/floppy backups that were done in an ad-hoc fashion over the course of a few decades. I was good about having backups (I had lots :-P), but not good about having a means of retrieving a particular file, or an efficient way of managing them. Having come across restic, I finally had a way to de-duplicate hash-identical files and sort them out in a snapshot fashion.

The problem with having media scattered across the decades is that the metadata of the files can change over time, so a simple md5sum is going to report two different files because I had the gall to correct an old “Stairway to Heaven” MP3 that had “Kiss” as the artist. To deal with something like this, you need to dig into the file itself and do the fingerprinting on the audio data rather than on the whole file.
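To make that concrete, here is a minimal Ruby sketch of the idea for MP3s only. It is not the script described in this post: `strip_id3`, `audio_fingerprint`, and `fake_id3v2` are hypothetical names, the "files" are tiny fake byte strings built in memory, and the approach (hashing the undecoded bytes left after removing ID3v2 and ID3v1 tags) is a simple stand-in for real acoustic fingerprinting. It ignores other tag formats such as APE, and it assumes a trailing 128-byte block starting with "TAG" really is an ID3v1 tag.

```ruby
require "digest"

# Return the bytes of an MP3 with a leading ID3v2 tag and a trailing
# ID3v1 tag removed. Other tag formats are not handled in this sketch.
def strip_id3(bytes)
  data = bytes.b # binary copy, so all offsets are in bytes

  # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size.
  if data.bytesize >= 10 && data.byteslice(0, 3) == "ID3"
    flags = data.getbyte(5)
    size = data.byteslice(6, 4).bytes.reduce(0) { |acc, b| (acc << 7) | (b & 0x7f) }
    size += 10 if (flags & 0x10) != 0 # optional 10-byte footer
    data = data.byteslice(10 + size, data.bytesize) || "".b
  end

  # ID3v1 tag: the final 128 bytes, starting with "TAG".
  if data.bytesize >= 128 && data.byteslice(-128, 3) == "TAG"
    data = data.byteslice(0, data.bytesize - 128)
  end

  data
end

# Hash only what is left once the tags are gone.
def audio_fingerprint(bytes)
  Digest::SHA256.hexdigest(strip_id3(bytes))
end

# Hypothetical helper for the demo: wrap a payload in a minimal ID3v2.3 header.
# The payload is not a valid set of ID3 frames; only the header matters here.
def fake_id3v2(payload)
  size = payload.bytesize
  synchsafe = [(size >> 21) & 0x7f, (size >> 14) & 0x7f, (size >> 7) & 0x7f, size & 0x7f]
  "ID3".b + [3, 0, 0].pack("C3") + synchsafe.pack("C4") + payload.b
end

# Fake "audio": an MPEG frame sync followed by padding.
audio = [0xFF, 0xFB, 0x90, 0x00].pack("C*") + ("\0" * 64)

wrong = fake_id3v2("TPE1 Kiss") + audio
fixed = fake_id3v2("TPE1 Led Zeppelin") + audio + "TAG".b + ("\0" * 125)

puts Digest::MD5.hexdigest(wrong) == Digest::MD5.hexdigest(fixed) # false
puts audio_fingerprint(wrong) == audio_fingerprint(fixed)         # true
```

With real files you would pass in `File.binread(path)` and group paths by fingerprint. Formats like .wma need their own container handling, which is exactly where the off-the-shelf tooling runs out.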

Unfortunately, no set of tools exists to do this for every crappy format I’ve used over the course of the decades - I’ve made some questionable choices (mostly during my university years, as is typical), so I’ve got things like “CD-quality” .wma files that were transcoded from MP3s so that I could save 20MB of space here and there. Over the past decade I’ve usually reached for my language fascination du jour, but in this case I knew I didn’t want to spend several weekends trying to hack something up that I’d use once and then throw away.

I’ve learned to use a lot of languages over the years (up to now I’ve tried to learn a new one every year), and often before I even get a handle on what the problem is, I think of how I could apply the language I’m currently learning to the matter at hand. It’s not that I’m trying to shoehorn a problem into something inappropriate, but that the best way to learn a language is to approach it with non-trivial problems. You don’t learn Erlang with “Hello World,” you learn it by building a Gopher server. You don’t learn PHP with FizzBuzz, you learn it by cleaning up a horrifying boutique website made 20 years ago. You don’t learn Rust by making a text adventure, you learn it by writing a USB driver. (I’m just kidding about the Rust one - you learn it by spending hours fighting the compiler until you give up and finally read the manuals.)

But all languages have their strengths, and in this case I wanted a language where I could hack something up in a couple hours after an exhausting work day and wasn’t going to get screamed at for not formatting a source file correctly (due to aforementioned exhausting work day). I reached for Ruby, which is sufficiently malleable without turning into mashed potatoes on a quick refactor in the morning.

Ruby 3.x

Ruby is designed more for programmer code-turnaround than runtime speed, but I think Ruby’s bad reputation for speed is becoming less deserved with each major release. A lot of great work has been done (and continues to get done) making Ruby an excellent tool to reach for with small-to-medium scale projects (which I define as things under 15-20k LOC). The new Ractor support provides a way around the GIL using the Actor model, delivering parallelism in a way that won’t blow up in your face like using plain old threads. Fiber::SchedulerInterface means that async operations (by way of gems like async) are now available as well. Ruby 3.1 also ships YJIT, a new JIT compiler that delivers faster Ruby performance with no need to change your code. Bringing myself up-to-date on these advances, as well as getting around to learning a few things that have been in Ruby for a while like ObjectSpace and TracePoint, I’m finding that Ruby actually does deliver on the “programmer happiness” promise: I have a high-level language that lets me drop into low-level debugging and programming effectively when I eventually hit performance problems. Now that I’m working my way through the Ruby source code with Ruby Under a Microscope and Ruby Performance Optimization, I feel like I have a grasp of the language that will allow me to stay in it longer before I hit the “rewrite-it-in-Rust/C++/Go” phase.
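As a small taste of that low-level tooling, here is a sketch that uses TracePoint to count how often each Ruby-defined method runs inside a block, the kind of quick check I might do before deciding something is actually a hot spot. The `count_calls` helper and the `Library` class are made up for illustration. The `:call` event only fires for methods written in Ruby, so C-implemented methods like `String#strip` are not counted here.

```ruby
# Count calls to Ruby-defined methods while the given block runs.
def count_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call) do |tp|
    counts["#{tp.defined_class}##{tp.method_id}"] += 1
  end
  trace.enable { yield } # tracing is active only for the duration of the block
  counts
end

# Hypothetical workload: normalise some track titles and drop duplicates.
class Library
  def normalize(title)
    title.strip.downcase
  end

  def unique_titles(titles)
    titles.map { |t| normalize(t) }.uniq
  end
end

titles = [" Stairway to Heaven", "stairway to heaven ", "Kashmir"]
counts = count_calls { Library.new.unique_titles(titles) }

counts.each { |name, n| puts "#{name}: #{n}" }
```

Swapping `:call` for `:c_call` would show the C-level methods instead, which gets noisy fast but is handy when the time is going somewhere unexpected.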

I’m finding myself with decreasing amounts of free time to pursue oddball (or personally necessary) project ideas, and while the pandemic has given me a reprieve by clearing out my calendar, in a year or two I’ll be back to my busy schedule. I start a lot of personal projects in the hope they’ll make my life better in some way. The file de-dupe scripts are a tiny example, but a larger one is the app I use to keep an inventory of my personal vinyl record collection. Going through my backups I’m finding lots of half-baked/unfinished personal projects, and the common theme is that I was trying to learn a language/framework/paradigm while also building a 2-10k LOC thing that I actually wanted/needed. Time spent on a project that never gets around to delivering value (or will only under-deliver) is time wasted, just as much as learning how to do basic network IO for the 8th time.

I also have lots of projects that I finished and then later abandoned, because the toolchain broke (e.g. Python 2->3) at an inopportune time and I didn’t have time to deal with it, so I replaced it with something else less catered to the problem (or lived without it). Containers, god bless ‘em, are in some sense a band-aid for the problem that exists when a lot of incompatible things are all trying to operate at once (and hey, if you never update your underlying container layers, you don’t have to worry about anything breaking when the latest OpenSSL update breaks your crap, too). Having dozens of projects written in 6-7 languages means frequent mental context-switching when you want to address whatever the new breakage/feature is.

Picking a default

Reflecting on what I’ve accomplished since I started seriously programming, I’ve expended lots of effort simply getting familiar with lots of stuff for which I have little to show, while maintaining a Frankenstein’s monster of a software ecosystem that’s hard to open source because of all the glue I use to keep it together. I’m thinking that the way out of the mess will be taking a “language X by default” approach, with X becoming Ruby. There are always going to be projects where my “language X” is inappropriate, but they should be exceptions. I think Ruby (especially with all the goodness baked into the 3.x series) is in a great place to handle a lot of what I’d like to do for the next few decades, and going through my repos, it’s more than sufficient to handle a lot of the stuff I have made or plan to make in the future.

I still need to settle on a low-level language that I can be happy with. I’ve been playing with C/C++, Rust, and Zig, and someday want to take a look at Kotlin/Java again. For a lot of the things that I would need to use on a day-to-day basis, though, a simple Ruby implementation should be more than adequate. Whatever I end up choosing as the low-level language will need to play nice with Ruby, which might mean looking at things like JRuby.

My appetite for re-writing a bunch of janky old projects is generally low, but right now I’m considering which languages and projects I’ll drop from my personal stack first. And to be clear, I’m not only thinking about projects that I’ve written - I’m also considering any mediocre “off-the-shelf” software I’ve downloaded to solve a problem where I didn’t get around to finishing my own solution or keeping it running.

Advice

Does this change my advice to people just starting programming? Only slightly.

I think it’s still worth the effort to learn a whole bunch of languages when you’re starting out, otherwise you’ll find yourself pigeonholed into addressing every problem in the same way, and that leads to some pretty horrible stuff. I’ve seen folks who’ve only bothered to learn Perl, and will write every solution (in whatever work-mandated language) as Perl, with little care for the idioms of the language they’re actually using. Same goes for Haskell, same goes for C++, and so on. Driving a nail with a wrench is possible, but you’d be better off with a hammer. It’s important to continue learning new ways of doing things, because you can fool yourself into thinking that you know everything that there is to know, forgetting that every day the field advances (sometimes in the weirdest places, like Hacker News or Quake 3).

Then there are the less technical matters in picking a default language that only become apparent when you’ve learned a bunch of them:

  • How well do you get along with the upstream maintainers? If every interaction is abrasive and horrible, you won’t have the motivation to send a patch at 3 a.m. that fixes some obscure bug.
  • How actively and broadly supported is the language? Are there regular (non-breaking) improvements to the language? If 2-3 core developers dropped out, would that language be effectively “done?” Do one or two companies get most of the say in the language design?
  • Is the library system a dysfunctional graveyard? How much of it will “just work” assuming it isn’t reliant on some long-dead web API? Have all the decent names been taken over by squatters who aren’t shipping a working library?
  • Are people new to programming bothering to learn it? Why or why not? These folks are the ones writing the libraries your Twitter bot will use in 5 years, so if there’s no fresh blood that’s a reason for concern.

It’s unwise to just pick whatever language I say (or someone else says) is great. However, once you’ve found one or two languages that resonate with you after evaluating ten, it’s worth standardizing, incorporating all you’ve learned along the way.