Tech Debt: The Bill Comes Due

Takeaways

  • You can’t solve tech debt, but you can mitigate it. Mitigation requires constant but low overhead. In the long run that overhead is worth it.
  • Sometimes the cost of clearing tech debt is so high, and the value of doing it so low, that starting over is the best path forward.
  • Always try to reduce your feature set. You can cut more than you think.
  • Don’t try to solve every tech debt problem at once. Sometimes a partial solution now makes a future solution more difficult, but the alternative is always a difficult solution now.
  • Make the migration small-batch deliverable to avoid the need for heroics (a core tenet of DevOps).
  1. Prepare the new system for (probably transformed) versions of the data.
  2. Have the old system start tracking whether or not each user’s data has been copied to the new system.
  3. Change all parts of the old system that interact with the data from data processors into data tubes: first migrate the data to the new system, then act as a proxy to transform legacy requests into new ones (and new responses into legacy ones). This ensures that active users will have their data migrated on demand.
  4. Batch-migrate all data to the new system. This ensures that all inactive users will eventually have their data migrated, and can be done with lower-priority background processes.
  5. Turn off the old system.
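
As a rough illustration of steps 1 through 4, here is a minimal Python sketch. Everything in it is hypothetical: the `LegacyStore` and `NewStore` classes, the record fields, and the function names are stand-ins invented for this example, not code from BscotchID or Rumpus. The point is that one idempotent `ensure_migrated` function serves both the on-demand path and the background batch.

```python
from dataclasses import dataclass, field


@dataclass
class LegacyStore:
    """Stand-in for the old system's database (hypothetical)."""
    data: dict[str, dict] = field(default_factory=dict)
    migrated: set[str] = field(default_factory=set)  # step 2: who has been copied


@dataclass
class NewStore:
    """Stand-in for the new system's database (hypothetical)."""
    data: dict[str, dict] = field(default_factory=dict)


def transform(legacy_record: dict) -> dict:
    """Step 1: reshape a legacy record into the new format (made-up schema)."""
    return {"displayName": legacy_record["name"], "score": int(legacy_record["score"])}


def ensure_migrated(user_id: str, old: LegacyStore, new: NewStore) -> bool:
    """Copy one user's data at most once. Returns True if work was done."""
    if user_id in old.migrated:
        return False
    if user_id in old.data:
        new.data[user_id] = transform(old.data[user_id])
    old.migrated.add(user_id)
    return True


def handle_legacy_read(user_id: str, old: LegacyStore, new: NewStore) -> dict:
    """Step 3: migrate on demand, then answer from the new system in the old format."""
    ensure_migrated(user_id, old, new)
    record = new.data.get(user_id)
    if record is None:
        return {}
    return {"name": record["displayName"], "score": str(record["score"])}


def batch_migrate(old: LegacyStore, new: NewStore) -> int:
    """Step 4: sweep up users who never made a request. Returns how many were moved."""
    return sum(ensure_migrated(uid, old, new) for uid in list(old.data))
```

Because `ensure_migrated` checks the tracking set first, the batch job can run at any time, in any order, without clobbering users who were already migrated by their own requests.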

What’s Tech Debt?

BscotchID: Our Tech Debt

  • BscotchID had no development version. (A “development” version is a separate copy of the software that doesn’t talk to the same data as the “production” version, allowing for safe development and testing without risking negative impacts on real users.) There was only production.
  • I had no automated tests, nor even a checklist of manual tests. I ran custom tests in production while working on specific parts of BscotchID. Then I just hoped nothing bad happened later.
  • There was no build process. I had to upload changes manually on a per-file basis. There was no way to know whether local changes had actually propagated to production, or whether the production server and local files were in sync.
  • The code violated the DRY (“Don’t Repeat Yourself”) principle constantly. If I needed to fix something, it was likely I had to fix it in many places.
  • Two different website domains, and two separate databases, collectively contained all code and data required to run the web features of Crashlands. The different code bases had to constantly reach into each other to implement things.
  • There was no use of environment variables or feature flags (these are techniques to prevent in-development features from having an effect in production). I had to remember to turn things off or on before pushing code to production.
  • Login security was… shaky at best. Fortunately the account management features were so limited that this shaky security didn’t endanger private user info. If you’re familiar with the (bad) concept of “Security through Obscurity”, this was “Accidental Security Through Missing Features”. Turns out it’s pretty effective…
  • “Security through Obscurity” was my primary security approach for features I did have.
  • Data was stored, well, stupidly.
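
To make the environment-variable and feature-flag point concrete, here is a minimal Python sketch of the technique. The `FEATURE_` prefix, the flag name, and the "leaderboard" feature are all assumptions made up for this example; none of them come from BscotchID or Rumpus.

```python
import os


def flag_enabled(name: str, default: bool = False, env=os.environ) -> bool:
    """Read a boolean feature flag from an environment variable.

    The FEATURE_ prefix and the accepted truthy strings are conventions
    chosen for this sketch, not an established standard.
    """
    raw = env.get(f"FEATURE_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def leaderboard_endpoint(env=os.environ) -> str:
    """Reach in-development code only when the flag is switched on."""
    if flag_enabled("new_leaderboard", env=env):
        return "new leaderboard (in development)"
    return "stable leaderboard"
```

With something like this in place, the production environment simply never sets `FEATURE_NEW_LEADERBOARD`, so there is nothing to remember to turn off before pushing code.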

Rumpus Is Born

  • Docker to containerize the application. (“Containerization” is a fancy term that means having your code run in, essentially, a simulated operating system whose properties you can guarantee are always the same. It allows you to guarantee that your production, development, and test environments are as identical as you can get, and also makes everything more portable.)
  • Node.js for the server, instead of PHP + Apache/Nginx. Node.js is just JavaScript, but on the server. Since most of my time had been spent writing either server logic or JavaScript browser logic, this would allow nearly all of my development time to be in one language instead of two.
  • MongoDB for the database, instead of MySQL. MongoDB is basically just JavaScript yet again (its documents are JSON-like and its shell is a JavaScript environment), so now I could cover three parts of the tech stack with just one language!
  • Vue.js for the front-end. Vue.js (and similar frameworks/libraries) allow for rapid website development. And yet again, it’s just JavaScript! Okay, fine, it’s also HTML and CSS, but there was never going to be an escape from those.
  • Amazon Web Services (AWS) for server deployment. This would allow for effectively unlimited scalability, and would let me worry less and less about deployment details as AWS services were improved and added.

The Tech Debt Bill Comes Due

  • Just walk away. It was (mostly) working with all our old titles, so we could leave it as-is until it eventually stopped working. At that point, we’d call it “unsupported”, or simply turn it off, and move on with our lives. But this would mean abandoning half a million users, which didn’t seem wise.
  • Maintain it. No more features, but continued bug fixes to keep things working. But this would require maintaining two systems (BscotchID and Rumpus).
  • Migrate BscotchID users into Rumpus, and leave the data behind. But this would negate the entire reason that all our existing users signed up for BscotchID accounts.
  • Migrate BscotchID users and data into Rumpus, and then turn off BscotchID. If we found a way to convert BscotchID accounts into Rumpus accounts, we’d get to keep all of our users and their data, while getting to sunset BscotchID.

Migration Part 1: Linked Accounts

  • Set up secure BscotchID-to-Rumpus triggers to ensure that every newly created BscotchID account caused creation of a new Rumpus account, using email addresses as the shared identifier. This would guarantee that every new BscotchID account was linked to a Rumpus account.
  • Ran that same code against all already-existing BscotchID accounts. This would guarantee that every old BscotchID account would be linked to a Rumpus account.
  1. Set up Rumpus to take BscotchID password/username change requests and forward them to BscotchID, and updated BscotchID to handle those requests from Rumpus.
  2. Set up Rumpus to be able to ask the BscotchID server for a linked user’s data.
  3. Added UI elements to our Rumpus-powered website to show linked BscotchID information (fetched from the BscotchID service) and to allow users to submit username/password changes.
  4. Turned off/redirected BscotchID account change requests initiated by BscotchID.
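
The account-linking idea from the two bullets above can be sketched in a few lines of Python. This is only an illustration under assumptions: the account dictionaries, the `legacy_ids` field, and the email normalization rule are invented for the example and are not the real BscotchID or Rumpus schemas. What matters is that the same find-or-create function is used both for newly created accounts and for the one-time pass over existing ones.

```python
def normalize_email(email: str) -> str:
    """Treat emails as case-insensitive and ignore stray whitespace (an assumption)."""
    return email.strip().lower()


def link_account(legacy_account: dict, rumpus_accounts: dict[str, dict]) -> dict:
    """Find or create the new-system account that shares this legacy account's email.

    Safe to call repeatedly: an existing link is reused rather than duplicated.
    """
    key = normalize_email(legacy_account["email"])
    rumpus = rumpus_accounts.setdefault(key, {"email": key, "legacy_ids": []})
    if legacy_account["id"] not in rumpus["legacy_ids"]:
        rumpus["legacy_ids"].append(legacy_account["id"])
    return rumpus


def backfill_links(legacy_accounts: list[dict], rumpus_accounts: dict[str, dict]) -> None:
    """Run the same linking code over every pre-existing legacy account."""
    for account in legacy_accounts:
        link_account(account, rumpus_accounts)
```

Since `link_account` does nothing when a link already exists, it does not matter whether an account is first seen by the creation trigger or by the backfill.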

Migration Part 2: Legacy Game Data

  • Legacy game installs must continue to work.
  • Users must be able to swap back and forth between legacy and updated game installs without losing important data.
  • All data must be fully migrated without error.
  • Determined how that data would map onto the completely different storage system of Rumpus.
  • Made any feature changes to Rumpus to accommodate the incoming data (this was minimal, since Rumpus already had a wide variety of data storage options).
  • Created a “migration table” in BscotchID to track whose data had been migrated.
  • Turned each BscotchID script that made use of that data into a tube. Instead of doing all the normal work that the script used to do, such as managing or fetching data in the BscotchID database, the new script-as-tube would convert the incoming request into one that would work in Rumpus and forward it along. (Before doing that, though, the script would check the migration table and, if the current user’s data hadn’t been migrated yet, migrate it.) Then, the tube would take the reply from Rumpus and convert it back into the format that old game installs expected.
  • Created test cases for the new scripts to ensure that the input/output relationship was exactly the same as the original, and that the data ended up in Rumpus as intended.
  • Published these changes to production, causing every new request made by legacy game installs to use the new tubes instead.
  • Batch ran the migrator part of those scripts on all existing data, so that users who weren’t actively making requests would still have their data migrated.
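
Here is a minimal Python sketch of the script-as-tube idea, along with the kind of equivalence check described in the test-case bullet. The request and response shapes (`uid`, `act`, `blob`, `method`, `userId`) are invented for illustration and are not the real BscotchID or Rumpus formats, and the migration-table check is left out to keep the focus on translation.

```python
def original_script(request: dict, legacy_db: dict) -> dict:
    """What a legacy script did before: read and write its own database."""
    user, action = request["uid"], request["act"]
    if action == "save":
        legacy_db[user] = request["blob"]
        return {"ok": "1"}
    return {"ok": "1", "blob": legacy_db.get(user, "")}


def new_backend(request: dict, new_db: dict) -> dict:
    """Stand-in for the new service, with a different request/response shape."""
    if request["method"] == "PUT":
        new_db[request["userId"]] = {"payload": request["body"]}
        return {"status": 200}
    stored = new_db.get(request["userId"])
    return {"status": 200, "body": stored["payload"] if stored else ""}


def tube_script(request: dict, new_db: dict) -> dict:
    """Same inputs and outputs as original_script, but the work happens in the new backend."""
    is_save = request["act"] == "save"
    # Translate the legacy request into the new system's format.
    forwarded = {"method": "PUT" if is_save else "GET", "userId": request["uid"]}
    if is_save:
        forwarded["body"] = request["blob"]
    reply = new_backend(forwarded, new_db)
    # Translate the new system's reply back into what old game installs expect.
    legacy_reply = {"ok": "1" if reply["status"] == 200 else "0"}
    if not is_save:
        legacy_reply["blob"] = reply["body"]
    return legacy_reply
```

A test for a tube like this can replay the same sequence of legacy requests through both `original_script` and `tube_script`, assert that every reply matches, and then inspect the new database to confirm the data landed where intended.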

Migration Part 3: Game Clients

Hard Decisions: Dropping Platform Support

  • The operating systems and stores have changed over time (and will continue to do so), requiring updates to Crashlands for it to remain functional. But those non-Windows stores represent a tiny amount of revenue that doesn’t justify continued (expensive) support.
  • Our game engine does not maintain strong support for those same operating systems and stores (understandably, since engine-makers respond to the same market realities as game-makers). So even if we wanted to continue supporting them, we’d be plagued by bugfix delays while trying to convince our engine maker to fix issues that impact a minuscule fraction of their customer base.
  • Rumpus itself is far more integrated with each store than BscotchID ever was, and so there is a substantial per-platform cost to make Rumpus work on each platform.

Conclusion

Adam Coster

CTO and Fullstack Webdev at Butterscotch Shenanigans