YOW! September 2021 Recap
Ken Scambler
Ken Scambler talked about what was, by his own admission, an incredibly ambitious, scary, and perhaps even career-threatening project at MYOB - a massive architectural undertaking to rebuild their existing legacy desktop product as a cloud-based web system for their userbase. This was a huge task in itself, but making things worse was an ATO regulatory change, which meant that both the legacy product and the new cloud version would have to be updated simultaneously to keep pace with the law.
After a period of soul-searching, Scambler’s team made a tough call - rather than continuing to build a brand new product, they’d abandon the rewrite and instead build a new web interface on top of their existing desktop-app-facing backend services. Customers would be told they’d be getting a brand new web interface, without the business taking on all the risk and cost of rebuilding its existing logic all over again. But to make this work, the product would need some careful re-architecting to support both desktop and web interfaces. After debating the merits of microservices, Scambler’s team elected to work with “malleable monoliths” to accomplish this, in conjunction with the Backends For Frontends (BFFs) pattern.
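To make the BFF idea concrete, here is a minimal sketch in Python. It is not MYOB's code: the invoice domain, class names, and fields are all hypothetical, chosen only to show one shared backend service feeding two frontend-specific backends that each shape the response for their own client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    """Record as the shared backend service might return it (hypothetical)."""
    invoice_id: str
    customer_name: str
    total_cents: int
    tax_cents: int


class InvoiceService:
    """Stand-in for an existing backend service; business logic stays here."""

    def __init__(self, invoices):
        self._invoices = {inv.invoice_id: inv for inv in invoices}

    def get(self, invoice_id: str) -> Invoice:
        return self._invoices[invoice_id]


class WebBff:
    """Backend for the web frontend: returns display-ready strings."""

    def __init__(self, service: InvoiceService):
        self._service = service

    def invoice_summary(self, invoice_id: str) -> dict:
        inv = self._service.get(invoice_id)
        return {
            "title": f"Invoice {inv.invoice_id} for {inv.customer_name}",
            "total": f"${inv.total_cents / 100:.2f}",
        }


class DesktopBff:
    """Backend for the desktop frontend: returns raw numeric fields."""

    def __init__(self, service: InvoiceService):
        self._service = service

    def invoice_summary(self, invoice_id: str) -> dict:
        inv = self._service.get(invoice_id)
        return {
            "id": inv.invoice_id,
            "customer": inv.customer_name,
            "total_cents": inv.total_cents,
            "tax_cents": inv.tax_cents,
        }


# Both BFFs sit on the same service, so the logic is written once.
service = InvoiceService([Invoice("INV-1", "Acme Pty Ltd", 11000, 1000)])
web_view = WebBff(service).invoice_summary("INV-1")
desktop_view = DesktopBff(service).invoice_summary("INV-1")
```

The point of the split is that a change to how the web client wants its data presented touches only WebBff, leaving the desktop path and the shared service alone.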
What was interesting about the way the team went about this is that they elected to consciously optimise for the cognitive load of the teams who worked on the system, something he’d expanded upon in an earlier tweet:
1,000,000 components but you only need to think about 5 at once = a simple system.
20 components but you need to fit them all in your head all the time = a complicated system.
Stopping things from knowing about each other is the big game in growing a system
- kenbot (@KenScambler), June 18, 2018
This meant that rather than taking the somewhat traditional ‘horizontal slice’ approach to their monolith, where code was arranged into web, business logic, and data layers, the team would instead slice vertically by domain - meaning that the Credit Cards slice would contain web, business logic, and data layer classes alike, but be entirely separate from the Customer or Inventory domains. This helped reduce cognitive load on teams, because while picking up tech patterns can be hard, it’s nowhere near as difficult as building a model of your business domain and its associated processes in your head.
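One way to picture vertical slicing is as a rule about which modules may know about which. The sketch below is an illustration only, not something shown in the talk: it assumes modules are named "<domain>.<layer>" (hypothetical names throughout) and flags any import that reaches across a domain boundary.

```python
# Hypothetical import graph: each key is a module, each value lists the
# modules it imports. Names follow an assumed "<domain>.<layer>" convention.
IMPORTS = {
    "credit_cards.web": ["credit_cards.logic"],
    "credit_cards.logic": ["credit_cards.data"],
    "credit_cards.data": [],
    "inventory.web": ["inventory.logic"],
    # This module reaches into another slice's data layer.
    "inventory.logic": ["inventory.data", "credit_cards.data"],
    "inventory.data": [],
}


def domain_of(module: str) -> str:
    """Return the domain part of a '<domain>.<layer>' module name."""
    return module.split(".", 1)[0]


def cross_domain_imports(imports: dict) -> list:
    """List (importer, imported) pairs that cross a domain boundary."""
    violations = []
    for module, deps in sorted(imports.items()):
        for dep in deps:
            if domain_of(dep) != domain_of(module):
                violations.append((module, dep))
    return violations


violations = cross_domain_imports(IMPORTS)
```

Within a slice, layers can still depend on each other freely; only the reach from inventory into credit_cards is reported, which is the kind of coupling a vertical layout tries to keep out.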
This led to some interesting technical directives - the team would actually prefer to copy & paste code rather than follow the ‘DRY’ principle, because it minimised the amount of coupling between these domain-specific modules. Given there was a desire for teams working on these modules to own them for years to come, the preference was always for shared ‘glue’ to be kept to a bare minimum.
If the first part of Scambler’s talk was focussed on the technical challenges of their gigantic refactoring, then the second part very much focussed on the delivery aspects. The overarching lesson here was that they didn’t get everything right - and when they did, it took many goes to get there. The key was to keep iterating until they found something that improved things.
To begin with, to tackle this gigantic project, they tried a team-per-subsystem approach. This didn’t work well, as it turned out the teams working on the BFFs would end up trying to pick up work that the teams working on the relevant backend services weren’t yet ready to work on. The next step was to try ‘tribes’ made up of members from the different subsystems, working on a common task at the same time. This improved things, but it turned out certain parts of the overall architecture were more difficult than others, and remained bottlenecks for the tribes. To help solve this problem, Scambler’s team turned to data - getting teams working with simple Kanban boards to help identify, escalate, and resolve bottlenecks as they appeared. From this, the team could measure the cycle time of features and try to understand whether they would hit their desired finishing date.
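As a rough illustration of what a simple Kanban board can yield, the sketch below computes cycle time per card and total time spent per column. The cards, columns, and dates are invented for the example and are not figures from the talk.

```python
from datetime import date
from statistics import median

# Invented board history: (card, column, date entered, date left).
HISTORY = [
    ("A", "bff", date(2021, 3, 1), date(2021, 3, 3)),
    ("A", "backend", date(2021, 3, 3), date(2021, 3, 12)),
    ("B", "bff", date(2021, 3, 2), date(2021, 3, 4)),
    ("B", "backend", date(2021, 3, 4), date(2021, 3, 15)),
    ("C", "bff", date(2021, 3, 5), date(2021, 3, 8)),
    ("C", "backend", date(2021, 3, 8), date(2021, 3, 16)),
]


def cycle_times(history):
    """Days from a card entering its first column to leaving its last."""
    first, last = {}, {}
    for card, _column, entered, left in history:
        first[card] = min(first.get(card, entered), entered)
        last[card] = max(last.get(card, left), left)
    return {card: (last[card] - first[card]).days for card in first}


def days_per_column(history):
    """Total days that cards spent in each column."""
    totals = {}
    for _card, column, entered, left in history:
        totals[column] = totals.get(column, 0) + (left - entered).days
    return totals


per_card = cycle_times(HISTORY)
per_column = days_per_column(HISTORY)
typical_cycle = median(per_card.values())
# The column where cards sit longest is a candidate bottleneck.
bottleneck = max(per_column, key=per_column.get)
```

With this made-up data the backend column absorbs most of the elapsed time, which is the sort of signal a team could escalate before it blocks everyone downstream.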
From this data, Ken’s team began to see that work would clearly fall into either a ‘fast lane’ or a ‘slow lane’ - some parts of the system were simply far more complex than others. They would then split their projections accordingly, and refine their understanding of the rate at which they were burning through their tasks. What also became clear with some of this complex work was that their existing communication and leadership structures, built around Principal Engineers and Architects, simply weren’t working. Huge bottlenecks existed at this level, Senior Developers were left somewhat unempowered while waiting on them for decisions, and this was causing ongoing grief from a people and delivery perspective.
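A small worked example shows why splitting the projection by lane matters. The numbers below are invented, and the sketch assumes the two lanes run in parallel with capacity that cannot be moved between them; neither detail comes from the talk.

```python
import math

# Invented figures: items left and items finished per week, by lane.
LANES = {
    "fast": {"remaining": 40, "throughput_per_week": 8.0},
    "slow": {"remaining": 12, "throughput_per_week": 1.5},
}


def weeks_remaining(remaining: int, throughput_per_week: float) -> int:
    """Whole weeks needed to finish the remaining items at a steady rate."""
    return math.ceil(remaining / throughput_per_week)


# Projection that treats all work as one pool.
blended_weeks = weeks_remaining(
    sum(lane["remaining"] for lane in LANES.values()),
    sum(lane["throughput_per_week"] for lane in LANES.values()),
)

# Projection per lane; under the parallel-lanes assumption the project
# finishes when the slowest lane does.
weeks_by_lane = {
    name: weeks_remaining(lane["remaining"], lane["throughput_per_week"])
    for name, lane in LANES.items()
}
split_weeks = max(weeks_by_lane.values())
```

Here the pooled estimate comes out shorter than the slow lane alone, so a single blended burn rate would have hidden the work that actually determines the finishing date.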
The solution to this was for each team to nominate a developer to wear the hat of “technical lead” - a role they performed, rather than a job they held - who would attend a new Tech Leadership Group that the Principals and Architects would hold to help drive alignment and understanding on different initiatives and decisions. As a result, these tech leads became confident in making decisions, and decisions were now made at-the-team, rather than at-the-top. The downside to this change, however, was that tech leads were now much busier than before, and not all of them were able to handle the additional workload well.
Eventually, the organisation adopted the approach of Team Topologies and created some clear responsibilities among teams of different types. Stream Aligned teams would own the ‘verticals’ like Credit Cards and Inventory Management, including the associated BFFs, while Complicated Subsystem teams would own hard-to-change portions of the system that weren’t necessarily inclusive of all elements of their domain. New communication structures were also needed within the company to make this setup work, something that wasn’t necessarily an immediate winner and again required iteration.
A final lesson for the team came towards the end of this mammoth project, when they started a beta program for customers:
“It turns out people hate unfinished tax software.”
Overall, Scambler and his team won the battle. The work came in a couple of months later than the business ultimately wanted, but they achieved all the big ticket regulatory and delivery items required - and managed to descope a few things not immediately required for the ATO. It was, however, incredibly stressful - and demanded a lot of technical and organisational change that would have been much better planned and spread over a longer period of time. In his own words:
“Don’t try this at home.”