Hello everyone, and welcome to the Sprkl tips & tools series. In our interview series, we host a prominent developer each time and explore topics that would bring value to the developer community.
I’m Lee, and I do all that content fun stuff here at Sprkl Personal Observability, and I’m the person who asked all these questions.
Sprkl is a Personal Observability platform that provides personalized feedback on code changes while coding in the IDE. We help developers ship correct and efficient code while spending less time on debugging and frustrating rework. Powered by OpenTelemetry, Sprkl instruments every code change and analyzes it upon execution.
This time we interviewed Valentina Servile, lead software engineer at Thoughtworks, a leading technology consultancy. Valentina shared insightful tips on enhancing developer productivity while operating in complex distributed environments. We found the conversation very valuable, and we hope you will too. 🙂
My name is Valentina Servile, technical lead at Thoughtworks. Before Thoughtworks, I worked for another Agile consultancy.
Throughout my career, I have focused on agile practices, clean code, automated testing, maintainability, and good software craftsmanship. I use technical quality as an enabler for our clients to release software faster and respond to market changes.
As a technical lead, I also have the additional responsibilities of guiding our products towards a good architecture that meets cross-functional requirements, ensuring a good quality strategy is in place, and building a safe space so that teams can perform to their full potential.
I have worked in 3 countries and for several client organizations of all sizes: from startups to enterprises. But most commonly, medium to large organizations.
Sometimes I rant about all of the above on my blog: https://oooops.dev.
I am trying Mastodon these days, so you can find me there at vse[email protected] or on the bird app at @EsseValentina.
I have worked in all team structures, but the most common form for a product team is 4 to 12 developers, a product owner, a technical lead (myself), and a business analyst. Sometimes we also have a QA role, but that depends on the client. Usually, with or without a QA, the developers take care of most of the test automation work, so how much testing falls on the developers varies from team to team.
Our technical teams usually consist of 2 or 3 seniors, but we also love working with junior devs and graduates because they bring fresh perspectives and cohesion – avoiding the “too many cooks in the kitchen” syndrome that might otherwise happen with many opinionated senior folks.
We usually work with pretty complex codebases. Our teams oversee one or a few microservices within much larger ecosystems made up of many other teams. When a tech company reaches a certain size, it will eventually work with complex distributed systems, and those usually start from a monolithic architecture. These companies tend to be our clients. On a scale of 1 to 10, I’d say our codebases sit in the 5 to 8 complexity range.
In addition, the products we build get a lot of traffic. So, making sure that they scale and stay resilient under user activity is a priority – regardless of the technology (good old EC2 instances or more sophisticated container platforms). But cross-functional requirements, like consistency and availability, are specific to each application and negotiated on a case-by-case basis depending on what the application needs to do.
We value the independence of our teams and developers, so we keep our codebases in separate repos with separate pipelines to production. In our repos, we work with Trunk Based Development: we value sharing context and pair programming over gatekeeping changes through code reviews. It leads to a smoother workflow and better code delivery. Sometimes we cannot adopt full Trunk Based due to auditability constraints, but we still do pair programming and try to keep our feature branches as short-lived as possible.
In short, we want to integrate our changes as often as possible, making them smaller and safer to merge and deploy.
When we write code, we usually work in pairs that commit directly to the main branch. We have CD pipelines that take our code from the remote to production, and we want to roll it out to users as fast as possible once it has passed all the tests. This means that all our commits need to be considered as a potential release candidate that could go into production with very short notice.
Pair programming helps keep code production-ready by ensuring two sets of eyes continuously review it. Still, it is not enough on its own: we also do Test Driven Development (TDD) on all of our code to ensure it is well-designed, and the automated test coverage stays very high throughout the process. With TDD, we write the tests before writing the code.
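To illustrate the test-first rhythm Valentina describes, here is a minimal hypothetical sketch (the discount function and its names are invented for this example, not taken from any client codebase). The test is written first to pin down the expected behavior; then the simplest implementation that makes it pass follows.

```python
import unittest

# Step 2: the simplest implementation that makes the tests below pass.
# Prices are in integer cents to avoid floating-point rounding surprises.
def apply_discount(price_cents: int, vip: bool) -> int:
    return price_cents * 90 // 100 if vip else price_cents

# Step 1 (written first, before the implementation existed):
# the tests describe the behavior we want.
class DiscountTest(unittest.TestCase):
    def test_vip_customers_get_ten_percent_off(self):
        self.assertEqual(apply_discount(10_000, vip=True), 9_000)

    def test_regular_customers_pay_full_price(self):
        self.assertEqual(apply_discount(10_000, vip=False), 10_000)
```

Run with `python -m unittest` – the point is that the red test exists before the code, so the design is driven by how the code will be used.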
On top of unit tests, our developers write automation at higher layers of the test pyramid: UI tests, service tests, and contract tests are all done as part of the development process – not by a different QA silo later on. This helps ensure we’ll catch regressions when commits go to trunk (and to production shortly after). We cannot afford to postpone the test automation until after a task is “dev done” if we want to release changes quickly and safely.
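As a hedged sketch of the contract-testing idea mentioned above: the consumer pins down the shape of the provider’s response it relies on, so a breaking change in the provider fails fast in the pipeline. Real teams would typically use a framework like Pact for this; everything here (the contract shape and function names) is invented for illustration.

```python
# The fields and types this consumer depends on from the provider's payload.
EXPECTED_USER_CONTRACT = {"id": int, "name": str}

def satisfies_contract(payload: dict, contract: dict) -> bool:
    """True if the payload has every field the consumer relies on, with the right type."""
    return all(
        field in payload and isinstance(payload[field], expected_type)
        for field, expected_type in contract.items()
    )

# A response that honors the contract passes; a renamed field is caught.
assert satisfies_contract({"id": 7, "name": "Ada"}, EXPECTED_USER_CONTRACT)
assert not satisfies_contract({"id": 7, "full_name": "Ada"}, EXPECTED_USER_CONTRACT)
```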
Being diligent with Continuous Integration also means merging our changes very often, even several times before a task is complete. This leads to the issue of having to release commits that have incomplete functionality in them, which the users should not see.
To address this, we launch feature flags early in development and hide the incomplete functionality behind them, so it won’t be visible when the WIP code goes live. We also use the parallel change (or expand and contract) pattern to make incremental changes if we are refactoring something. This allows us to keep the codebase in a green, production-ready state – even when it’s full of half-finished bits and pieces.
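A minimal sketch of how a flag can hide work-in-progress code, assuming a simple in-process flag store (real teams would typically use a flag service such as LaunchDarkly or Unleash; all names here are invented for the example):

```python
# Flag is off in production until the new flow is finished.
FLAGS = {"new_checkout_flow": False}

def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)

def legacy_checkout(cart):
    return {"flow": "legacy", "total": sum(cart)}

def new_checkout(cart):
    # Half-finished code can be merged and deployed safely:
    # with the flag off, this branch never runs for users.
    return {"flow": "new", "total": sum(cart)}

def checkout(cart):
    if is_enabled("new_checkout_flow"):
        return new_checkout(cart)
    return legacy_checkout(cart)  # users keep seeing the existing behavior
```

Because the incomplete path is unreachable while the flag is off, every commit can still be a release candidate, which is exactly what keeps trunk green.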
Once all of this has gone through the pipeline, and the automated tests have run, we can be confident that there won’t be application-breaking regressions in the release candidate. From there, some teams have a final manual approval for production, while others use Continuous Deployment to automate that. Regardless, changes in our teams tend to go to production at a pretty fast pace – which I am very proud of, and our stakeholders love.
When changes go to production this quickly, observability plays a crucial role! Because we consistently push changes to production, engineers need to look at the health of the live system on a well-made dashboard. We keep monitoring the traffic for errors and unusual patterns. APM and logs are useful for debugging the live system.
Some teams who deploy to production may perform automated canary roll-outs. This means the code changes will go live only on a few instances at first and automatically revert if the metrics say something went wrong. But those only work when you have a lot of instances, and they are not trivial to implement. It would be an interesting step to make our deployments to production even safer, but we have yet to feel the need for it. Simpler health checks have always been enough for us, as the test coverage catches most breaking changes.
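The canary decision Valentina describes boils down to: observe the canary instances, compare an error metric against a threshold, and promote or revert automatically. Here is a hedged sketch of just that gate logic; the metric source and the deploy/rollback hooks are stand-ins, not a real platform API.

```python
def canary_error_rate(request_failures: list[bool]) -> float:
    """Fraction of failed requests observed on the canary instances."""
    if not request_failures:
        return 0.0
    return sum(request_failures) / len(request_failures)

def canary_gate(request_failures: list[bool], max_error_rate: float = 0.01) -> str:
    """Decide whether to roll the change out to all instances or revert it."""
    rate = canary_error_rate(request_failures)
    return "promote" if rate <= max_error_rate else "revert"
```

In a real setup the samples would come from the monitoring system over a soak period, and “revert” would trigger the pipeline’s rollback step – which is why, as noted above, this only pays off when you have enough instances and traffic for the metrics to be meaningful.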
As I explained in the previous answer, our developers write and run automated tests throughout the development process. Still, there is always the need for a few manual checks before we turn a feature toggle on for real users to see. When possible, we perform those manual checks in production, because that gives the highest assurance that the changes will work. Production is our most “prod-like” environment, after all.
Most feature flag frameworks allow a flag to be active only for a particular user or session, so that’s what we leverage to let only a few people see new features. If we can’t do that because a change impacts some critical data or infrastructure, then a solid pre-prod environment is the next best thing. After completing an exploratory test, we might release the functionality in different ways. If it needs to be proven successful, we turn it on only for a percentage of the users or with an A/B test. But some changes like bug fixes or minor improvements can go live immediately.
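Many flag frameworks implement per-user targeting and percentage rollouts by hashing the user id into a stable bucket, so the same user always gets the same answer. This is a hedged sketch of that mechanism, not any particular framework’s API; the function names are invented for illustration.

```python
import hashlib

def in_rollout(user_id: str, flag: str, percentage: int) -> bool:
    """Deterministically place the user in a bucket from 0-99 and
    enable the flag for the first `percentage` buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percentage  # e.g. percentage=10 -> roughly 10% of users

def flag_enabled(user_id: str, flag: str, percentage: int, allowlist=()) -> bool:
    """Per-user targeting (allowlist) first, then the percentage bucket."""
    return user_id in allowlist or in_rollout(user_id, flag, percentage)
```

Because the bucket is derived from the user id rather than chosen at random per request, a user who sees the new feature keeps seeing it – which is what makes exploratory testing against a handful of real sessions practical.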
We try to receive feedback throughout the development process: automated tests in our local machines and the pipeline let us know if we have introduced any bugs. But our developers might need to understand the requirements – and that is usually where our stakeholders help us by providing desk checks, demos, and exploratory testing before sending something live. However, the most critical feedback we get is from our users, who respond to the value and usability of the features through their behavior. We measure that with metrics like click-through rates, bounce rates, and overall engagement with our products.
Working in highly interconnected distributed systems does place a high cognitive load on our developers and does affect developer productivity, especially for our most junior ones. Moving fast in a complex environment where it’s so easy to break something can be overwhelming. We accept that mistakes will happen, and our focus is on catching them early and fixing them quickly rather than trying to achieve absolute developer perfection (which is impossible). We strive to build a safety net of practices and automation that makes it okay to fail and easy to recover. That’s why we value continuous feedback tools like pair programming, thorough test coverage, automated pipelines, and short iterations from development to production.
We don’t measure developer productivity in terms of lines of code, tasks completed, or anything like that. Our leadership is responsible for supporting the team, including regular check-ins that help us understand when someone is struggling. That is a big part of my job. Pairing helps as it reduces isolation and allows everyone to work with everyone, highlighting issues faster than if developers were sitting alone with their headphones all day. So the whole team has a chance to keep each other accountable for the quality of work.
But if COVID has taught us anything, it’s that not everyone can work at 100% all the time. So having a non-judgemental space for team members to express when they aren’t feeling their best means that the rest of the team can step in and help. That way, we can ensure our products are still well taken care of and that our people have some breathing room.
We recently launched our Sprkl for CI product to shorten your code review process. Check it out on the VS Code marketplace.