How to improve developer productivity: Tips and tools series

Rohan Verghese

Hello everyone, and welcome to the Sprkl tips & tools series. In each installment of our interview series, we host a prominent developer and explore topics that bring value to the developer community.

Sprkl is a Personal Observability platform that provides personalized feedback on code changes while coding in the IDE. We help developers ship correct and efficient code while spending less time on debugging and frustrating rework. Powered by OpenTelemetry, Sprkl instruments every code change and analyzes it upon execution.

This time we interviewed Rohan Verghese, a backend software developer at Amazon. Rohan shared insightful tips on enhancing developer productivity in complex distributed environments, and we even got a glimpse into Amazon’s R&D teams. We hope you’ll gain some value from it. 🙂

Tips from Rohan on developer productivity in development teams

Developer productivity

  • Get feedback quickly: decrease cycle time as much as you can. Faster builds, faster tests, faster code reviews, and faster deployments all add up.
  • Beware of mandatory software vulnerability checks in CI. Though seemingly beneficial, they can hurt productivity; in one case, such a check turned a 2-minute deployment into a 12-minute one.
  • Prioritize fixing a broken CI/CD pipeline. Delays let issues accumulate and block deployments and testing.
  • Prioritize timely code reviews: aim to review code within 24 hours, and do reviews first thing in the morning and after lunch to minimize disruptions to your own work.
  • You should be able to quickly set up a local version of your service with just one command.
  • Set up mocks of your dependencies where you can control what responses are returned.
  • Sleep on it! Give non-trivial tasks a night’s rest so your unconscious mind can process the initial solution.

Delivery processes  

  • The biggest blind spot in the delivery process is unstructured data in non-relational data stores, which makes it easy to choose a data shape that is not quite right and harder to manage.
  • In distributed software development, starting with the data instead of the top-level API can improve the delivery process.
  • Relational databases, normal forms, relationships, and constraints lead you to shape your data more correctly, which often makes your code simpler.

Testing: Locally, CI, Production

  • At the very least, have good unit tests that run quickly on every local build.
  • Ideally, automated integration tests run against a test environment as part of the CI process.
  • In production, use “canaries”: sample data and workflows that your system processes regularly, so that a drop in activity to zero signals a serious problem.

Debugging processes

  • In distributed systems, you need to rely on the pillars of observability: metrics, logs, and traces.
  • Use metrics to detect potential issues early and traces to pinpoint where the problem is occurring, especially for resources like memory, network, or dependencies.
  • Use logging to debug logical issues, specifically logging the inputs to your service and the return values from its dependencies.

Insights into Amazon’s R&D teams

  • Amazon is famous for its “two-pizza” teams: 6-10 engineers (small enough to be fed by two pizzas) who own one or more services end-to-end.
  • The code itself is relatively simple; the complexity lies in figuring out how your service fits into the greater web of services and communicates with them.
  • Amazon is a document-driven organization, with written documents playing a significant role in design decisions and meetings.

Please introduce yourself

My name is Rohan Verghese. I’ve been a software developer for about 16 years now. I’m currently at Amazon, where I have worked in the supply chain area for about a year. Before Amazon, I worked at Bally Interactive, an online sports betting platform, in their third-party feed team. Before that, I spent ten years at a very small startup that made a hedge fund administration product. By very small, I mean that I was the only backend developer for the first five years!

Please describe Amazon’s R&D department structure, and specifically your team’s.

Amazon is pretty famous for its “two-pizza” teams. Each team is 6-10 engineers (i.e., can be fed by two pizzas) and owns one or more entire services end-to-end. Data science teams are also organized along similar lines, tackling individual projects. However, I’m not sure there’s a traditional R&D structure.

On a scale from 1 to 10, how complex do you think your code base is, and how does it affect developer productivity?

It depends on how you look at it. The actual code is probably a three or so. It very rarely does anything tricky. But your service will probably call six other services, and multiple services will call yours. So organizing all that and making sure it’s resilient and highly scalable with all the different AWS constructs and options is where the complexity lies.

To sum up, the basic code is relatively simple. However, the complexity lies in figuring out how your service fits into the greater web of services and communicating with other services.

Ironically, previous jobs had more complex code simply because the domain was more complex. There were days when I cursed the crazy people who invented cryptocurrencies.

Can you describe your development process?

It’s usually left up to the individual team. Most teams I’ve seen follow a basic Scrum-like process with two-week sprints. Sprint planning at the start and regular standups during the sprint.

The only major day-to-day difference at Amazon from other companies I’ve worked at is that Amazon likes writing documents. People write documents for almost everything. Pretty much every design decision and meeting will have an engineer write a design document about the subject of the meeting, with all the pros and cons listed out. You spend the first 10-15 minutes of the meeting reading the document, discussing it, and making a decision.

On the whole, I like this culture of writing documents, but there’s no denying it is a pretty big culture shock when you’re first introduced to it.

Where do you think the blind spot in your delivery process is? And how does it relate to developer productivity?

Some teams who deploy to production may perform automated canary roll-outs. This means the code changes will go live only on a few instances at first and automatically revert if the metrics say something went wrong. But those only work when you have a lot of instances, and they are not trivial to implement. It would be an interesting step to make our deployments to production even safer, but we have yet to feel the need for it. Simpler health checks have always been enough for us, as the test coverage catches most breaking changes. 

In my opinion, the biggest blind spot is in structuring data. Because of scalability concerns, most teams use non-relational data sources like DynamoDB or DocumentDB (MongoDB). But because your data is unstructured with these data sources, it’s very easy to use a data shape that is not quite right. That ends up making dealing with your data more difficult than it needs to be.

I find that the structure imposed by relational databases, normal forms, relationships, and constraints leads you to shape your data in a superior, more correct fashion, which in turn often makes your code a lot simpler and more straightforward.

In distributed software development, many teams start the delivery process with the top-level API that other services will call; essentially, they start with the code. However, I think you should always start at the bottom, with the data that will be stored and/or operated on by your system.
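
To make this concrete, here is a minimal sketch of “starting with the data”; the entities and field names are invented purely for illustration. The shape of the stored records is pinned down first, with explicit keys and relationships, and the API layer is written against that shape afterwards.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical domain: shipments and their line items, shaped the way a
# relational schema would force you to, with explicit keys and relationships
# instead of a nested blob whose shape drifts over time.
@dataclass(frozen=True)
class Shipment:
    shipment_id: str   # primary key
    warehouse_id: str  # reference to a Warehouse record
    promised_date: date


@dataclass(frozen=True)
class ShipmentItem:
    shipment_id: str   # reference back to Shipment
    sku: str
    quantity: int


def items_for_shipment(items: list[ShipmentItem], shipment_id: str) -> list[ShipmentItem]:
    """The top-level API is then written against this agreed data shape."""
    return [item for item in items if item.shipment_id == shipment_id]
```

Even when the actual store is DynamoDB or DocumentDB, agreeing on a normalized shape like this up front tends to keep the code that reads and writes it simpler.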

At which phase do you perform testing? (I.e., local, CI, production)

All of them? At the very least you should have good unit tests that run quickly on every local build. Ideally, there would be automated integration tests that run against a test environment as part of CI. 

Finally, in production, it’s often a good idea to have “canaries”: sample data and workflows that your system processes regularly, so that there’s always some activity. Then if activity drops to zero, you know there’s a serious problem.
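
As a rough illustration of the canary idea (all names here are hypothetical), a scheduled job can feed synthetic work through the system, with a check that raises an alert if processed canary activity drops to zero:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("canary")


def submit_canary_order(order_id: str) -> None:
    """Stand-in for calling the real API with clearly marked synthetic data."""
    log.info("submitted canary order %s", order_id)


def count_processed_canaries(window_seconds: int) -> int:
    """Stand-in for querying how many canary orders completed in the window."""
    return 1


def run_canary_cycle(window_seconds: int = 300) -> None:
    submit_canary_order(f"canary-{int(time.time())}")
    if count_processed_canaries(window_seconds) == 0:
        # Zero canary activity suggests real traffic is probably failing too.
        log.error("no canary orders processed in the last %d seconds", window_seconds)


if __name__ == "__main__":
    run_canary_cycle()
```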

How do you trace back an issue? How do you debug it at each stage?

In distributed systems, you need to rely on the pillars of observability: metrics, logs, and traces. Good use of metrics will alert you to potential issues early. Traces narrow down where in the system the problem is occurring. These two pillars are most important to determine problems with resources like memory, network, or dependencies.
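
Here is a minimal sketch of those two pillars using the OpenTelemetry Python API. The service, span, and metric names are invented, and without an SDK and exporter configured the calls are no-ops, so this only shows the shape of the instrumentation:

```python
import time

from opentelemetry import metrics, trace

# Plain OpenTelemetry API calls; without an SDK and exporter configured they
# are no-ops, so this sketch runs as-is. All names here are invented.
tracer = trace.get_tracer("checkout-service")
meter = metrics.get_meter("checkout-service")
dependency_errors = meter.create_counter(
    "dependency.errors", description="Failed calls to downstream services"
)


def call_inventory_service(sku: str) -> dict:
    # A span narrows down where in the request path time is spent or errors
    # occur; the counter feeds the metrics that alert you to problems early.
    with tracer.start_as_current_span("inventory.lookup") as span:
        span.set_attribute("sku", sku)
        try:
            time.sleep(0.01)  # stand-in for the real network call
            return {"sku": sku, "available": True}
        except Exception:
            dependency_errors.add(1, {"dependency": "inventory-service"})
            raise
```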

The most important tool to debug logical issues is logging. Therefore, it’s imperative to have good logging of inputs to your service and return values from your service’s dependencies.

To be honest, I don’t think there’s any real substitute for good logging. If you know all the inputs that caused the problem, it’s easier to write test cases that reproduce it.
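
One lightweight way to get that kind of logging, sketched here with invented handler and dependency names, is a small decorator that records the inputs of each call and whatever it returns, so a failing request can be replayed as a test case:

```python
import functools
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("service")


def log_io(func):
    """Log the inputs to a call and whatever it returns (or raises)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("%s inputs: args=%s kwargs=%s", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("%s raised", func.__name__)
            raise
        log.info("%s returned: %s", func.__name__, json.dumps(result, default=str))
        return result
    return wrapper


# Hypothetical dependency call and request handler.
@log_io
def fetch_price(sku: str) -> dict:
    return {"sku": sku, "price": 12.99}


@log_io
def handle_request(sku: str, quantity: int) -> dict:
    price = fetch_price(sku)["price"]
    return {"sku": sku, "total": round(price * quantity, 2)}


if __name__ == "__main__":
    handle_request("ABC-123", 3)
```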

Do you measure developer productivity, and if you do, how?

I’m sure productivity is measured at some level at Amazon, but it’s above my level.

I suppose you could look at throughput (number of tasks completed per unit time) or latency (average length of time it takes to complete a task). But most of these measurements get confounded by the fact that every task is different.

At a meta-level, a productive team meets its deadlines (set by the team, not others). At the very least, such a team knows how productive they are, and there is no external interference driving down their intrinsic productivity. On the other hand, an unproductive team is constantly slipping, which is very often a sign that external factors are interfering.

Can you list five general tips for increasing developer productivity? 

  • Decrease cycle time as much as you can. Faster builds, faster tests, faster code reviews, and faster deployments all add up and allow developers to try out ideas and get feedback quickly.
  • I remember at one of my previous jobs, a mandatory check for software vulnerabilities was introduced into CI. In theory, it was a good idea. However, it turned a 2-minute deployment into a 12-minute one and decreased productivity tremendously.
  • Your first priority should be the CI/CD pipeline. If it breaks, your team needs to fix it immediately. A broken pipeline often accumulates issues and can be a real pain to fix later. Not to mention that it’s usually preventing people from deploying and testing.
  • Your second priority should be code reviews. I suggest doing code reviews first thing in the morning and first thing after lunch, so you aren’t interrupting your regular work for reviews but they still get done quickly. Code should be reviewed within a day.
  • Developers should be able to quickly set up a local version of their service with just one command. This can take a fair bit of work, especially for very distributed services.
  • This can seem unnecessary, especially when you already have a shared test environment that connects to other test services. But this kind of local full service is worth its weight in gold. It’s much easier to test branches locally when you don’t need to deploy to a shared environment (a minimal sketch of this kind of setup follows this list).
  • Sleep on it. For any non-trivial task, give yourself a night’s rest before reviewing it. Let your unconscious mind chew on your initial solution. You might realize issues with your approach, or be struck by a better solution, or come up with corner cases that might need better testing. Or maybe you’ll be happy with your solution and can put it up for review right away.
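
Here is a hedged sketch of what the one-command local setup can look like, with mocked dependencies whose responses you control; the service and client names are invented for the example. One entry point wires the service against in-process stubs, so a branch can be exercised locally without touching a shared test environment.

```python
"""Hypothetical one-command local run, e.g.: python run_local.py"""
from dataclasses import dataclass


class StubInventoryClient:
    """Local stand-in for a downstream service; responses are chosen by the developer."""

    def __init__(self, responses: dict[str, bool]):
        self.responses = responses

    def is_available(self, sku: str) -> bool:
        return self.responses.get(sku, False)


@dataclass
class OrderService:
    inventory: StubInventoryClient

    def place_order(self, sku: str) -> str:
        return "accepted" if self.inventory.is_available(sku) else "backordered"


def main() -> None:
    # One command, one place to tweak the mocked responses for the scenario
    # you want to exercise on this branch.
    service = OrderService(inventory=StubInventoryClient({"ABC-123": True}))
    print(service.place_order("ABC-123"))  # accepted
    print(service.place_order("XYZ-999"))  # backordered


if __name__ == "__main__":
    main()
```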

If you want to give Sprkl a try, get started here.
