Working Blind: The importance of developer observability

Daniel Beck

Key Takeaways

  • In modern software development, you need tools to help you examine your work while coding to inspect any issues quickly. 
  • Once applications go into production, developers are at risk of working blind. 
  • Identifying and debugging a problem that is only happening on someone else’s machine is a challenge.
  • Developer observability tools capture a tremendous amount of useful information but do not always offer the most intuitive ways of surfacing that information.
  • Collect observability data thoughtfully; don’t turn observability practices into surveillance.
  • Dev tools will continue to grow in power and functionality, both within the browser and through more targeted tools.

The bad old days

From the invention of JavaScript in 1995 to the release of Firebug in 2006, the only way to debug your client-side code or design was trial and error: if something wasn’t working or didn’t look right, you’d change something, see what happened, and keep trying until you either figured out the problem or, well, didn’t.

We’re not talking about littering your code with `console.log` instead of using a debugger – there was no console. There was no DOM inspector. There was just the rendered page, your source code, and an F5 key worn smooth from constantly reloading the page after every change.

We could get away with this, mostly because at the time, what was happening inside the browser was much less complicated than the activity on the server: the browser was just for layout and maybe some light interactivity. Anything significant that might go wrong with the application logic would show up in the server logs, where you could track it down later.

Developer observability in modern software development 

These days, when your entire application is just as likely to be running inside the browser with the server doing little but handing out data through an API, that’s not good enough. While you’ll still run into the occasional novice whose idea of debugging consists of using the `alert()` box, few people would argue that that’s the best or even a viable approach. 

In-browser developer tools started out as little more than a console for logging Ajax requests, but over the years they have grown into a robust set of tools for examining our work while we’re working on it. You can inspect any DOM element to see which CSS rules are affecting it and why, pause your code while it’s running to see or even modify its internal state, and measure performance directly to identify problems. You can even use higher-level tools built for specific frameworks to examine, for example, React component state.

And these tools will continue to grow in power and available functionality both within the browser and with more targeted tools.

But once that application goes into production, you’re working blind again

If you have ever found yourself asking an end user if they know what a developer console is, and if they know how to open it, then you know how challenging it can be to identify and debug a problem that is only happening on someone else’s machine. “Hit F12 and try to describe what you see” is a frustrating experience for both the user and the developer, but worse still are the problems you never even find out about, either because the app is only partly broken and the user assumes it’s supposed to be like that, or because it was so irretrievably broken that they abandoned your product altogether.

If you find yourself in that position, you probably want to consider investing in some developer observability tooling for your app. Sometimes called “real user monitoring” (RUM) or “digital experience monitoring,” this is at heart the modern front-end equivalent of those old-school server logs: tracking client-side errors by logging them as they occur, along with detailed information about each error, such as the browser and version, the location of the error in the code, and the user’s actions leading up to it. This information can be used to identify and fix issues quickly, improving the user experience.
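To make that concrete, here is a minimal sketch of the kind of record such a tool might assemble. Everything in it is an assumption for illustration: the `Breadcrumb` and `ErrorReport` shapes, the `BreadcrumbTrail` class, and `buildReport` are hypothetical names, not the API of any particular RUM product.

```typescript
// Hypothetical shapes for illustration; real RUM tools define their own.
interface Breadcrumb {
  at: number;     // timestamp in milliseconds
  action: string; // short description of what the user did
}

interface ErrorReport {
  message: string;
  source: string;    // file where the error was thrown
  line: number;
  userAgent: string; // browser and version, as reported by the client
  breadcrumbs: Breadcrumb[];
}

// Keeps only the most recent `limit` user actions so memory stays bounded.
class BreadcrumbTrail {
  private items: Breadcrumb[] = [];
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  record(action: string, at: number = Date.now()): void {
    this.items.push({ at, action });
    if (this.items.length > this.limit) {
      this.items.shift(); // drop the oldest entry
    }
  }

  // Returns a copy so later actions do not mutate a report already built.
  snapshot(): Breadcrumb[] {
    return this.items.slice();
  }
}

// Combines the error details with the browser string and recent actions.
function buildReport(
  error: { message: string; source: string; line: number },
  userAgent: string,
  trail: BreadcrumbTrail,
): ErrorReport {
  return {
    message: error.message,
    source: error.source,
    line: error.line,
    userAgent,
    breadcrumbs: trail.snapshot(),
  };
}
```

In a browser you would typically pass `navigator.userAgent` as the `userAgent` argument and call `record` from your click and navigation handlers. The bounded buffer is a design choice: the last handful of actions is usually what explains an error, and an unbounded log grows for as long as the tab stays open.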

I’ve focused mostly on the front-end part of the situation so far, as it’s more frequently the missing link in observability – but it’s important to note also that these tools generally come paired with equivalent monitoring for the server side: “application performance management” (APM) is a common term for the tooling used to keep an eye on potential server-side issues (such as slow database response times or outright errors), as well as to correlate data across the client and server sides of the equation.   

While RUM tools are comparatively new, APM has been around in some form or another from the beginning: back in the day this looked like grepping `/etc/httpd/logs/error_log`, but I’m happy to say the tooling situation has improved since then! The best situation is to have full observability of both sides of the equation. Just examining your server logs, for example, might expose that a particular API request is failing, but with combined RUM and APM tooling you can more easily identify that the source of the problem is actually client-side code generating an invalid request.
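One common way to get that correlation is to tag each request with an ID that both sides log. The sketch below assumes exactly that; the record shapes, the `requestId` field, and the `correlateFailures` function are hypothetical, and the 4xx/5xx split is only a first guess at where to look, not a diagnosis.

```typescript
// Hypothetical record shapes; real RUM and APM tools define their own.
interface ClientRequestEvent {
  requestId: string;  // assumed to be attached to the request and logged on both sides
  page: string;       // page that issued the request
  lastAction: string; // last user action before the request
}

interface ServerLogEntry {
  requestId: string;
  path: string;
  status: number;
}

interface CorrelatedFailure {
  requestId: string;
  path: string;
  status: number;
  suspect: "client" | "server"; // where to start looking
  page: string | null;          // null when no client data was captured
  lastAction: string | null;
}

// Joins failed server requests with the client events that produced them.
function correlateFailures(
  clientEvents: ClientRequestEvent[],
  serverLogs: ServerLogEntry[],
): CorrelatedFailure[] {
  const byId = new Map<string, ClientRequestEvent>();
  for (const event of clientEvents) {
    byId.set(event.requestId, event);
  }

  return serverLogs
    .filter((entry) => entry.status >= 400)
    .map((entry) => {
      const client = byId.get(entry.requestId);
      return {
        requestId: entry.requestId,
        path: entry.path,
        status: entry.status,
        // HTTP 4xx means the server rejected the request as sent;
        // 5xx means the server itself failed.
        suspect: entry.status < 500 ? ("client" as const) : ("server" as const),
        page: client ? client.page : null,
        lastAction: client ? client.lastAction : null,
      };
    });
}
```

With only the server log you would see a 400 on some API path; with the join you also see which page and which user action produced it.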

In principle, the most basic form of this monitoring is an afternoon’s coding exercise: catch all errors, post error messages to server, job well done! But in practice, there’s some nuance to making sure you’re not missing anything: monitoring software may not be as useful if the errors you’re monitoring for can prevent the monitor from running.   
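Here is roughly what that afternoon’s exercise could look like, as a sketch rather than a finished monitor. The `ErrorReport` shape, the `ErrorSource` interface, and `installErrorReporter` are names I made up for illustration; the `error` and `unhandledrejection` events and the `message`, `filename`, `lineno`, and `reason` properties are the standard browser ones.

```typescript
// Hypothetical report shape for illustration.
interface ErrorReport {
  kind: "error" | "unhandledrejection";
  message: string;
  source?: string;
  line?: number;
}

// Minimal view of `window`: anything with addEventListener will do,
// which also lets the function be exercised outside a browser.
interface ErrorSource {
  addEventListener(type: string, listener: (event: any) => void): void;
}

function installErrorReporter(
  source: ErrorSource,
  send: (report: ErrorReport) => void,
): void {
  const safeSend = (report: ErrorReport): void => {
    try {
      send(report);
    } catch {
      // A failing reporter should not throw from inside an error handler.
    }
  };

  // Uncaught exceptions thrown by scripts on the page.
  source.addEventListener("error", (event) => {
    safeSend({
      kind: "error",
      message: String(event.message),
      source: event.filename,
      line: event.lineno,
    });
  });

  // Promise rejections that nothing handled.
  source.addEventListener("unhandledrejection", (event) => {
    const reason = event.reason;
    safeSend({
      kind: "unhandledrejection",
      message: reason instanceof Error ? reason.message : String(reason),
    });
  });
}
```

In a browser you might wire it up with `installErrorReporter(window, (r) => navigator.sendBeacon("/client-errors", JSON.stringify(r)))`, where `/client-errors` is a hypothetical endpoint on your own server. This is also where the nuance starts: the reporter only sees errors that happen after it is installed, so it needs to load before the rest of your code, and errors from cross-origin scripts served without CORS headers arrive as a bare “Script error.” with no location.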

For any nontrivial application, unless you are extremely budget-constrained, I’d suggest investing in one of the existing developer observability tools purpose-built for this sort of thing. They’ll have more robust data capture than you can easily build yourself, and they can scoop up lots of additional useful information along the way (which user had the problem? What browser, on what operating system, from what region, etc.?). They will also generally include tools for searching and visualizing all that captured data, flagging newly discovered issues for your developers’ attention, and generating all the charts and graphs your management team could ever desire.

Sifting through observability data: uncovering relevant insights 

The developer observability tool I’m most personally familiar with (I’m going to choose not to name and shame because, from what I can tell, this is a common problem) captures a tremendous amount of useful information, but it does not always offer the most intuitive ways of surfacing that information. The UX has, let’s say, a steep learning curve. At my last org, we had to make a conscious effort to remind engineers to actually dig through those reports regularly, and to train them up on how to locate the relevant data for a specific user complaint. You can set alerts for major outages, but tracking down the right information to explain a problem that only one or a handful of users were having wasn’t always the easiest.

The fact that we were responding to user-reported errors and then trying to find the explanation in the captured data is already a sign of this not working as well as it could:  ideally, we would have been identifying and resolving these issues without the user needing to report them to us. Better than not having the data at all, of course, but the tooling is not the whole story; it takes care and diligence both to configure these tools usefully and to use them effectively. As above, monitoring software is maybe not quite as useful if nobody’s looking at the monitor or if the monitor is capturing so much unneeded data that it becomes difficult to identify what’s significant.
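One small way to make a pile of captured errors easier to scan is to group identical ones and rank the groups by how many users they affect. The sketch below assumes a hypothetical `CapturedError` shape and a deliberately crude fingerprint (message plus location); commercial tools use more elaborate grouping than this.

```typescript
// Hypothetical shapes for illustration.
interface CapturedError {
  message: string;
  source: string;
  line: number;
  userId: string;
}

interface ErrorGroup {
  fingerprint: string;
  occurrences: number;
  affectedUsers: number;
}

// Groups errors by message and location, then ranks groups so that the
// ones touching the most users come first (ties broken by occurrences).
function groupErrors(errors: CapturedError[]): ErrorGroup[] {
  const groups = new Map<string, { occurrences: number; users: Set<string> }>();

  for (const e of errors) {
    const fingerprint = `${e.message} @ ${e.source}:${e.line}`;
    const group = groups.get(fingerprint) ?? { occurrences: 0, users: new Set<string>() };
    group.occurrences += 1;
    group.users.add(e.userId);
    groups.set(fingerprint, group);
  }

  return Array.from(groups.entries())
    .map(([fingerprint, group]) => ({
      fingerprint,
      occurrences: group.occurrences,
      affectedUsers: group.users.size,
    }))
    .sort((a, b) => b.affectedUsers - a.affectedUsers || b.occurrences - a.occurrences);
}
```

Ranking by affected users rather than raw count keeps one user stuck in a retry loop from drowning out a quieter error that hits everybody.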

In a related vein, I recently encountered Sprkl, a Personal Observability platform that offers automated logging, tracing, and personalized feedback on code changes for individual developers.  It minimizes time spent on debugging, log searching and digging for relevant data. Each dev only gets their personal, relevant info instead of the tremendous amount of data collected and presented by “regular” developer observability tools. This isn’t a replacement for observability in production, but it’s a very useful supplement, and I find it exciting to see tools like Sprkl expanding these ideas into new territory.

Collecting observability data thoughtfully

It’s important with any data capture tool to use that data responsibly and with sensitivity to both legal and human concerns. Once you’ve built in the tooling to capture errors, it’s pretty straightforward to also capture information about the user’s session that has nothing to do with errors, up to and including, well, literally every action the user takes. Used properly, this can be tremendously useful: you can see which features users prefer and how they use them, which parts of your application load too slowly, where users give up on your sales funnel, what info they type into every form field… and at this point hopefully the PII issues here are obvious. Particularly if your application handles financial, medical, or other sensitive data, it’s extremely important to filter out info you don’t want to be capturing and to control who in your org is able to access what you do capture. In all cases you need to make sure your Terms & Conditions have the proper disclosures about what you’re capturing and how you’re using it. I’m not a lawyer, don’t ask me; talk to someone in Legal at least once before you start playing with these tools, is what I’m saying.
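As an illustration of the “filter out info you don’t want to be capturing” part, here is a minimal sketch that redacts form values before they leave the browser. The function name, the placeholder string, and the field names in the usage note are all hypothetical, and this is no substitute for the conversation with Legal.

```typescript
const REDACTED = "[redacted]";

// Keeps values only for field names that were explicitly approved for
// capture; every other field is recorded by name with its value replaced.
function redactFormFields(
  fields: Record<string, string>,
  allowedFields: ReadonlySet<string>,
): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(fields)) {
    safe[name] = allowedFields.has(name) ? value : REDACTED;
  }
  return safe;
}
```

Calling `redactFormFields({ plan: "pro", email: "a@example.com" }, new Set(["plan"]))` keeps the plan and replaces the email value with the placeholder. The allowlist is the design choice that matters here: when someone adds a new field to a form later, it is redacted by default instead of captured by default.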

And even with the data that’s not inherently sensitive, it’s important to make responsible use of it. That customer on the support call who’s complaining that they’ve been struggling to get past an issue in your product for HOURS and HOURS – it might be personally satisfying to tell them that you can see in the logs they tried twice and then gave up, but the customer is probably not going to be happy about that interaction even after you solve their issue. Basically, don’t let your use of these tools cross over from observability into surveillance.

Developer observability: Conclusion 

The work we do these days is too complex and too important for it to be feasible to simply throw your code into production and hope it works. Depending on user bug reports guarantees you’re not seeing the problems in your code early enough; for every user who took the time to contact your support team, there are probably dozens or hundreds who encountered the same problem and said nothing. Developer observability tools are not magic and they have some drawbacks – existing tools can be complex to configure, difficult to use effectively, and are frequently expensive – but they’re an essential part of the modern developer’s arsenal.

About the author

My name is Daniel Beck; I’m a UX/IA designer turned front-end developer turned engineering manager, with extensive experience working with fully-remote and hybrid remote/local teams. I work best in situations involving a lot of uncertainty, change, and growth — which can mean a startup transitioning to a real live grown-up company, a departmental reorg, an older product in need of substantial rework or redesign, or a greenfield project in need of scoping and definition.

Building an engineering team is not so different from building a user-facing product: in both cases the goal is to help your users (developers) get their work done via predictable and intuitive workflows (development and planning processes) subject to the constraints of budget / time / physical reality — and to make it all attractive and pleasant enough to work in to attract more users (or team members) in the future. Here are 37 tips from me on enhancing productivity in development teams.

I’m good at fostering a sense of teamwork and collaboration, both within my engineering teams and across disciplines: my past experience as a UX designer and IA specialist helps me bring your designers’ vision to reality, and my experience scoping and planning projects as an independent contractor helps me communicate well with both your product management team and with external vendors. Check out my website.

If you want to give Sprkl a try, get started here.
