Introducing COSTAR Code Reviews
COSTAR is an acronym for the checklist I use to establish what needs to be covered in code reviews. It stands for the different aspects of code that I expect everyone to evaluate. When you give a passing score on a COSTAR code review, you are certifying that you looked at the author’s work and agree it meets all of the COSTAR criteria. You are tying yourself to the quality of the code just as much as the original author. If the author was the star of the code, you’re now the co-star. Both of your faces are now on the poster, and you both solemnly swear the code is…
Clean
Optimal
Secure
Tested
Accurate
Resilient
It is an acronym that is too cute by half, and while it feels really dumb when you first start using it, it becomes pretty natural to start talking about COSTARs. “Do you have a COSTAR on this ticket?” “We can’t deploy this yet, it doesn’t have a COSTAR.” “Hey, can you COSTAR this with me?” etc. etc.
Having a very explicit list of what is considered in scope for a code review helps people new to the process feel empowered to point out things they might not otherwise have thought to mention, and helps stop the rubber-stamping phenomenon where people just start passing reviews so they aren’t blocking anyone else. A “code review” could be anything: a quick one, a bad one, a long one, a great one, a tedious one. You aren’t really saying what exactly you did when you tell me you “reviewed” someone’s code. But if you tell me you COSTARred the code, you’re telling me something.
For the kind of person who turns every code review into a Kafkaesque nightmare designed to extract the maximum amount of pain from anyone foolish enough to think they could get any code past them, COSTAR can become another cudgel in their arsenal. Shocker: a cute acronym will not cure pedanticitis. But trust me when I say the point of COSTAR is not to grind all progress to a screeching halt, make code reviews take forever, and ensure your most insufferable reviewer gets his quota of junior developer tears. The point is to ensure code reviews are meaningful and actually improve code quality.
There is a lot of work to do with a team around trust and culture before any kind of feedback process can be effective. COSTAR won’t work if no one feels safe giving or getting feedback. But for now, I’m going to hand-wave this important aspect of establishing quality code reviews so we can get into the details of the checklist. Maybe we’ll cover some of that in future posts.
Clean
The cleanliness of the code is about the format and structure of the text itself. This can be subjective from place to place; here is where you enforce naming conventions, style guides, etc. Ideally much of this is checked automatically by a linter, but when a human is doing the review, they should take a minute to consider the cleanliness of the code. For my COSTARs I want them to ask…
- Is it easy to read/maintain?
- Does it follow the Clean Code guidelines?
(Here I am quite literally referring to the Clean Code book by Robert “Uncle Bob” Martin, of which I am an advocate. I know some people look down their noses at it, shouting things like “how could a book written so long ago, that uses a language I don’t use for all its examples, still be relevant? Obviously good ideas age out and we must reinvent everything all the time!!” To those people I say “get off my lawn and turn that blasted music down,” but fine…you use whatever definition of clean works for you.)
I especially try to avoid Uncle Bob’s causes of “Code Smell”:
- Rigidity: It is difficult to make changes, one change would cause a cascade of changes elsewhere.
- Fragility: Multiple processes would break if one thing went wrong or changed.
- Immobility: You cannot move or copy parts of the code because of involved risks and high effort.
- Complexity: The process has more steps, parts, etc. than needed, AKA over-engineering.
- Repetition: Code is not DRY (Don’t Repeat Yourself), logic or values are repeated in multiple places.
- Opacity: It is hard to understand what the code does just by reading it.
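To make a couple of those smells concrete, here is a small TypeScript sketch showing Repetition and Opacity side by side with one possible cleanup (the pricing rule and names are invented purely for illustration):

```typescript
// Smelly: the discount rule is repeated in two places and the magic numbers are opaque.
function cartTotal(prices: number[]): number {
  let total = 0;
  for (const p of prices) total += p > 100 ? p * 0.9 : p;
  return total;
}

function lineItemLabel(price: number): string {
  return price > 100 ? `${price * 0.9} (bulk discount)` : `${price}`;
}

// Cleaner: the rule lives in one well-named place, so a future change happens exactly once.
const BULK_DISCOUNT_THRESHOLD = 100;
const BULK_DISCOUNT_RATE = 0.9;

function discountedPrice(price: number): number {
  return price > BULK_DISCOUNT_THRESHOLD ? price * BULK_DISCOUNT_RATE : price;
}

function cartTotalClean(prices: number[]): number {
  return prices.reduce((total, p) => total + discountedPrice(p), 0);
}

function lineItemLabelClean(price: number): string {
  const discounted = discountedPrice(price);
  return discounted < price ? `${discounted} (bulk discount)` : `${price}`;
}
```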
Comments are also part of code!!! They should also be reviewed and…
- Ideally not exist at all. They should only be present where code is not obvious on its own. Can you improve the code to make the comments unnecessary?
- Be in plain English that people new to the code could understand: no non-universal acronyms, abbreviations, or jargon
- Explain intent not function (why did you do it, not what it does)
- Be accurate for the current state of the code
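A quick, made-up TypeScript illustration of “intent, not function” (the billing rule is invented):

```typescript
// A small date helper, included only so the example stands on its own.
function addDays(date: Date, days: number): Date {
  const result = new Date(date);
  result.setDate(result.getDate() + days);
  return result;
}

// Function, not intent: this comment just restates the code.
// add 3 days to the invoice date
const dueDateBad = addDays(new Date("2024-03-01"), 3);

// Intent, not function: this comment explains why the code exists.
// Our payment processor settles in batches, so funds are not available
// until three days after the invoice date.
const dueDateGood = addDays(new Date("2024-03-01"), 3);
```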
Optimal
By optimal I mean the code performs well. Appropriate algorithms, structures, objects, etc. are in use to maximize the efficiency of the code. If you want to start having big O notation arguments, here is where you do it. Again, there are many tools to automatically assess this and I definitely advocate for automating as much as you can. But when a person is doing the COSTAR, I want them asking…
Do all user-facing actions respond to the user within 5 seconds?
(Lots of research finds that somewhere between 1 and 10 seconds is where users start thinking something is wrong and start clicking over and over or just leave. Here is one such oft-quoted article from Jakob Nielsen.)
Does this code introduce any obvious performance bottlenecks?
Is this the best implementation of the idea the team can come up with? If not, and we need to get it out anyway, are the improvements needed to get to “best” added to the backlog?
(Here is where we try to cut off the Kafkaesque reviewer. If the code functions and we’ve got to ship, they can’t just hold the code hostage until it meets their definition of perfect. But at the same time, let’s acknowledge we usually only put up with this kind of person because they’re most often right, so give them an outlet for the feedback: a ticket on the backlog.)
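To make “obvious performance bottlenecks” a little less abstract, here is a deliberately simplified TypeScript sketch (the order and customer shapes are invented) of the classic fix a reviewer might ask for: replacing an O(n·m) scan with an O(n + m) Set lookup.

```typescript
interface Order {
  id: string;
  customerId: string;
}

// O(n * m): for every order we rescan the entire list of flagged customer IDs.
function flaggedOrdersSlow(orders: Order[], flaggedCustomerIds: string[]): string[] {
  return orders
    .filter(o => flaggedCustomerIds.includes(o.customerId))
    .map(o => o.id);
}

// O(n + m): build the lookup once, then each order is a constant-time check.
function flaggedOrdersFast(orders: Order[], flaggedCustomerIds: string[]): string[] {
  const flagged = new Set(flaggedCustomerIds);
  return orders
    .filter(o => flagged.has(o.customerId))
    .map(o => o.id);
}
```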
Secure
Behind every cybersecurity breach is a developer who wrote the vulnerability…and I bet most of those passed code review. You have to be thinking about security while you’re writing the code that could eventually be exploited. As always, there are a lot of scanners that can help check for the basics here, but I want my reviewers to at least be looking for the OWASP Top 10:
- Broken Access Control: Users cannot perform actions they don’t need to (principle of least privilege)
- Cryptographic Failures: Data is encrypted at rest and in transit
- Injection: The system cannot execute code provided by the user (validation, sanitization, escaping; see the sketch after this list)
- Insecure Design: The original business request is not for something inherently insecure (question the requirements, not just the implementation)
- Security Misconfiguration: All configuration settings, especially default values, have been evaluated
- Vulnerable and Outdated Components: We are using the most up-to-date version of all libraries and tools
- Identification and Authentication Failures: Users must authenticate to access non-public data, and we don’t help attackers (throttle failed attempts, do not reveal whether users exist, error messages do not reveal your stack, etc.)
- Software and Data Integrity Failures: We can validate the source of every part of our tool chain (libraries, tools, servers, etc.)
- Security Logging and Monitoring Failures: All auditable events, warnings, and errors are logged with clear messages, those logs are monitored, and alerts fire in near real time for suspicious issues
- Server-Side Request Forgery: Any request for a remote resource is validated (i.e., a user can’t trick the system into returning a sensitive file by changing the URL of an image request)
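As a concrete example for the Injection item above, here is a small TypeScript sketch. The Db interface is just a stand-in for whatever database client you actually use; the point is the shape of the queries, not the library.

```typescript
// Hypothetical query interface standing in for your real database client.
interface Db {
  query(text: string, params?: unknown[]): Promise<unknown[]>;
}

// Vulnerable: user input is concatenated straight into the SQL text, so a crafted
// email like "' OR '1'='1" changes the meaning of the query itself.
async function findUserUnsafe(db: Db, email: string) {
  return db.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// Safer: the input travels as a bound parameter and is never interpreted as SQL.
async function findUserSafe(db: Db, email: string) {
  return db.query("SELECT * FROM users WHERE email = $1", [email]);
}
```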
Tested
As you might expect from a fan of Uncle Bob, I believe in Test Driven Development (TDD) and encourage developers to follow that process. This means we should definitely have some tests. Automated code coverage tools are useful here, but remember that most just check that the code is exercised by a test, not that those tests actually verify what they are supposed to. Tests are also code and need to be reviewed just as thoroughly as the feature code. For a COSTAR, I want them asking…
Is this code covered by a new or existing unit test?
Is that test going to be executed automatically at build time?
Is it covered by a new or existing end-to-end/integration test?
Is that test going to be executed automatically at deployment?
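To show the difference between “there is a test” and “the test actually tests what it is supposed to,” here is a small Jest-style TypeScript sketch (the discount function and its threshold are invented for illustration):

```typescript
// The function under test, invented for this example.
function discountedPrice(price: number): number {
  return price > 100 ? price * 0.9 : price;
}

// Counts toward coverage, proves almost nothing: it runs the code but asserts no behavior.
test("discountedPrice runs", () => {
  expect(discountedPrice(150)).toBeDefined();
});

// Actually tests the requirement, including the boundary the ticket cares about.
test("applies the 10% discount only above the 100 threshold", () => {
  expect(discountedPrice(150)).toBeCloseTo(135);
  expect(discountedPrice(100)).toBe(100); // boundary: no discount at exactly 100
  expect(discountedPrice(50)).toBe(50);
});
```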
Accurate
Accurate here means the code does what it is supposed to do. You cannot review the code if you don’t understand the requirements. A COSTAR is not just about looking for missing semicolons (or whatever you Python developers look for). I’ll never understand places that let people review code for projects they’d never heard of before the review hit their to-do list. When doing a COSTAR you should be asking…
Does this code meet the definition of done and/or requirements of the ticket? Does it produce the expected results?
Resilient
Code is resilient when it can take a beating and keep on ticking. It is usually a relatively easy matter to write code that handles the “happy path”; the bulk of our development time ends up being spent on the edge cases that will break our code. Take a moment to consider some of the most likely sources of such issues:
Does it handle or prevent…
- Edge cases?
- Nulls and blanks?
- Year or month rollovers?
- Unexpected user input?
- Extremely large values?
Does it gracefully handle and log errors?
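Here is a small, hypothetical TypeScript sketch of what “handle it gracefully and log it” can look like for a user-supplied quantity field (the validation rules are invented):

```typescript
// Hypothetical parser for a quantity field coming from user input.
function parseQuantity(raw: string | null | undefined): number {
  // Nulls and blanks: treat missing input as an explicit, recoverable error.
  if (raw == null || raw.trim() === "") {
    throw new RangeError("Quantity is required");
  }
  const value = Number(raw);
  // Unexpected input and extreme values: reject rather than silently misbehave.
  if (!Number.isInteger(value) || value < 1 || value > 10_000) {
    throw new RangeError(`Quantity must be a whole number between 1 and 10,000, got "${raw}"`);
  }
  return value;
}

// The caller fails gracefully: log the error with context, give the user a useful
// message, and never let the raw exception leak through to them.
function handleOrderSubmit(rawQuantity: string | null): string {
  try {
    const quantity = parseQuantity(rawQuantity);
    return `Order placed for ${quantity} item(s)`;
  } catch (err) {
    console.error("Order submit failed: invalid quantity", { rawQuantity, err });
    return "Please enter a quantity between 1 and 10,000.";
  }
}
```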
As I mentioned, there is a lot more to establishing a great code review process than just shouting this acronym at developers all the time. You have to have the right people, the right culture, the right tools, etc. And I may talk about those aspects here in the future.
This is also just a rough framework; you should define the criteria however they make sense for your organization. Your definition of clean or optimal may not match mine, but you still probably want clean, optimal code. Make it your own. Or don’t, I’m not your mom.
All I’m saying is, I have generally found that once a good team embraces COSTAR, their code reviews get better, collaboration improves, and code quality goes up. If your code reviews feel aimless, vary wildly between teammates, and/or aren’t preventing bad code from getting into production, you may want to turn your reviewers into co-stars.