How to deploy QUICKLY and SAFELY to the live site WITHOUT comprehensive testing

By yuseferi, 3 October, 2016

On the one hand, you want to deploy changes to the live site QUICKLY (for, say, a Highly Critical security update).

On the other hand, you want to make changes SAFELY, i.e. you don't want them to break the site.

Testing is good. Automated testing is great.

But what if you simply didn't have the resources to comprehensively test the change (either manually or automatically)?

Maybe the client isn't willing to fund a project to write automated tests. Maybe you don't have the extra time or extra people to do proper QA. Whatever the reason, you just couldn't do it.

Is it possible to both quickly AND safely deploy to the live site WITHOUT comprehensive testing?

... besides just crossing your fingers and hoping it doesn't break?

We think there is a way.

Comprehensive testing is best... if you can afford it!

So, just to be clear, we're not saying that skipping comprehensive testing of changes is the BEST course of action.

If you can afford to have a QA team put the site through its paces, or can invest in creating comprehensive automated testing for your site - do it!

However, the vast majority of existing sites don't have automated tests, and it can take months or years to build up a comprehensive test suite. And if the client is only paying you a small fee for security updates, you might not be able to justify having people test manually.

We take on maintenance and support for all kinds of sites, in all kinds of states and with interesting histories. We can't depend on a site having documentation or even a person on staff who knows how the site works, let alone automated tests!

So, that's one of the reasons it was critical for us to come up with a solution to this problem.

Letting updates wait forever isn't a solution

One "solution" we've seen to this problem is requiring client approval before taking a change live.

This puts the responsibility of any issues with the update on the client. If there's a problem, "well, they tested it and said it was fine! Not my fault..."

However, not only is that a little unfair to clients who aren't testing experts, but sometimes it'll lead to updates stalling for days or weeks waiting for client approval. Clients are busy and have their own daily priorities and challenges to deal with.

If it's a security update, time is of the essence! And even if it isn't a security update, it's still good to move forward quickly. Pending updates can hold back other development, and if the client finds a bug 3 weeks later, it'll be a lot harder to jump back in and debug it than it would have been 3 days later.

So, this isn't really a true solution to the problem!

Deploying security updates the same day they're released

We take security seriously.

On all of our Drupal support and maintenance plans we perform security updates for customers on the same day that they're released.

Frequently, people ask us how we could possibly do that safely!

So, without further ado...

Are you ready? Here's our solution...

This is the idea we've hit on that's working really well for our business:

Gather "3 critical use cases" for each site

These are three things that users or visitors can do on the site that must always work. For example:

An anonymous user visits the home page and can play the video
An anonymous user visits example.com/stories and sees a list of stories with a thumbnail, title, author and description
A user with the editor role can create and edit an 'activity' node and view the resulting page

If one of those use cases is broken, it is considered a "critical bug" which is given the highest priority. We have our customers sign off on the critical use cases in our contract with them.

When deploying updates, we'll test the critical use cases on a staging site, and if they pass, we deploy (usually - see below). If the changes break other stuff, it's still a bug and we'll still fix it, but by definition, those bugs aren't "critical."
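The "test the critical use cases on staging, then deploy if they pass" rule can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual tooling: the names `CriticalUseCase`, `run_critical_checks` and `safe_to_deploy` are made up for this example, and the checks are stubs standing in for whatever a person or script would really do on the staging site.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CriticalUseCase:
    """One agreed-upon scenario that must always work (hypothetical structure)."""
    description: str
    check: Callable[[], bool]  # returns True when the scenario works on staging


def run_critical_checks(use_cases: List[CriticalUseCase]) -> List[Tuple[str, bool]]:
    """Run every check and record pass/fail. A check that raises counts as a failure."""
    results = []
    for case in use_cases:
        try:
            passed = bool(case.check())
        except Exception:
            passed = False
        results.append((case.description, passed))
    return results


def safe_to_deploy(results: List[Tuple[str, bool]]) -> bool:
    """Deploy only when there is at least one result and every one of them passed."""
    return bool(results) and all(passed for _, passed in results)


# Stub checks for illustration only; real ones would exercise the staging site.
use_cases = [
    CriticalUseCase("Anonymous user can play the home page video", lambda: True),
    CriticalUseCase("Anonymous user sees the list of stories", lambda: True),
    CriticalUseCase("Editor can create and edit an 'activity' node", lambda: True),
]

results = run_critical_checks(use_cases)
for description, passed in results:
    print(("PASS" if passed else "FAIL"), description)
print("Deploy" if safe_to_deploy(results) else "Hold: critical bug")
```

The one design choice worth noting is that an empty list of results never counts as safe: if no critical use cases were checked, nothing has been verified.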

We've found that this strikes a great compromise between SAFE and FAST: we can be sure that a change at least doesn't break something critical, while also getting changes out quickly.

Also, under normal circumstances, most clients are terrible at prioritizing bugs - they want to say that EVERY bug is critical! However, if everything is critical, it's the same as nothing being critical because all tasks have the same priority.

We've found, though, that it's easier for our customers to think through critical use cases in advance. And then signing off on them, while largely symbolic, helps to solidify the idea that we've sat down and talked this through in advance and agreed - so there's no ambiguity about priority when an issue comes up.

NOTE: If a customer requests a special workflow that involves sending the changes to their QA department, waiting for their approval, or running their test suite, we're happy to accommodate that! But that means we might not be able to deploy right away, and there could be delays that lead to the problems discussed earlier.

Please steal this idea!

This idea doesn't involve any technology - it's really social and procedural in nature.

(Although, you can use a little technology to scale it by putting the critical use cases in Behat - which is actually something we do in our service. :-))
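Behat scenarios are written in Gherkin and backed by PHP, so the snippet below is not Behat. It's a rough Python stand-in for the same idea, assuming a critical use case can be approximated as "the page loads and contains certain text". The helper names `fetch` and `page_shows`, and the snippets being searched for, are invented for this sketch.

```python
from typing import Callable, Iterable, Tuple
from urllib.request import urlopen


def fetch(url: str) -> Tuple[int, str]:
    """Fetch a page and return (HTTP status, body text)."""
    with urlopen(url, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


def page_shows(
    url: str,
    required_snippets: Iterable[str],
    fetch_page: Callable[[str], Tuple[int, str]] = fetch,
) -> bool:
    """True when the page answers with 200 and contains every required snippet."""
    try:
        status, body = fetch_page(url)
    except OSError:  # covers connection failures, timeouts and HTTP error responses
        return False
    return status == 200 and all(snippet in body for snippet in required_snippets)


# Demo with a fake fetcher so nothing touches the network; the markup is made up.
def fake_fetch(url: str) -> Tuple[int, str]:
    return 200, "<h2>Stories</h2><article>Title, author, description</article>"


print(page_shows("https://example.com/stories", ["Stories", "author"], fake_fetch))
```

Passing the fetcher in as a parameter keeps the check easy to try out without a live site. Against a real staging site you would leave `fetch_page` at its default and pick snippets that only appear when the use case actually works.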

While it could take some negotiating in order to retrofit to existing relationships, it's relatively easy to implement. And I think it could be adapted to many different situations, including internal development teams, not just Drupal shops who work with clients.

Nearly everyone that I've explained this idea to has responded with "wow, I might steal that!" and I encourage you to do the same!

What do you think? Is it possible to strike a compromise between releasing SAFELY and QUICKLY? Do you have another idea you've used which works? Write a comment!
