
Less UI automation, more quality

December 12, 2018
Bill Hodghead

Have you ever thought, “Creating and maintaining all this UI automation is taking a huge amount of my team’s time. I’ll bet we could do less of this and get higher quality!” If so, you aren’t alone. I talked to six product teams that not only thought that, they did it. Here are the lessons they learned and the automation-reduction techniques they used.

The return on investment (ROI) for UI automation is changing

For years, UI automation has been a mainstay for many products. With long ship cycles, strong regression suites were needed. The product UI was built slowly and infrequently changed. Products often weren’t designed to be tested well from layers below the UI, so UI testing was the main way to verify functionality. Since the same tests would be run hundreds of times across many builds, languages, SKUs and architectures, it made sense to automate them.

Now, most teams are moving to short, agile cycles. When lean startup methods are used, the UI can change frequently to get something in front of the customer and then refine it. Many teams have more than one version of the UI in production at the same time for A/B testing. Interactions in the UI are becoming more asynchronous, making automation timing harder. Developers are separating business logic from the UI using MVC or MVVM designs. The lifespan of unmodified UI automation is much shorter, and the return on investment is much lower than it used to be.

To summarize: in the past, most functional test automation was done near the customer interaction.

Figure 1 – Test case distributions in Waterfall and Agile test pyramids

Now most functional test automation is done as close to the code as possible.

UI automation issues

Here are some of the downsides of UI automation based on feedback from the product groups I talked to:

Slows the team. It takes a long time to run.
Slows innovation. It increases the cost of changing features.
It’s fragile and expensive to maintain. Asynchronous actions and interactions with other programs raise the cost.
Low return. It finds few bugs: for Team A, bugs found by automation were 10% of all bugs, and the automation found five or six times as many bugs in the tests or test infrastructure as in the product.
Gives a false sense of security. Even when it passes, it misses visual and usability issues.
Slow ramp-up for new hires. It often has a very large and monolithic code base.
Failure analysis is expensive. Every team reported this. For Team A, analyzing approximately six hundred web tests took ten work days every two-week sprint, for a very low return on investment.
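To put Team A’s failure-analysis figure in per-test terms, here is a small Python sketch of the arithmetic. The 8-hour work day is an assumption, not something the teams reported, and `minutes_per_test` is a hypothetical helper name.

```python
def minutes_per_test(work_days: float, tests: int, hours_per_day: float = 8.0) -> float:
    """Average failure-analysis time per test per sprint, in minutes.

    hours_per_day defaults to an assumed 8-hour work day.
    """
    return work_days * hours_per_day * 60 / tests


# Team A: about ten work days of analysis for roughly 600 web tests per sprint.
print(f"{minutes_per_test(10, 600):.1f} minutes per test, every sprint")
```

Under those assumptions, each web test costs about eight minutes of analysis every sprint, whether or not it ever finds a product bug.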

Reducing automation

Team B reduced their UI automation by 85% — here’s how.

They did an experiment

They asked, “What if we just stopped running all the automated tests that have never found a bug? In a few months, we’ll run them all once (manually if we have to) and see how many bugs we would have missed.”
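Team B’s experiment amounts to partitioning the suite by bug history. A minimal Python sketch, assuming you can attach the IDs of the product bugs each test has caught; `CaseHistory`, `split_suite` and the bug IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CaseHistory:
    """Hypothetical record of one automated test and the product bugs it has caught."""
    name: str
    bugs_found: list[str] = field(default_factory=list)


def split_suite(history: list[CaseHistory]) -> tuple[list[str], list[str]]:
    """Return (keep, park): tests that have found a bug keep running;
    tests that never have are parked until the one-time re-run."""
    keep = [case.name for case in history if case.bugs_found]
    park = [case.name for case in history if not case.bugs_found]
    return keep, park


suite = [
    CaseHistory("checkout_scenario", ["BUG-101", "BUG-217"]),
    CaseHistory("open_settings_page"),
    CaseHistory("export_report", ["BUG-150"]),
]
keep, park = split_suite(suite)
print("keep:", keep)
print("park:", park)
```

Parking rather than deleting is what keeps the experiment cheap to reverse: the parked list is exactly what gets run once, a few months later.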

Result

85% of the UI automation was cut. Test run time dropped from 41 hours to 5.5 hours on a single machine. 55 tester work days were saved in maintaining automation code.

Risk

During those months, people outside the team found two bugs that the cut automation would have caught. When the team finally re-ran the cut tests, they found three more bugs (at least one of which the automation itself wouldn’t have caught; a tester noticed it while running the test manually). The team decided they had cut slightly too far, so they added back three tests and one scenario, but kept the total suite small.

How does this raise quality?

They used those 55 tester days for other test efforts, such as manual exploratory testing and testing customer scenarios. Developer fixes also became faster and better. Previously, bugs were often caught after the developer who introduced them had moved on to another task, so they were frequently fixed by someone else. This led to slow fixes and many regressions. Now bugs are found sooner, and the developer who introduced a bug fixes it.

Which tests were cut, and which were kept?

P0 cases such as “launch the UI and log in” were not kept. If a test is that vital, the functionality is already being exercised manually, by developer unit tests and by other scenario tests. Many shorter, feature-based tests were cut, while the longer, scenario-based tests were kept. It turns out the longer, scenario-based tests found more bugs. They were also a little harder to maintain, but worth it.

Can I get rid of it all?

Don’t start!

Team C has a policy of “no UI automation” and it works for them. They have a 1.0 product with a UI that changes very quickly and has few dependencies.

They also have four developers per tester, so the development team has taken on much of the responsibility for code quality. Developers write tests and test plans. Developers also do manual testing across different browsers (about one to two days per month).

They have a library of unit tests that they try to keep under 5 seconds of total run time (when we talked, it was 16 seconds, and they were unhappy). With tools like NCrunch, developers can run the unit tests continuously while they write code. That means a simple bug lives only for seconds! They fix it by hitting “undo” a few times until the tests pass again. No debugging.
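NCrunch is a .NET tool, so the following is not how Team C works; it is a hedged Python analogue of the same discipline: run the unit tests and check the total time against a budget. The 5-second figure comes from the text; `run_with_budget`, `apply_discount` and `PriceTests` are hypothetical.

```python
import time
import unittest

BUDGET_SECONDS = 5.0  # Team C's target run time for the whole unit-test library


def run_with_budget(suite: unittest.TestSuite, budget: float = BUDGET_SECONDS):
    """Run `suite` and return (passed, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    elapsed = time.perf_counter() - start
    return result.wasSuccessful(), elapsed, elapsed <= budget


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical business rule under test: integer cents, rounded down."""
    return price_cents * (100 - percent) // 100


class PriceTests(unittest.TestCase):
    """Hypothetical fast unit tests standing in for a real library of them."""

    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(1000, 10), 900)

    def test_rounds_down(self):
        self.assertEqual(apply_discount(999, 10), 899)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PriceTests)
    passed, elapsed, within = run_with_budget(suite)
    print(f"passed={passed} elapsed={elapsed:.3f}s within_budget={within}")
```

Treating the time budget as a pass/fail signal, not just a metric, is what keeps the suite fast enough to run on every edit.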

Their dogfood and self-host programs also catch many of the bugs that integration testing misses.

Kill that old code!

Team D found they could cut all of their 10-year-old UI automation and replace it with a set of API tests.

It started with an unfortunate discovery. A bug was found that the automation missed. When looking at the automation, the tester realized that their old UI automation didn’t really test very much, yet it was slowing down the test passes.

The new API tests took the tester one and a half weeks to write, and they run in 10 minutes; the old UI tests ran in an hour. Team D has about 1500 tests. Maybe 20% of those remain UI tests (and they’re itching to cut those down). For this product cycle they haven’t written a single new UI test. Some areas no longer have any UI tests at all.

Put tests close to the code

A key lesson from these and other groups is that automation is most effective when it is closest to the code. Team C’s fast unit tests catch bugs within seconds of their creation.

More strategies to cut tests

Team A found several ways to cut their automation and incrementally improve the code quality.

After a recent reorg, the team inherited over two thousand legacy automated UI test cases, bringing the total it had to maintain to almost 2600. This was not manageable for a small team. Because team members were unfamiliar with the inherited test cases, deciding which ones to cut was hard.

By applying the techniques below, the team deleted approximately 1300 tests (half their UI automation). The team is still looking to lower the number of UI tests even further, but at least now it is possible for one person to analyze and maintain all UI automation taking no more than 50% of their time. (They improved efficiency by a factor of four.)

Break it up

Break up longer tests into smaller modular chunks. Once you have your tests in smaller pieces, you can often remove the duplicates. The smaller pieces are also easier to maintain.
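A minimal Python sketch of the idea, using a hypothetical in-memory `FakeApp` in place of a real UI driver: setup steps are written once as small helpers, and each check composes them instead of repeating the steps inline.

```python
class FakeApp:
    """Hypothetical stand-in for a UI or API driver."""

    def __init__(self):
        self.user = None
        self.cart = []

    def sign_in(self, user: str) -> None:
        self.user = user

    def add_item(self, item: str) -> None:
        self.cart.append(item)

    def checkout(self) -> int:
        if self.user is None:
            raise RuntimeError("not signed in")
        count = len(self.cart)
        self.cart.clear()
        return count


# Reusable steps: each does one thing and exists in exactly one place.
def signed_in_app(user: str = "alice") -> FakeApp:
    app = FakeApp()
    app.sign_in(user)
    return app


def with_items(app: FakeApp, *items: str) -> FakeApp:
    for item in items:
        app.add_item(item)
    return app


# Short checks built from the shared steps.
def check_checkout_counts_items():
    assert with_items(signed_in_app(), "book", "pen").checkout() == 2


def check_checkout_requires_sign_in():
    app = with_items(FakeApp(), "book")
    try:
        app.checkout()
    except RuntimeError:
        return
    raise AssertionError("checkout should fail without sign-in")
```

If the sign-in flow changes, only `signed_in_app` needs updating, and two long tests that both began with the same sign-in and add-items steps become easy to spot as duplicates.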

Make UI tests into unit tests

Simple tests can be moved down to the unit-test level. Code that is well written, testable and isolated makes it easier to replace UI tests with targeted unit tests that run faster, are smaller and get better coverage.
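As a sketch of what that separation buys, assume a hypothetical MVVM-style `LoginViewModel` (the 8-character password rule is made up for illustration). A rule that a UI test would check by inspecting whether a button is enabled becomes a plain unit test with no UI toolkit involved:

```python
import unittest
from dataclasses import dataclass


@dataclass
class LoginViewModel:
    """Hypothetical view model: screen state and rules, no UI framework."""
    username: str = ""
    password: str = ""

    @property
    def can_submit(self) -> bool:
        # Would be bound to the Sign In button's enabled state in the (hypothetical) view.
        return bool(self.username.strip()) and len(self.password) >= 8

    @property
    def error_message(self) -> str:
        if not self.username.strip():
            return "Enter a user name."
        if len(self.password) < 8:
            return "Password must be at least 8 characters."
        return ""


class LoginViewModelTests(unittest.TestCase):
    def test_blank_username_blocks_submit(self):
        vm = LoginViewModel(username="   ", password="correct-horse")
        self.assertFalse(vm.can_submit)
        self.assertEqual(vm.error_message, "Enter a user name.")

    def test_short_password_blocks_submit(self):
        vm = LoginViewModel(username="alice", password="short")
        self.assertFalse(vm.can_submit)

    def test_valid_input_allows_submit(self):
        vm = LoginViewModel(username="alice", password="correct-horse")
        self.assertTrue(vm.can_submit)
        self.assertEqual(vm.error_message, "")
```

These run in milliseconds; a thin UI test, if one is kept at all, then only needs to confirm that the button is wired to `can_submit`.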

Compare hotfixes to UI automation

It can be useful to compare the number of hotfixes in an area to the number of UI tests for that area; the two are sometimes proportional. A high number of UI tests can be, but is not necessarily, an indicator of poor design and poor testability, and therefore of a lack of solid unit tests.

Sometimes a high number of hotfixes reflects the importance and complexity of the area. If that is the case, the action is simple: do more testing, refactor, simplify the code and add more unit tests.
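A hedged Python sketch of the comparison, assuming you can count hotfixes and UI tests per product area. The area names and numbers below are illustrative, not from any of the teams, and `compare_by_area` is a hypothetical helper.

```python
def compare_by_area(hotfixes: dict[str, int], ui_tests: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (area, hotfix_count, ui_test_count) rows, most hotfixes first."""
    areas = set(hotfixes) | set(ui_tests)
    rows = [(area, hotfixes.get(area, 0), ui_tests.get(area, 0)) for area in areas]
    # Sort by hotfixes, then UI tests (both descending), then name for stable output.
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


hotfixes = {"billing": 14, "search": 3, "profile": 1}
ui_tests = {"billing": 420, "search": 60, "profile": 35, "help": 12}

for area, fixes, tests in compare_by_area(hotfixes, ui_tests):
    print(f"{area:<8} hotfixes={fixes:>3}  ui_tests={tests:>4}")
```

An area near the top of both columns, like `billing` here, is the kind worth examining for design and testability problems; the numbers alone don’t say which explanation applies.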

Every test should have a unique reason for failure

Look for two tests failing from the same root cause. You might be able to cut one.
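If failures are triaged to a root-cause signature (a bug ID or a normalized error message; how you derive it is an assumption left to your process), finding tests that share a reason for failing is a simple grouping. A Python sketch with hypothetical test names and bug IDs:

```python
from collections import defaultdict


def shared_root_causes(failures: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each root cause to the distinct tests that failed with it,
    keeping only causes shared by two or more tests."""
    tests_by_cause: dict[str, set[str]] = defaultdict(set)
    for test, cause in failures:
        tests_by_cause[cause].add(test)
    return {cause: sorted(tests) for cause, tests in tests_by_cause.items() if len(tests) > 1}


failures = [
    ("edit_profile_photo", "BUG-42"),
    ("edit_profile_name", "BUG-42"),
    ("edit_profile_name", "BUG-42"),  # the same test failing again counts once
    ("search_by_tag", "BUG-77"),
]
print(shared_root_causes(failures))
```

A group that keeps reappearing across many failures is a candidate for cutting one of its tests; a single coincidence is not.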

Raise the quality bar for your automation

Team A raised their quality bar for tests. Their bar was simple: If you cannot write a test that runs reliably in the continuous integration system, then do not write it! Every new test they added had to be executed 100 times without failure.

Their tests had a lot of failures, so to find the stable ones they ran each test hundreds of times in the lab. They set a target pass rate of 98% for the daily test execution (a 2% failure rate was deemed acceptable). Tests that were not stable were cut. They also removed tests on very visible features, where a bug would be found another way, and tests for lower-priority areas. In all, they cut 80 to 90 UI tests (~25%). Deeper analysis showed that many UI tests exercised the same areas as several unit tests, and 25% of those UI tests were replaced by no more than 20 unit tests. Once the UI tests were stable, 210 of them were added to the continuous integration system, raising the quality bar there without additional cost.
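Team A’s admission bar (a new test must pass 100 times without failure) is easy to express as a gate. A minimal Python sketch; `is_stable` and the flaky example test are hypothetical, and in practice each run would also need a fresh environment so that the runs are independent.

```python
from typing import Callable


def is_stable(test: Callable[[], None], runs: int = 100) -> bool:
    """Return True only if `test` passes on every one of `runs` consecutive executions.

    A test "fails" if it raises any exception, including AssertionError.
    """
    for _ in range(runs):
        try:
            test()
        except Exception:
            return False
    return True


calls = 0


def intermittently_failing_test():
    """Hypothetical flaky test: fails on every 37th call."""
    global calls
    calls += 1
    assert calls % 37 != 0, "timed out waiting for the dialog"


print(is_stable(lambda: None))                 # a test that never fails
print(is_stable(intermittently_failing_test))  # fails on its 37th run
```

Stopping at the first failure keeps the gate cheap for bad tests; only tests that are going to pass pay for all of the runs.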

Do manual testing

They used a simple manual checklist for hard-to-automate areas.

Cut tests that never found bugs

Like Team B, Team A also measured how many product bugs each test had found in previous releases. If a test had never caught a regression, it was cut. If a test had found bugs, further analysis determined whether it duplicated another test.
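For the duplicate analysis, one heuristic (an assumption; the article doesn’t say how Team A did it) is to flag a test whose every caught bug was also caught by some other test. A Python sketch with hypothetical test names and data:

```python
def duplicate_candidates(bugs_by_test: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (candidate, covering_test) pairs where every bug `candidate` found
    was also found by `covering_test`.

    Tests with no bugs are skipped: the "never found a bug" rule handles them.
    Two tests with identical bug sets flag each other; cut at most one of them.
    """
    pairs = []
    for candidate, found in bugs_by_test.items():
        if not found:
            continue
        for other, other_found in bugs_by_test.items():
            if other != candidate and found <= other_found:
                pairs.append((candidate, other))
    return sorted(pairs)


history = {
    "login_then_search": {"BUG-3", "BUG-8"},
    "search_only": {"BUG-8"},
    "export_csv": {"BUG-12"},
    "open_about_box": set(),
}
print(duplicate_candidates(history))
```

Bug history is only a proxy: a flagged test may still cover code the other one doesn’t, which is why candidates deserve a human look rather than automatic deletion.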

Use coverage

Run only the tests that code coverage says you need. For example, if a developer hasn’t changed the constructor, don’t run the constructor unit tests.
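A minimal Python sketch of coverage-based test selection (often called test impact analysis), assuming a per-test coverage map recorded by an earlier instrumented run and keyed by hypothetical function names. When a change touches code that no test covers, it conservatively falls back to running everything.

```python
def select_tests(coverage: dict[str, set[str]], changed: set[str]) -> list[str]:
    """Pick the tests whose recorded coverage includes any changed code unit."""
    covered = set().union(*coverage.values())
    if changed - covered:
        # Changed code with no recorded coverage (e.g. brand-new code): run everything.
        return sorted(coverage)
    return sorted(test for test, units in coverage.items() if units & changed)


coverage = {
    "test_constructor": {"Order.__init__"},
    "test_total": {"Order.total"},
    "test_tax": {"Order.total", "tax_rate"},
}
print(select_tests(coverage, {"Order.total"}))
```

The coverage map goes stale as code changes, so it needs refreshing regularly, for example from a periodic full run.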

Validate the return on investment

The team discovered that an automated UI test is very cheap to write but expensive to maintain. For every code bug found by automation, there were two to ten test-related bugs. Every automated test also carried the cost of building and maintaining the infrastructure and tools required to run it. It may sound harsh, but it is worthwhile to cut all badly designed and badly written tools; the tests written with those tools can be deleted as well. Unit tests require much less effort, needing only proper abstractions and simulators for external components. The team (which had 10 engineers) saved up to fifteen work days every two-week sprint.
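The “simulators for external components” can be as small as an in-memory fake behind an interface. A Python sketch using `typing.Protocol`; `PaymentGateway`, `FakeGateway` and `place_order` are hypothetical names, not from any of the teams’ code.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical external dependency the product talks to."""

    def charge(self, account: str, cents: int) -> bool: ...


class FakeGateway:
    """In-memory simulator: records charges and can be told to decline."""

    def __init__(self, decline: bool = False):
        self.decline = decline
        self.charges: list[tuple[str, int]] = []

    def charge(self, account: str, cents: int) -> bool:
        if self.decline:
            return False
        self.charges.append((account, cents))
        return True


def place_order(gateway: PaymentGateway, account: str, cents: int) -> str:
    """Business logic under test; depends only on the abstraction."""
    if cents <= 0:
        return "invalid"
    return "paid" if gateway.charge(account, cents) else "declined"
```

Because `place_order` depends only on the protocol, its tests never need the real payment service, a network or a UI.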

Sometimes you still need UI automation

Team E had already cut back their automation, but lately they’ve found a few reasons to add some new UI tests. Even so, they remain skeptical of UI automation’s value outside specific conditions.

When to use UI automation

If it’s hard to test functionality at the API level or in production. For some complex combinations of business logic, a UI test may need fewer lines of code. For example, taking a picture and emailing it might be easiest to do through the UI, while sign-in would be an API test because it takes the same number of calls either way.
As an experiment or temporary measure. UI automation can be easy to prototype. Record / playback can be used to rapidly develop tests for a UI. A simple UI test can also be used to see if a test for an area would have value.
When the product does heavy graphics processing.
Assisted manual testing (not full automation). Many manual tests (typically done by an off-shore vendor) need common setup and a little automation can make those parts faster and more reliable.
As part of end-to-end scenarios. When you have a true end-to-end customer scenario, part of it will be in the UI. A given UI rarely needs more than two to five of these tests.

Previously, 70% of the team’s cases were UI automation. Now, 90% of the team’s automation is at the API layer and 10% is UI. However, with a new and better automation library from an external vendor, the cost of UI automation has dropped, and it makes sense to have a few more tests at the UI layer again.

Hey, I can make this automation better!

Most product groups I’ve talked to tend to reduce their test automation investment when the cost of maintaining it goes up, but not all. Some choose to improve the automation to reduce costs while keeping the benefits. On Team F, the test developers, led by a devoted development lead, made a concerted effort to improve test automation quality.

Here are some things they did:

They measured automation reliability by area and by test. Initial reliability was in the 80 percent range, which confirmed their impression that things were bad. Automated tests with low reliability were disabled or deleted.
They did root cause analysis on all automation failures to find the best way to solve the issues.
They adopted the same or higher levels of quality standards for test code that their developers had. For example: good and consistent naming, low complexity and unit testing for shared libraries.
They did code reviews or pairing on every change.
Tests had to run consistently 10 times in a row. Old tests that failed that bar were disabled and either reworked or completely rewritten, and changed tests had to meet the same bar. Team A, described above, used the same technique with the bar set at 100 runs in a row.

The results were good, but also show the limits of the effort:

Their new API automation was 99.99% reliable. That was great: even at 10,000 runs a week, only about one false failure would occur. This effort totally paid off, and the tests were run constantly with every merge. A subset of the fastest tests was even run on code while it was still being developed.

They could get the UI automation up to 99.5% reliability, but never higher. That’s pretty good if you run the tests 100 times a week. If you run the tests 1000 times per week, and it’s 30 minutes to investigate every false failure, then 99.5% reliability is costing you 2.5 hours of wasted developer time per week.
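The arithmetic above generalizes to a simple formula: expected false failures are runs × (1 − reliability), and each one costs some investigation time. A Python sketch; the function name is hypothetical, and applying the same 30-minute investigation time to the API case is an assumption.

```python
def wasted_hours_per_week(runs_per_week: int, reliability: float,
                          minutes_per_investigation: float) -> float:
    """Expected hours per week spent investigating false failures."""
    false_failures = runs_per_week * (1 - reliability)
    return false_failures * minutes_per_investigation / 60


# Figures from the text: UI automation at 99.5%, API automation at 99.99%.
print(f"UI, 100 runs/week:    {wasted_hours_per_week(100, 0.995, 30):.2f} h")
print(f"UI, 1000 runs/week:   {wasted_hours_per_week(1000, 0.995, 30):.2f} h")
print(f"API, 10000 runs/week: {wasted_hours_per_week(10_000, 0.9999, 30):.2f} h")
```

The cost grows linearly with run count, so reliability that is tolerable for an occasional suite becomes expensive once tests run on every merge.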

Before you think, “Yeah, but I could do better,” remember that most UI issues are timing issues caused by randomness in the test environment. You can do a lot with smart timeouts, but UIs are processor intensive, and with other processes competing for the machine, your UI under test may not complete every operation within the expected number of milliseconds more than 99.5% of the time. I now assume that any UI automation will max out at 99.5% reliability and plan my automation investment accordingly.
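A “smart timeout” usually means polling for an observable condition instead of sleeping for a fixed time. UI frameworks offer their own explicit waits (Selenium’s `WebDriverWait`, for example); the following is a framework-free Python sketch with a hypothetical `wait_until` helper.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Waiting on an observable condition (an element becoming visible, a spinner
    disappearing) instead of a fixed sleep keeps fast runs fast and gives slow
    runs more room, but it cannot remove timing variance altogether.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using a monotonic clock keeps the deadline correct even if the system clock is adjusted during a run.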

If you want more detail on automation patterns that will improve your automation reliability, check out Matt Griscom’s book: MetaAutomation: Quality Automation for Faster and More Trustworthy Software Development. As far as I know, Matt wasn’t involved in any of the automation efforts I discuss above, but he’s been involved in plenty of others that have done the same thing and he has some good techniques.

Conclusions and best practices

Times are a-changing. That old flaky UI automation may not cut it with your new development processes. As you move to faster release times with agile development and componentized architectures, it’s time to question the ROI on your automation and consider some house cleaning:

Break the automation into small modular pieces.
Keep automation as close to the code being tested as possible. A data conversion test in an adapter is probably a unit test on that adapter, not a higher-level test.
Don’t duplicate your test code where possible. This sounds obvious to developers who’ve heard “don’t repeat yourself” as a mantra, but it really helps.
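As a sketch of the adapter example, assume a hypothetical adapter that converts a legacy `DD/MM/YYYY` string into a `datetime.date`. Its conversion rules are checked directly at the unit level, with no UI or integration setup:

```python
import unittest
from datetime import date


def parse_legacy_date(raw: str) -> date:
    """Hypothetical adapter: converts a legacy 'DD/MM/YYYY' string to a date.

    Raises ValueError for malformed or impossible dates.
    """
    day, month, year = (int(part) for part in raw.strip().split("/"))
    return date(year, month, day)


class ParseLegacyDateTests(unittest.TestCase):
    def test_day_comes_first(self):
        self.assertEqual(parse_legacy_date("03/12/2018"), date(2018, 12, 3))

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_legacy_date(" 12/03/2018\n"), date(2018, 3, 12))

    def test_impossible_date_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_legacy_date("31/02/2018")

    def test_wrong_format_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_legacy_date("2018-12-03")
```

Run these with `python -m unittest`; each edge case costs a few lines here, versus a full UI round trip per case at the top of the pyramid.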

Use this test pyramid as a guide. If you make the tests small modular pieces as close to the functionality tested as you can and don’t duplicate, you’ll end up with a shape like Figure 2.

In successful companies I’ve seen, this is a really flat pyramid! Each layer down has about 10 times the number of tests as the one above. Don’t get too hung up on the number of tests, but if someone says, “We’re going to add hundreds of UI or integration tests,” ask a few questions about why those couldn’t be done with smaller faster component and unit tests.

Experiment. Cut some automation and measure the effectiveness. Cut flaky tests. Cut tests with low coverage. Cut tests that don’t find bugs. If you can cut all the UI automation and replace it with other techniques, do it!

What will you do with all that time saved? I’m sure you have ideas, but here are some starters:

Do exploratory testing. This isn’t just playing around with the software. There’s a whole theory and set of techniques on this — James Whittaker has good books and videos. Include customer scenarios in your list.
Support any manual scripted testing you still need to do with some automation.
Spend more time on non-functional testing, such as security, globalization and performance testing, and on compliance with standards.
Instrument your code and Test in Production (TiP) measuring real users and real environments. This is especially useful for questions like, “Where are my customers having slowness doing their work?” — a question that’s a lot more important and can be cheaper to answer than, “What’s the performance of different components of my software?”