A Rant About Software Testing Metrics
Don’t drown in metrics the internet tells you to prioritize; focus instead on metrics like Defect Removal Efficiency (DRE) to drive product quality.
What are we measuring?
I recently tried DeepSeek. One pattern I see with all GenAI tools is that they produce really long lists of items on certain subjects. Granted, you can improve your prompt and ask for the top 10 or something similar. Still, it is helpful to see what a tool comes up with for broad questions, as it gives you a distillation of what the internet thinks about a specific subject.
When I asked about software testing metrics, DeepSeek did not disappoint; it offered a long list.
DeepSeek provided:
- 3 defect metrics
- 3 test coverage metrics
- 3 test execution metrics
- 3 test efficiency metrics
- 2 defect aging metrics
- 2 requirements stability metrics
- 3 customer satisfaction metrics
- 3 release quality metrics
- 3 code quality metrics
- 3 performance metrics
- 3 security metrics
- 3 agile metrics
- 3 maintainability metrics
- 3 usability metrics
- 3 regression metrics
It looks like there is a hardcoded limit somewhere that keeps each category to no more than three metrics.
The fantastic part is that the metric I use and recommend the most is nowhere on this long list. I decided to give DeepSeek the benefit of the doubt and asked whether one of the metrics it did list was the same as the one I was expecting. That metric had a different name, but its definition was similar to the one I had in mind, so maybe it was just a simple misunderstanding.
DeepSeek answered clearly with “No” and explained the difference between the two metrics. I did not know whether to cry or laugh as I read the answer.
Let’s go over each one.
Defect Detection Percentage (DDP)
DeepSeek included DDP as one of the three defect metrics on its list. DDP helps assess the effectiveness of the testing process by comparing the number of defects found during testing to the total number of defects found before and after release.
In other words, you divide the number of defects found during testing by the total number of defects and multiply by 100 to get a percentage. If the rate is high, say 90%, your testing process is better than one with a rate of 70%.
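To make the arithmetic concrete, here is a minimal sketch in Python; the defect counts are made up purely for illustration:

```python
# Hypothetical defect counts, for illustration only
defects_found_in_testing = 45     # unit, integration, and system testing
defects_found_after_release = 5   # escaped defects reported from the field

total_defects = defects_found_in_testing + defects_found_after_release
ddp = defects_found_in_testing / total_defects * 100
print(f"DDP = {ddp:.0f}%")  # 45 / 50 * 100 = 90%
```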
Defect Removal Efficiency (DRE)
DeepSeek did not include DRE on its list. DRE helps assess the effectiveness of the overall defect removal process by comparing the number of defects found during development to the total number of defects found before and after release.
In other words, you divide the number of defects found during development by the total number of defects and multiply by 100 to get a percentage. If the rate is high, say 90%, your development process is better than one with a rate of 70%.
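Extending the same hypothetical sketch, DRE uses identical arithmetic but widens the numerator to everything found before release (again, all counts are assumptions for illustration):

```python
# Hypothetical defect counts, for illustration only
defects_found_in_reviews = 30     # requirements, design, and code reviews
defects_found_in_testing = 45     # unit, integration, and system testing
defects_found_after_release = 5   # escaped defects reported from the field

total_defects = (defects_found_in_reviews
                 + defects_found_in_testing
                 + defects_found_after_release)
removed_before_release = defects_found_in_reviews + defects_found_in_testing
dre = removed_before_release / total_defects * 100
print(f"DRE = {dre:.2f}%")  # 75 / 80 * 100 = 93.75%
```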
DDP vs DRE
Do you notice the difference in the definitions? DDP only considers testing, while DRE considers the entire defect removal process. According to DeepSeek, testing includes only unit, integration, and system testing. DRE, on the other hand, includes everything in DDP, plus code reviews, design reviews, requirements reviews, and so on.
Is it just me, or does this sound stupid? Per DeepSeek, DDP is a subset of DRE. If you measure DRE and find it too low, you will conduct a deeper root cause analysis (RCA). Even if you find a difference between DRE and DDP, it is doubtful that testing is the reason for your problem unless there is a massive difference between the two numbers.
For example, let's say you measure your DRE at 70%. Thanks to DRE, you already have the data for DDP, so you calculate it, and the number is 60%. Does that mean you focus immediately on that 10-point difference? The answer is NO. Finding a defect during testing is a lot more expensive than finding it during a design review or even a code review. So, if you bring your DDP from 60% to 70%, you have improved, but it is not the most effective improvement if your DRE still sits at only 75%.
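Here is a back-of-the-envelope sketch of that scenario; every defect count and the relative cost multipliers below are assumptions for illustration, not data from any real project:

```python
# Hypothetical scenario: 100 total defects, 60 found in testing,
# 10 found in reviews, 30 escaped to production.
total_defects = 100
found_in_reviews = 10
found_in_testing = 60
escaped = total_defects - found_in_reviews - found_in_testing

ddp = found_in_testing / total_defects * 100                       # 60%
dre = (found_in_reviews + found_in_testing) / total_defects * 100  # 70%

# Assumed relative cost of handling a defect, by the phase where it is found.
cost = {"review": 1, "testing": 10, "production": 100}

def total_cost(reviews, testing, prod):
    """Cost-weighted sum of defects by the phase in which they were found."""
    return (reviews * cost["review"]
            + testing * cost["testing"]
            + prod * cost["production"])

baseline = total_cost(found_in_reviews, found_in_testing, escaped)
# Option A: testing catches 10 of the escaped defects (DDP 60% -> 70%).
option_a = total_cost(found_in_reviews, found_in_testing + 10, escaped - 10)
# Option B: reviews catch those same 10 defects instead (DDP stays at 60%).
option_b = total_cost(found_in_reviews + 10, found_in_testing, escaped - 10)

print(baseline, option_a, option_b)  # 3610 2710 2620
```

Under these assumed costs, both options lift DRE to the same 80%, but catching those defects in reviews is cheaper even though DDP never moves, which is exactly why chasing the DDP gap first is rarely the most effective choice.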
There are more valuable questions to answer than how to improve your DDP. For example, a defect introduced during the requirements phase is, by the time you find it, roughly ten times more costly to fix than one introduced during coding.
In other words, you should only measure DDP when an RCA of your DRE tells you that testing is the one issue you need to address to improve your quality.
The Top 10 Prompt
I did a little experiment. I asked DeepSeek to give me the top 10 software testing metrics. Given my previous questions, I suspected that DDP and DRE would appear on the list. Once again, DeepSeek did not disappoint. It even cheated and gave me eleven metrics instead of ten, bundling MTTD and MTTR into one entry. Here is the list:
- Defect Detection Percentage (DDP)
- Defect Removal Efficiency (DRE)
- Test Coverage
- Test Case Effectiveness
- Defect Density
- Test Execution Rate
- Test Case Pass/Fail Rate
- MTTD/MTTR (Mean Time To Detect / Mean Time To Repair)
- Escaped Defects
- Test Automation Coverage
Just for fun, I asked OpenAI’s ChatGPT the same question. Here is the list:
- Test Coverage
- Defect Density
- Defect Leakage
- Defect Removal Efficiency (DRE)
- Test Execution Rate
- Test Pass Rate
- Mean Time To Detect (MTTD)
- Mean Time To Repair (MTTR)
- Test Automation Coverage
- Escaped Defects
It is not a surprise that the two lists overlap significantly, as the source is the same. More importantly, they clearly show what the software testing community thinks are the most essential software testing metrics.
Are these lists correct? No, they are not. Should you track all these metrics? No, you should not, but that is a subject for another blog post.
Final Thoughts
I am not trying to criticize DeepSeek, OpenAI, or any other LLM. They are a window into what people have written about. However, the number of software testing metrics is so large that even LLMs do not know how to filter them and determine what is worth showing. DeepSeek did not include DRE on its long list, but it did on the top 10 list. Is that a hallucination or simply confusion due to the vast array of contradictory information on the internet? Who knows.
Unsurprisingly, most teams do not track most of these metrics. In my experience, they track metrics based on what others tell them to do without considering their context. I blame the software testing community. They are lost at sea and do not know which direction to go.
At Testaify, we know testing is complex and requires making choices. It also requires defining a comprehensive testing strategy and deciding how to implement it. That strategy will determine what you need to measure.
About the Author
Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's goal for Testaify is to deliver comprehensive testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.
Take the Next Step
Join the waitlist to be among the first to know when you can bring Testaify into your testing process.