Mocking vs Stubbing: Choosing Your Test Doubles with Real-World Analogies

Imagine you're ordering coffee at a busy shop. The barista asks for your order, makes the drink, and hands it to you. Now imagine a different scenario: you're testing a new coffee machine, so a stand-in goes through the barista's motions without actually brewing anything. That's the difference between a real interaction and one with a test double. In software testing, mocks and stubs are two common types of test doubles, but they serve different purposes. This guide will help you choose the right one.

Where This Shows Up in Real Work

Every project with external dependencies—databases, APIs, file systems—eventually needs test doubles. The decision between mocking and stubbing appears in code reviews, test design discussions, and when your CI pipeline starts failing for mysterious reasons. A typical scenario: you're writing a test for a service that calls a payment gateway. Do you stub the gateway call to return a fixed response, or do you mock it to verify that the call happened with the right parameters? The answer depends on what you're testing.

Consider an e-commerce checkout flow. The OrderService calls PaymentGateway.charge(). If you want to test that the order is marked as paid when the charge succeeds, you'd stub the gateway to return a successful response. If you want to test that the service correctly passes the total amount and currency to the gateway, you'd mock it and assert on the arguments. Both are valid, but mixing them up leads to tests that are hard to read and maintain.
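The two checkout tests can be sketched side by side. This is a minimal illustration, not a reference implementation: the OrderService constructor, the checkout method, and the receipt shape are assumptions made for the example, and the doubles are hand-rolled so the snippet runs in plain Node (in Jest, jest.fn() would typically play the recorder's role).

```javascript
// Hypothetical OrderService; constructor shape and field names are assumptions.
class OrderService {
  constructor(gateway) {
    this.gateway = gateway;
  }
  checkout(order) {
    const receipt = this.gateway.charge(order.total, order.currency);
    return { ...order, status: receipt.ok ? "paid" : "failed" };
  }
}

// Stub: canned answer only, used to check the resulting state.
const stubGateway = { charge: () => ({ ok: true }) };
const paidOrder = new OrderService(stubGateway).checkout({ total: 42, currency: "EUR" });

// Mock (hand-rolled): records arguments so the test can verify the interaction.
const chargeCalls = [];
const mockGateway = {
  charge: (amount, currency) => {
    chargeCalls.push([amount, currency]);
    return { ok: true };
  },
};
new OrderService(mockGateway).checkout({ total: 42, currency: "EUR" });

console.log(paidOrder.status); // "paid"
console.log(chargeCalls.length); // 1
```

The first test would assert only on paidOrder.status; the second would assert only on the recorded arguments. Keeping them separate is what keeps each test readable.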

In a typical project, the confusion starts when someone writes a test that uses a mock for everything. They mock the database, the logger, and the email service, and then assert on every single call. The test becomes a list of expectations that mirrors the implementation, so any refactoring breaks it. That's the classic over-mocking anti-pattern. The opposite mistake is under-verifying: using a stub where you need to verify behavior, so a missing call goes unnoticed and the test passes anyway.

Teams often adopt a default approach without thinking about trade-offs. Some shops say "always mock," others say "never mock." Both extremes miss the nuance. The right choice depends on what you're verifying: state or behavior. Stubs are for state verification—checking the result after a call. Mocks are for behavior verification—checking that certain interactions happened. Understanding this distinction early saves hours of debugging later.

Common Scenarios in Code Reviews

In a code review, you might see a test like this:

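const user = { id: 1, name: "Ada" };
const mockDb = { findById: jest.fn().mockReturnValue(user) };
const service = new UserService(mockDb); // hypothetical class; inject the double
const result = service.getUser(1);
expect(mockDb.findById).toHaveBeenCalledWith(1); // behavior: the call happened
expect(result).toEqual(user); // state: the returned value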

This test uses a mock for findById and then asserts on both the call and the result. That's fine as long as you care about both. But if you only care about the result, the assertion on the call is noise. Remove it, and you have a stub. The test becomes simpler and less coupled to the implementation.

Foundations Readers Confuse

The core confusion comes from overloaded terminology. In testing literature, a stub provides canned answers to calls made during the test. A mock is an object that records calls and allows you to verify that certain interactions occurred. The key difference is where a failure comes from: a stub normally doesn't fail a test on its own, so the test fails only when an assertion on the result or state fails; a mock fails the test if expected calls don't happen or happen with the wrong arguments.

Think of a restaurant kitchen. A stub is like a prep cook who always hands you a pre-made sauce when you ask for it. You don't care if they made it fresh; you just need the sauce. A mock is like a head chef who watches you call out orders and then checks that you said "medium rare" for the steak. If you forgot to say it, the test fails. Both are useful, but they serve different roles.

Another common confusion is between dummies, fakes, and stubs. A dummy is an object that's passed around but never actually used—like an empty placeholder. A fake is a working implementation, but simplified—like an in-memory database instead of a real one. Stubs and mocks are more specific: stubs are for indirect inputs, mocks for indirect outputs. Many developers use the term "mock" for all test doubles, but that oversimplifies the design space.
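To make the four kinds of double concrete, here is a small sketch that uses one of each in a single call. Every name in it (createInvoice, taxFor, FakeInvoiceStore, notify) is hypothetical and chosen only for contrast; the doubles are hand-rolled rather than built with a mocking library.

```javascript
// Dummy: fills a parameter slot and is never used by the code under test.
const dummyLogger = {};

// Stub: returns a canned answer (an indirect input to the code under test).
const stubRates = { taxFor: () => 20 };

// Fake: a working but simplified implementation, here an in-memory store.
class FakeInvoiceStore {
  constructor() {
    this.rows = new Map();
  }
  save(invoice) {
    this.rows.set(invoice.id, invoice);
  }
  get(id) {
    return this.rows.get(id);
  }
}

// Mock (hand-rolled): records an indirect output so the test can verify it.
const notifier = {
  sent: [],
  notify(message) {
    this.sent.push(message);
  },
};

// Hypothetical code under test that touches all four collaborators.
function createInvoice(id, net, rates, store, notifierArg, _logger) {
  const invoice = { id, total: net + rates.taxFor(net) };
  store.save(invoice);
  notifierArg.notify(`invoice ${id} created`);
  return invoice;
}

const store = new FakeInvoiceStore();
const invoice = createInvoice("inv-1", 100, stubRates, store, notifier, dummyLogger);

console.log(invoice.total); // 120
console.log(notifier.sent); // [ 'invoice inv-1 created' ]
```

Only the notifier is verified as an interaction; the stub and the fake are checked indirectly through the returned total and the stored row.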

Why does this matter? Because using the wrong type leads to tests that test the wrong thing. If you stub a method that should have been mocked, you won't catch missing calls. If you mock a method that should have been stubbed, you create brittle tests that break on refactoring. The distinction is not just academic; it affects how you write and maintain tests.

The Three-Second Rule

A practical heuristic: if you can write the test by only asserting on the return value or state change, use a stub. If you need to verify that a side effect happened (a call to an external service, a log message, an event emission), use a mock. The rule works because most tests fit one category. If you find yourself doing both for the same dependency, consider whether the test is too broad.

Patterns That Usually Work

Over time, teams develop patterns that balance simplicity with coverage. Here are three approaches that tend to work well across different codebases.

Pattern 1: Stub for Queries, Mock for Commands

This is the most common recommendation. A query is a call that returns data without side effects—like getUserById. Stub it to return test data. A command is a call that changes state—like deleteUser or sendEmail. Mock it to verify it was called with the right arguments. The pattern works because queries are about state (what did we get?), and commands are about behavior (did we do the right thing?).

Example: in a user registration flow, stubbing emailService.sendWelcomeEmail would be wrong—you want to verify the email was sent. Mocking database.findByEmail would be overkill—you only need it to return a result. Apply the query/command distinction, and your tests become cleaner.
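The registration example might look like the sketch below. The registerUser function and its return shape are assumptions for illustration; only findByEmail and sendWelcomeEmail come from the example above. The recorder array stands in for what jest.fn() would give you in a Jest test.

```javascript
// Hypothetical registration function; the return shape is an assumption.
function registerUser(email, database, emailService) {
  if (database.findByEmail(email) !== null) {
    return { ok: false, reason: "email already registered" };
  }
  emailService.sendWelcomeEmail(email); // command: a side effect worth verifying
  return { ok: true };
}

// Query -> stub: only the returned value matters.
const database = { findByEmail: () => null };

// Command -> mock (hand-rolled recorder).
const welcomeCalls = [];
const emailService = {
  sendWelcomeEmail: (to) => {
    welcomeCalls.push(to);
  },
};

const outcome = registerUser("ada@example.com", database, emailService);

console.log(outcome.ok); // true
console.log(welcomeCalls); // [ 'ada@example.com' ]
```

Notice that nothing asserts on how often findByEmail was called. If the implementation later adds a cache or a second lookup, this test keeps passing.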

Pattern 2: One Mock per Test

When you do use mocks, limit yourself to one per test. Multiple mocks increase the chance that the test is verifying implementation details rather than behavior. If a test mocks three different services, it's likely testing how they're wired together, not what the system does. That's okay for integration tests, but for unit tests, it's a smell. Stick to one mock, and use stubs for the rest.

This pattern also makes failures easier to diagnose. If a test with one mock fails, you know exactly which interaction went wrong. With three mocks, you have to check each expectation. The discipline of one mock forces you to think about what's truly important for that test.

Pattern 3: Use Fakes for Stable Dependencies

For dependencies that are stable and well-understood—like a database ORM—consider using a fake instead of a stub or mock. A fake is a lightweight implementation that behaves like the real thing but runs in memory. For example, an in-memory database that supports the same queries as your production database. Fakes are more robust than stubs because they enforce real constraints (like unique indexes), and they don't require mock setup for every test. The downside: you have to maintain the fake, which can be work if the real dependency changes often.
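As a rough sketch of what such a fake can look like, here is an in-memory repository that enforces a unique-email rule. The interface (insert, findByEmail) is an assumption for the example, not a real ORM API.

```javascript
// A minimal in-memory fake; the repository interface is hypothetical.
class FakeUserRepository {
  constructor() {
    this.byEmail = new Map();
    this.nextId = 1;
  }
  insert(user) {
    // Enforce the same unique-email rule a real table would.
    if (this.byEmail.has(user.email)) {
      throw new Error(`duplicate email: ${user.email}`);
    }
    const saved = { ...user, id: this.nextId++ };
    this.byEmail.set(saved.email, saved);
    return saved;
  }
  findByEmail(email) {
    return this.byEmail.has(email) ? this.byEmail.get(email) : null;
  }
}

const repo = new FakeUserRepository();
const first = repo.insert({ email: "ada@example.com" });

// A second insert with the same email fails, just as a unique index would.
let duplicateError = null;
try {
  repo.insert({ email: "ada@example.com" });
} catch (err) {
  duplicateError = err;
}

console.log(first.id); // 1
console.log(duplicateError.message); // "duplicate email: ada@example.com"
```

A stub that returns a fixed user would never surface the duplicate case; the fake does, without any per-test setup.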

Many teams use fakes for their data layer and mocks for external APIs. That's a good compromise. The data layer is usually stable, so a fake pays off. External APIs change more often, so mocks help you isolate from those changes.

Anti-Patterns and Why Teams Revert

Even with good patterns, teams fall into traps. Recognizing these anti-patterns early can save you from rewriting tests later.

Anti-Pattern 1: Mocking Everything

The most common anti-pattern is treating every dependency as a mock. The result is a test that is tightly coupled to the implementation. Any refactoring that changes how collaborators are called, even a simple method rename, breaks the test although the observable behavior is unchanged. The test becomes a burden rather than a safety net. Teams that start with heavy mocking often revert to stubbing once they realize the maintenance cost. The fix is to ask: "Am I testing the contract or the implementation?" If you're asserting on internal calls that aren't part of the contract, you're over-mocking.

Anti-Pattern 2: Stubbing When You Need Verification

The opposite is also common: stubbing a method that has side effects, then forgetting to verify the side effect happened. For example, you stub emailService.send() to return true, but you never check that send() was actually called. The test passes even if the code never sends the email. This leads to false confidence. The fix is to use mocks for commands and make the verification explicit.
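The false confidence is easy to reproduce. In this sketch, completeOrder is a hypothetical function with a deliberate bug: it never calls send(). The stub-based check passes anyway, while the recorder-based check catches the omission.

```javascript
// Hypothetical buggy code: it is supposed to send a receipt but never does.
function completeOrder(order, emailService) {
  // BUG (deliberate): the emailService.send(...) call is missing.
  return { ...order, status: "complete" };
}

// Stub-only double: would return true if called, but nothing checks that it was.
const stubEmail = { send: () => true };
const stubResult = completeOrder({ id: 7 }, stubEmail);
const stubTestPasses = stubResult.status === "complete";

// Mock-style double: the recorded calls expose the missing side effect.
const sendCalls = [];
const mockEmail = {
  send: (to) => {
    sendCalls.push(to);
    return true;
  },
};
completeOrder({ id: 7 }, mockEmail);
const mockTestPasses = sendCalls.length === 1;

console.log(stubTestPasses); // true: the bug goes unnoticed
console.log(mockTestPasses); // false: the bug is caught
```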

Anti-Pattern 3: Shared Mutable State Between Tests

When tests share mocks or stubs that have internal state (like call counts), they become order-dependent. Test A sets up a mock that Test B depends on, and suddenly tests fail only when run in a certain order. This is especially common with singleton mocks or static methods. The fix: create fresh test doubles for every test, and avoid static state.
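The order dependence can be shown with a tiny recorder. The helper below is hand-rolled and hypothetical; in a Jest suite, creating the double inside beforeEach (or resetting mocks between tests) would serve the same purpose.

```javascript
// Hand-rolled recorder with internal state (its call history).
function createMailerDouble() {
  const calls = [];
  return {
    calls,
    send: (to) => {
      calls.push(to);
    },
  };
}

// Shared double: call history leaks from one test into the next.
const sharedMailer = createMailerDouble();
function sharedTestA() {
  sharedMailer.send("a@example.com");
  return sharedMailer.calls.length;
}
function sharedTestB() {
  sharedMailer.send("b@example.com");
  return sharedMailer.calls.length;
}

// Fresh double per test: each test sees only its own calls.
function freshTestA() {
  const mailer = createMailerDouble();
  mailer.send("a@example.com");
  return mailer.calls.length;
}
function freshTestB() {
  const mailer = createMailerDouble();
  mailer.send("b@example.com");
  return mailer.calls.length;
}

const sharedCounts = [sharedTestA(), sharedTestB()]; // B's count depends on A having run
const freshCounts = [freshTestA(), freshTestB()]; // order no longer matters

console.log(sharedCounts); // [ 1, 2 ]
console.log(freshCounts); // [ 1, 1 ]
```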

Teams often revert to integration tests after struggling with these anti-patterns. Integration tests give more realistic feedback because they exercise real dependencies, but they're slower and harder to set up. The goal is to find the right balance: unit tests with well-chosen doubles for most cases, plus a few integration tests for critical paths.

Maintenance, Drift, and Long-Term Costs

Test doubles have a hidden cost: they can drift from the real implementation. If you stub a method that returns a user object, but the real method later returns a different shape, your tests still pass while production code breaks. This is the problem of test double drift. It happens when you don't run integration tests alongside unit tests, or when the stubbed response is out of sync with reality.

To mitigate drift, use contract tests or shared test fixtures. A contract test verifies that your stub matches the real API's behavior. For example, if you stub a payment gateway, write a test that calls the real sandbox endpoint and compares the response shape to your stub. Another approach is to generate stubs from the real API's schema (like OpenAPI specs) so they stay in sync.
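One way to compare response shapes is sketched below. It is an illustration under stated assumptions: the field names are invented, and sandboxCharge is a hard-coded stand-in for a response that a real contract test would fetch from the provider's sandbox.

```javascript
// Returns a list of "path:type" entries describing an object's shape.
function shapeOf(value, prefix = "") {
  if (value === null || typeof value !== "object") {
    return [`${prefix}:${value === null ? "null" : typeof value}`];
  }
  return Object.keys(value)
    .sort()
    .flatMap((key) => shapeOf(value[key], prefix ? `${prefix}.${key}` : key));
}

// Canned response used by unit tests (hypothetical fields).
const stubbedCharge = { id: "ch_test", amount: 1000, card: { last4: "4242" } };

// Stand-in for a sandbox response; a real contract test would make an HTTP call here.
const sandboxCharge = { id: "ch_9f2", amount: 2500, card: { last4: "1881", brand: "visa" } };

const stubShape = shapeOf(stubbedCharge);
const realShape = shapeOf(sandboxCharge);

// Fields the real API returns that the stub lacks, and vice versa.
const missingFromStub = realShape.filter((entry) => !stubShape.includes(entry));
const staleInStub = stubShape.filter((entry) => !realShape.includes(entry));

console.log(missingFromStub); // [ 'card.brand:string' ]
console.log(staleInStub); // []
```

Values are ignored on purpose; only field paths and types are compared, so the check flags drift in shape without being sensitive to sandbox data.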

Another long-term cost is test brittleness. As the codebase evolves, tests with many mocks become fragile. Each mock creates an implicit dependency on the implementation. When a developer changes how a service works, they have to update all the mocks in the tests. This friction leads to tests being disabled or ignored. The solution is to favor stubs over mocks where possible, and to limit the number of mocks per test.

Teams that invest in a test double strategy early—choosing the right type for each dependency—spend less time on test maintenance later. It's worth discussing this in design reviews, not just in test reviews. When you introduce a new dependency, ask: "Will we mock, stub, or fake this?" The answer guides how you write the code and the tests.

When Not to Use This Approach

There are cases where test doubles are the wrong tool. The most obvious: when the dependency is simple and fast, use the real thing. For example, a pure function that parses a string doesn't need a double. Call it directly. Similarly, if you're testing a database query, use a test database, not a mock. Mocks for database calls often miss SQL syntax errors or constraint violations.

Another case: when the test double becomes more complex than the real dependency. If you find yourself writing a 50-line mock setup for a 10-line function, you're overcomplicating it. Consider using a fake or an integration test instead. The complexity of the test double should reflect the complexity of the dependency. Simple dependencies get simple doubles.

Also avoid mocks for legacy code that you're not refactoring. If you're adding tests to existing code that wasn't designed for testability, mocks can be painful. The code might have deep call chains or singletons that are hard to mock. In that case, start with characterization tests that capture current behavior, then refactor to make the code more testable. Don't force mocks into an untestable design.

Finally, don't use mocks to test a test double itself. If you're writing a custom stub or fake, check it against the real dependency, for example by running the same assertions against both. Otherwise the test is circular: it only shows that one double agrees with another, which proves nothing about the real thing.

Open Questions / FAQ

Q: Should I always prefer stubs over mocks?
A: Not always. Stubs are simpler, but they don't verify behavior. If you need to ensure a side effect happened, use a mock. The key is to use each for its intended purpose. A good rule: stub for state, mock for behavior.

Q: How do I decide between a stub and a fake?
A: Use a stub when you only need a single response. Use a fake when you need multiple interactions or behavior that resembles the real dependency (like a database that supports queries and transactions). Fakes are more effort but more realistic.

Q: What about spies?
A: Spies are like mocks but without pre-programmed behavior. They wrap a real object and record calls. They're useful when you want to use the real implementation but still verify interactions. Spies are a middle ground: they give you behavior verification without replacing the implementation.
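A hand-rolled spy shows the idea in a few lines. The spyOn helper and the priceCalculator object are written for this example only; in Jest, jest.spyOn(object, "method") offers comparable call-through recording.

```javascript
// Hand-rolled spy: wraps a method on a real object, records calls, then delegates.
function spyOn(target, methodName) {
  const original = target[methodName];
  const calls = [];
  target[methodName] = function (...args) {
    calls.push(args);
    return original.apply(this, args); // the real implementation still runs
  };
  return {
    calls,
    restore: () => {
      target[methodName] = original;
    },
  };
}

// Hypothetical real object with real logic.
const priceCalculator = {
  taxRatePercent: 20,
  withTax(net) {
    return net + (net * this.taxRatePercent) / 100;
  },
};

const spy = spyOn(priceCalculator, "withTax");
const gross = priceCalculator.withTax(50); // computed by the real method
spy.restore();

console.log(gross); // 60
console.log(spy.calls); // [ [ 50 ] ]
```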

Q: Can I use both mocks and stubs for the same dependency?
A: Yes, but consider splitting the test. If you need both, the test might be doing too much. Alternatively, use a mock for the interaction and a stub for the return value—that's common for methods that both return data and have side effects.

Q: How do I handle dependencies that are hard to mock?
A: Look for an abstraction layer. If you're trying to mock a static method or a concrete class, wrap it behind an interface or a function. That makes it easy to substitute in tests. If you can't change the code, consider using a test framework that supports mocking static methods (like PowerMock in Java), but be aware of the trade-offs in complexity.
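Wrapping a static call behind a function can be as small as adding a parameter with a production default. The token shape and function names below are assumptions for illustration; the static call being wrapped is the built-in Date.now().

```javascript
// Hard to test: reads the clock through a static call.
function isExpiredHardToTest(token) {
  return Date.now() > token.expiresAt;
}

// Easier to test: the clock is a parameter with a production default.
function isExpired(token, now = () => Date.now()) {
  return now() > token.expiresAt;
}

// In tests, substitute a stub clock instead of patching a global.
const token = { expiresAt: 1000 };
const beforeExpiry = isExpired(token, () => 999);
const afterExpiry = isExpired(token, () => 1001);

console.log(beforeExpiry); // false
console.log(afterExpiry); // true
```

Production callers keep calling isExpired(token) unchanged, so the seam costs nothing outside the tests.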

Summary + Next Experiments

Choosing between mocks and stubs comes down to what you're testing: state or behavior. Stubs provide inputs; mocks verify outputs. Use the query/command distinction as a starting point. Limit mocks to one per test, and consider fakes for stable dependencies. Avoid over-mocking and stubbing without verification. Watch out for test double drift by running integration tests alongside unit tests.

Your next steps: review your current test suite and categorize each test double as a stub, mock, or fake. Look for tests with multiple mocks—those are candidates for simplification. Try rewriting one test to use a stub instead of a mock, and see if the test still catches the same bugs. Finally, discuss the mock vs stub decision with your team in your next code review. A shared vocabulary makes tests more consistent and easier to maintain.
