Mocking and Stubbing: Building a Controlled Sandbox for Your Unit Tests

You write a unit test, run it, and it fails. You check the code — it looks fine. The problem? Your test depends on a database that's down, an API that's rate-limiting, or a file that's missing. Sound familiar? This is where mocking and stubbing come in. They let you replace real dependencies with controlled stand-ins, so your tests run fast and fail only when your code is actually broken. In this guide, we'll build a mental model for mocks and stubs, walk through concrete examples, and cover the edge cases that trip up most teams.

Why You Need a Test Sandbox

Unit tests are supposed to verify a single unit of behavior in isolation. But real code rarely lives in a vacuum. A typical function might call a database, send an HTTP request, or write to a file. When you test that function without isolating it, you're not just testing your logic — you're testing the network, the database server, and the file system permissions. That's an integration test, not a unit test. And integration tests are slower, flakier, and harder to debug.

Consider a simple function that fetches a user's profile from a database and formats it. If the database is unreachable during testing, the test fails even though your formatting logic is correct. Now you have a false negative — a test that says something is wrong when it's not. Over time, false negatives erode trust in the test suite. Developers start ignoring failing tests, and real bugs slip through.

Mocking and stubbing solve this by creating a controlled sandbox. You replace the real database call with a stub that returns a fixed response, or a mock that verifies how your code interacts with it. The test becomes deterministic: it passes or fails based solely on your code's behavior, not on external factors. This makes your tests faster (no network I/O), more reliable (no flaky services), and easier to write (no complex setup).

That sounds ideal, but there's a catch: using mocks and stubs incorrectly can create tests that pass even when your code is broken. The key is understanding what each technique does and when to apply it. Let's break down the core ideas.

Core Idea: Stubs vs. Mocks

People often use the terms mocking and stubbing interchangeably, but they serve different purposes. A stub is a replacement that returns a fixed value. You use it when you need a dependency to provide data for your code to process. A mock is a replacement that records how it was called — what arguments were passed, how many times, in what order. You use it to verify that your code interacts with the dependency correctly.

Stubs: Providing Data

Imagine you're testing a function that calculates a discount based on a user's membership tier. The function calls a service to get the user's tier. In your test, you can stub that service to always return 'gold'. Now your function runs with known input, and you can check whether the discount is calculated correctly. The stub doesn't care how many times it's called or with what arguments — it just returns the same value every time.

Stubs are simple and great for setting up test scenarios. They're especially useful when you need to simulate error conditions, like a database returning null or an API throwing an exception. But stubs don't tell you whether your code called the service correctly. That's where mocks come in.

Mocks: Verifying Interactions

Now suppose you're testing a function that sends an email after a user signs up. The function calls an email service with the user's address and a message. You want to verify that the email service was called with the right parameters. A mock can record the call and let you assert that the address was correct and the message contained a welcome note.

Mocks are more powerful but also more dangerous. If you overuse them, your tests become brittle — they break when you refactor internal implementation details that shouldn't affect correctness. A good rule of thumb: use stubs for queries (code that fetches data) and mocks for commands (code that performs side effects). Queries return data that you process; commands trigger actions that you need to verify.

How Mocking and Stubbing Work Under the Hood

Most testing frameworks provide built-in tools for mocking and stubbing. In Python, the unittest.mock module lets you replace any object with a mock. In JavaScript, libraries like Jest, Sinon, and Testdouble offer similar capabilities. The mechanism is straightforward: you intercept the call to a real dependency and replace it with a fake object that you control.

Dependency Injection and Monkey Patching

There are two common ways to inject mocks: dependency injection and monkey patching. With dependency injection, you design your code to accept its dependencies as parameters, so in tests you can pass mocks instead of real objects. This is clean and explicit. With monkey patching, you replace the dependency at runtime by overwriting its reference in the module or class. This is less explicit but works with legacy code that wasn't designed for testability.

For example, in Python, you might have a function that calls requests.get. In your test, you can patch requests.get to return a mock response. The patched version is used only for the duration of the test. Most frameworks handle cleanup automatically, so you don't have to worry about side effects leaking between tests.

What Happens When You Call a Mock

When you call a mock object, it records the call — the arguments, the time, and the order. It can also return a value you specify, or raise an exception. After the test, you can assert on the recorded calls. For instance, you might assert that the mock was called exactly once with specific arguments. The mock framework provides methods like assert_called_once_with or toHaveBeenCalledWith to make these checks readable.

Stubs are a simpler version: they don't record calls, they just return a value. In practice, many frameworks blur the line — a mock can act as a stub if you don't verify its calls. But it's helpful to keep the distinction in mind when designing tests.

Worked Example: Testing a User Registration Flow

Let's walk through a concrete example. Suppose we have a user registration function that checks if the username is taken, creates a user in the database, and sends a welcome email. We'll write tests using stubs and mocks.

Setting Up Dependencies

The function depends on two services: a user repository (for database queries) and an email service. We'll use dependency injection to pass these services in. In production, the real implementations connect to a database and an SMTP server. In tests, we'll pass stubs and mocks.

First, we stub the user repository to return 'None' when checking the username (meaning the username is available). Then we stub the create method to return a new user object. For the email service, we use a mock because we want to verify that the welcome email was sent.

Writing the Test

We set up the stubs and mock, call the registration function, and then assert on the results. We check that the returned user has the correct username and that the email mock was called with the user's email address. The test passes only if the function correctly interacts with both dependencies.

If we later refactor the function to send a different email template, the mock assertion will fail, alerting us that the behavior changed. That's good — it's a signal to update the test. But if we had used a stub for the email service, the test would still pass, and we might miss a regression.

What Could Go Wrong

One common mistake is over-mocking: using mocks for everything, including simple data lookups. This makes tests fragile. For example, if you mock the user repository's create method and then change the method signature, the test breaks even if the registration logic is still correct. A better approach is to stub queries and mock commands, as we did above.

Another pitfall is not cleaning up mocks between tests. If a mock persists across tests, it can cause false failures or false passes. Most frameworks reset mocks automatically, but if you're using manual patching, make sure to restore the original objects in a teardown step.

Edge Cases and Exceptions

Mocking and stubbing aren't always straightforward. Here are some edge cases to watch for.

Async Code and Callbacks

If your code uses asynchronous calls or callbacks, you need to handle the timing carefully. For example, if you mock an async function, the mock might return a coroutine that needs to be awaited. Many frameworks provide async mock variants. Similarly, if your code passes a callback to a dependency, you might need to invoke the callback manually in your test to simulate the dependency's behavior.

Mocking Built-in Functions or Global Objects

Monkey-patching built-in functions like open or datetime.now can be tricky. Some languages prevent patching built-ins entirely. In Python, you can patch builtins.open, but you need to be careful about the scope. In JavaScript, you can mock global objects like fetch or localStorage, but you must restore them after the test to avoid polluting other tests.

Partial Mocks and Spies

A partial mock (or spy) wraps a real object and lets you override specific methods while keeping others real. This is useful when you want to test a method that calls another method on the same object. For example, you might spy on a class method to verify it was called, while letting the rest of the class behave normally. However, partial mocks can lead to tests that are tightly coupled to implementation details, so use them sparingly.

Limits of the Approach

Mocking and stubbing are powerful, but they have limits. The most important one is that they don't test integration. A test with mocks can pass even if the real dependencies behave differently. For example, you might stub a database call to return a user object, but in production the database returns a slightly different format or throws a unique constraint error. Your test won't catch that.

To mitigate this, use a mix of unit tests (with mocks) and integration tests (with real dependencies). The unit tests verify your logic in isolation; the integration tests verify that the pieces fit together. Many teams follow the test pyramid: many unit tests, fewer integration tests, and even fewer end-to-end tests.

Another limit is that mocking can make refactoring harder. If you mock internal methods, you're locking in the implementation. When you refactor, you have to update the mock expectations, even if the external behavior hasn't changed. To avoid this, mock only at the boundaries of your system — external services, databases, file systems — not internal collaborators within your own codebase.

Finally, remember that mocks and stubs are tools, not goals. The goal is to have a fast, reliable test suite that gives you confidence. If you find yourself spending more time maintaining mocks than writing production code, it's a sign that your test design needs rethinking. Consider using fakes (lightweight in-memory implementations) for complex dependencies, or write integration tests for critical paths.

In practice, we recommend starting with stubs for most dependencies and adding mocks only when you need to verify side effects. Keep your tests simple and focused on behavior, not implementation. And always run a few integration tests to catch the mismatches that mocks miss.

Mocking and Stubbing: Building a Controlled Sandbox for Your Unit Tests

Table of Contents

Why You Need a Test Sandbox

Core Idea: Stubs vs. Mocks

Stubs: Providing Data

Mocks: Verifying Interactions

How Mocking and Stubbing Work Under the Hood

Dependency Injection and Monkey Patching

What Happens When You Call a Mock

Worked Example: Testing a User Registration Flow

Setting Up Dependencies

Writing the Test

What Could Go Wrong

Edge Cases and Exceptions

Async Code and Callbacks

Mocking Built-in Functions or Global Objects

Partial Mocks and Spies

Limits of the Approach

Comments (0)

Table of Contents

Why You Need a Test Sandbox

Core Idea: Stubs vs. Mocks

Stubs: Providing Data

Mocks: Verifying Interactions

How Mocking and Stubbing Work Under the Hood

Dependency Injection and Monkey Patching

What Happens When You Call a Mock

Worked Example: Testing a User Registration Flow

Setting Up Dependencies

Writing the Test

What Could Go Wrong

Edge Cases and Exceptions

Async Code and Callbacks

Mocking Built-in Functions or Global Objects

Partial Mocks and Spies

Limits of the Approach

Share this article:

Comments (0)

Related Articles

Mocking & Stubbing with Real-World Craft Tools: A Zencraft Guide

Mocking & Stubbing with Real-World Analogies: A Beginner’s Craft Guide

Mocking and Stubbing: Building Test Doubles with Real-World Analogies