Jack Baker | Developer

Procedurally Generated Test Cases

May 29, 2025

In a recent project at work, I've been creating a mock for an external computation service. Real requests to the service are metered, so tests that burn real cost and allocation aren't great, and they clog up the target service with test cases. The service I own is mostly a pass-through and translation layer for the external computations (as well as a potential indirection layer for replacing the external service with the existing in-house solution).


Dreading the tedious process of generating many different test cases and managing a pile of objects, making sure they have realistic-enough data that isn't all zeroes, and so on, I remembered the idea of procedural generation from roguelike games. Seeded pseudorandom generation is a great tool for creating random-looking, yet still deterministic, data. Specific values will always be the same across test runs, so a value can be determined ahead of time and tested for. What I needed was a function that could deterministically return "random looking" data for a given input. A hashing algorithm like an MD5 sum was perfect for this: it's simple to compute, and cryptographic security isn't a requirement here. I also outlined a special case for generating consistent GUIDs, which I'll discuss later.

The interface needed to handle cases of the form "the expected total distance for simulation 17 is 358km." The required unique values here are the product type ("Simulation"), the identifier ("17"), and the subfield ("Total Distance"). I also added a "maximum value" parameter so that results land in a more realistic range. The product type is required because the value returned for "Simulation 17" and the value for "Summary 17" shouldn't be the same; this matters even more for the GUID case.

The way it works is that the function takes in the various parameters and adds them to the hash sequentially. The resulting hash is reduced modulo 2^31 - 1 and divided by 2^31 - 1, yielding a floating point value between 0.0 and 1.0. That value is multiplied by the maximum value parameter and returned. This produces a consistent value for an arbitrary input, and it can be easily reproduced outside of the test environment.
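Here's a minimal sketch of that in Go (the package name, the helper signature, and the choice to read the first eight digest bytes are my own; the real implementation may differ in those details):

package mockdata

import (
    "crypto/md5"
    "encoding/binary"
)

// GenerateValue hashes the product type, identifier, and subfield
// sequentially, reduces the digest to a float in [0.0, 1.0), and
// scales it by max. The same inputs always produce the same output.
func GenerateValue(product, id, subfield string, max float64) float64 {
    h := md5.New()
    h.Write([]byte(product))
    h.Write([]byte(id))
    h.Write([]byte(subfield))
    digest := h.Sum(nil)

    // Interpret the first 8 bytes of the digest as an integer,
    // reduce it modulo 2^31 - 1, and normalize to [0.0, 1.0).
    n := binary.BigEndian.Uint64(digest[:8]) % ((1 << 31) - 1)
    return float64(n) / float64((1<<31)-1) * max
}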

In the specific case of generating a pseudorandom GUID, a property of the MD5 algorithm can be used: the digest is always 16 bytes, which renders as exactly 32 hexadecimal characters. With some string interpolation to add the dash characters, a realistic-looking, consistent GUID can be produced. In the case that this was designed for, all external identifiers are GUIDs as opposed to sequential database IDs. This way, we can know to always expect "Simulation 23 is associated with external ID {...}." This is why the "product" value is necessary: "Simulation 99" and "Fixture 99" shouldn't have the same GUID, which they would if the only input were their ID of 99.
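A sketch of the GUID helper in the same package (the Guid alias and the 8-4-4-4-12 formatting are my own; note this produces something GUID-shaped, not a spec-compliant UUID):

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
)

// Guid is a plain string alias; a dedicated UUID type would also work.
type Guid string

// GenerateGuid hashes the seed and formats the 32-character hex
// digest into the familiar 8-4-4-4-12 GUID layout. The same seed
// always yields the same GUID.
func GenerateGuid(seed string) Guid {
    sum := md5.Sum([]byte(seed)) // 16-byte digest
    s := hex.EncodeToString(sum[:])
    return Guid(fmt.Sprintf("%s-%s-%s-%s-%s",
        s[0:8], s[8:12], s[12:16], s[16:20], s[20:32]))
}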

Putting these together in a test library, a mock result generator can be created this way:

type Simulation struct {
    ID              Guid
    TotalTime       float64
    TotalCost       float64
    ResultSummaries []Guid
}

func GenerateMockResult(id string) Simulation {
    return Simulation{
        ID:        GenerateGuid("Simulation" + id),
        TotalTime: GenerateValue("Simulation", id, "Time", 5000),
        TotalCost: GenerateValue("Simulation", id, "Cost", 10000),
        ResultSummaries: []Guid{
            GenerateGuid("Simulation" + id + "summary1"),
            GenerateGuid("Simulation" + id + "summary2"),
            GenerateGuid("Simulation" + id + "summary3"),
        },
    }
}

By injecting the mock datasource into the application at test time, these results "appear" to be generated by the external source without relying on any actual external calls, and they are consistent from run to run, allowing the test author to assert on specific values and ensure they are handled correctly. Additionally, special cases can be added to simulate or inject specific faults: a switch statement on specific ID values could be added at the top of the function above, as sketched below. This can simulate, for instance, a result that returned with zero result summaries attached, or a result with all values returning as null, still allowing for handcrafted specific cases in tests.
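For instance, a hypothetical revision (the sentinel IDs and the generateDefault helper are invented for this sketch):

// GenerateMockResult, now with hand-crafted fault cases keyed on
// sentinel IDs reserved for tests.
func GenerateMockResult(id string) Simulation {
    switch id {
    case "no-summaries":
        // Simulate a result that came back with zero summaries.
        sim := generateDefault(id)
        sim.ResultSummaries = nil
        return sim
    case "all-null":
        // Simulate a result where every value came back null/zero.
        return Simulation{ID: GenerateGuid("Simulation" + id)}
    }
    return generateDefault(id)
}

// generateDefault is the procedural path from the sample above,
// extracted so the fault cases can fall back to it.
func generateDefault(id string) Simulation {
    return Simulation{
        ID:        GenerateGuid("Simulation" + id),
        TotalTime: GenerateValue("Simulation", id, "Time", 5000),
        TotalCost: GenerateValue("Simulation", id, "Cost", 10000),
        ResultSummaries: []Guid{
            GenerateGuid("Simulation" + id + "summary1"),
            GenerateGuid("Simulation" + id + "summary2"),
            GenerateGuid("Simulation" + id + "summary3"),
        },
    }
}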


Incidentally, since we now have an API that mocks all of the expected return cases from the external service, it is a trivial task (trivial in the math textbook sense that it will still take five days to complete) to create a small shell application that serves these responses via a REST API. The resulting application can be run as a Docker container that appears, for all intents and purposes, to actually be the external service, just mounted on localhost for some reason. It looks even more realistic as an external service that you can send actual HTTP requests to, yet it not only returns almost instantly, but can also prepare or inject the same very specific failure cases you were working with in the local unit tests.
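A sketch of that shell application using only the standard library (the route, port, and JSON shape are placeholders, not the real service's API):

package main

import (
    "encoding/json"
    "log"
    "net/http"
    "strings"
)

// Assumes the generator functions above are available in this
// package. Each GET /simulations/{id} responds with the same
// deterministic payload the unit tests see.
func main() {
    http.HandleFunc("/simulations/", func(w http.ResponseWriter, r *http.Request) {
        // Treat everything after the prefix as the simulation ID,
        // including sentinel IDs that trigger the fault cases.
        id := strings.TrimPrefix(r.URL.Path, "/simulations/")
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(GenerateMockResult(id))
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}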