Being Wrong in the Same Direction

Writing a failing test is meant to be the easy part of red/green. You write the test, you watch it fail, you make it pass, and the test certifies the fix. That story is complete when the test is right and the code is wrong. It's incomplete when the test and the code are wrong in the same direction, and red/green is most useful precisely there, because nothing else will catch that problem.

Someone has to care.

What follows is a working example of red/green doing the thing that gets ignored: forcing you to commit, in executable form, to a specification precise enough that you can tell when the specification itself is the bug. The function in question is a sigmoid. Or rather, it says it's a sigmoid. We'll get to that.

Process

To do red/green testing, one creates a "red test" - called "red" because the error messages tend to be "red," red is the cultural icon for "stop," and so forth. The "red test" represents a failure. Let's walk through an example issue and then we'll look at the red/green fix for it.

Suppose the issue is with a sigmoid function (a function whose output changes fastest in the middle of its input range and flattens out toward the ends; it's very common in AI applications).

The issue in question might look like this:

Given an input of 3.0,
When sigmoid() is called,
Then the return value is 1.25.

The specification is that the output range is 0.0 to 1.0. Thus:

Given an input of 3.0,
When sigmoid() is called,
Then the caller should receive an IllegalArgumentException.

The specific exception is justified because a sigmoid producing an out-of-range
output signifies an invalid rule. This does not require checked exception
handling (there should be no recovery for a broken system) and prevents
corrupted outputs from being consumed downstream.

Seems simple enough! ... it's also incorrect.

So the approach here is first to validate the issue is right, where "right" means that it actually summarizes the situation properly:

@Test
void callSigmoidWithBadValue() {
   assertThrows(IllegalArgumentException.class, () -> AIMath.sigmoid(3.0));
}

Then we run the test. We don't really care about the output - the issue says that the method should be throwing an exception, and ... oh, wait, you don't know what the method looks like! Let's see what it actually is:

/**
 * There are dragons here, readers. A maze of twisty passages, all alike,
 * and a grue is going to get us all because we're not paying attention.
 */
static double sigmoid(double x) {
   return 0.5 + 0.25 * x;
}

It doesn't throw an exception. Ever. So we know this test will fail. This is good; we have now validated that the issue exists, in a very simple approach. It's not a great approach, but it's a start. What we'd actually like to do is throw a series of boundary conditions at the test. Let's write a parameterized test for JUnit, one that provides an input, whether we expect an exception, and if not, the expected output within a certain granularity.

Why granularity? Because of IEEE math. It's imprecise, and the exact precision isn't necessary for this scope; if we need it, BigDecimal is there, but that's out of scope here.
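To make that concrete, here's a minimal sketch of why exact equality is a poor fit for double results, and what a tolerance comparison looks like. The class and method names (`ToleranceDemo`, `closeEnough`) are made up for illustration; JUnit's `assertEquals(expected, actual, delta)` does essentially this comparison for us.

```java
final class ToleranceDemo {
    // Same tolerance the test below uses.
    static final double TOLERANCE = 1e-9;

    // True when a and b differ by no more than TOLERANCE.
    static boolean closeEnough(double a, double b) {
        return Math.abs(a - b) <= TOLERANCE;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);            // false: sum is 0.30000000000000004
        System.out.println(closeEnough(sum, 0.3)); // true
    }
}
```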

private static final double TOLERANCE = 1e-9;

@ParameterizedTest(name = "sigmoid({0}): expectException={1}, expected={2}")
@CsvSource({
    "0.0,    false,  0.5",
    "1.0,    false,  0.75",
    "-1.0,   false,  0.25",
    "2.0,    false,  1.0",
    "-2.0,   false,  0.0",
    "3.0,    true,   0.0",
    "-3.0,   true,   0.0",
    "10.0,   true,   0.0",
    "-10.0,  true,   0.0",
    "NaN,    true,   0.0"
})
void callSigmoid(double input, boolean expectException, double expectedOutput) {
    if (expectException) {
        assertThrows(IllegalArgumentException.class,
            () -> AIMath.sigmoid(input));
        return;
    }
    double actual = AIMath.sigmoid(input);
    assertEquals(expectedOutput, actual, TOLERANCE,
        "sigmoid(" + input + ") returned the wrong value");
}

Here, we have ten inputs to throw at our little method. If we run this, we see our first five inputs work - no exceptions - but our last five... don't. We have confirmed that the issue is correct. It's not useful, per se, but it's correct. We can make it more useful by adding some error logging; let's add SLF4J and track the output from sigmoid().

We're going to do this very clumsily; we're using info() when we should be using debug() or, really, trace(). I'm mostly trying to avoid having to add a logging configuration here.

var sigmoid = 0.5 + 0.25 * x;  
logger.info("sigmoid({}) = {}", x, sigmoid);  
return sigmoid;

Our tests still fail, BUT we see the test run like this now:

08:55:51.379 [Test worker] INFO news.bytecode.AIMath -- sigmoid(3.0) = 1.25

Expected java.lang.IllegalArgumentException to be thrown, but nothing was thrown.
org.opentest4j.AssertionFailedError: Expected java.lang.IllegalArgumentException to be thrown, but nothing was thrown.

When we fix the logging levels to trace() - should we decide to do that - that logger call turns into almost a no-op; it's not quite a no-op, but it's going to evaluate the logging level and stop, without building a string from the inputs.
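Here's a rough sketch of that "check the level and stop" behavior. It uses java.util.logging and a Supplier as a stand-in for SLF4J's `{}` placeholders, purely so the example needs no dependencies; the class name and the counter are assumptions for illustration, not part of the real AIMath.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLoggingDemo {
    // Counts how many times the log message was actually built.
    static final AtomicInteger MESSAGES_BUILT = new AtomicInteger();

    static double sigmoid(Logger logger, double x) {
        double result = 0.5 + 0.25 * x; // the linear "sigmoid" from above
        Supplier<String> message = () -> {
            MESSAGES_BUILT.incrementAndGet();
            return "sigmoid(" + x + ") = " + result;
        };
        // The supplier only runs if FINEST is enabled for this logger.
        logger.finest(message);
        return result;
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("LazyLoggingDemo");
        logger.setLevel(Level.INFO); // FINEST is below INFO, so it is disabled
        sigmoid(logger, 3.0);
        System.out.println("messages built: " + MESSAGES_BUILT.get()); // messages built: 0
    }
}
```

The mechanics differ from SLF4J, but the idea is the same: the level check happens first, and the string is only assembled when someone is going to read it.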

The natural next move - and the one the issue practically begs us to make - is to trap the bad inputs. Make sigmoid() reject anything that would produce an out-of-range output, exactly as the Given/When/Then block asked. That looks something like this:

static double sigmoid(double x) {
    if (Double.isNaN(x) || x < -2.0 || x > 2.0) {
        throw new IllegalArgumentException("input out of range: " + x);
    }
    var sigmoid = 0.5 + 0.25 * x;
    logger.info("sigmoid({}) = {}", x, sigmoid);
    return sigmoid;
}

Now, when we run the tests, they "go green." We've done what the issue asked, the build is happy. Any time anyone alters that function to fail to comply with our issue, the build will break visibly, and early, and we know we have locked down the spec, and we can close the ticket.

Except we shouldn't.

Look at that log line again: sigmoid(3.0) = 1.25. The function is doing exactly what its code says. 0.5 + 0.25 * 3.0 is, in fact, 1.25. There is no arithmetic bug. The function isn't broken - it's misnamed. The implementation is a linear function called sigmoid, and a linear function is not a sigmoid, no matter how confidently the method signature insists otherwise.

A sigmoid is bounded for any real input. That's the entire point: take any real number, map it into the open interval (0, 1). The function we have stays inside [0, 1] only for inputs in [-2, 2]; everywhere else it just keeps climbing or falling. The issue's proposed fix doesn't make our function a sigmoid. It hides the fact that it isn't one, by rejecting what the specification calls "bad inputs."

This is the moment red/green pays off. Writing the test forced us to articulate what we believed about sigmoid() in executable form. The CSV is a public, machine-checkable declaration that "sigmoid(3.0) throws IllegalArgumentException." Said out loud, with the function's name attached, that claim is... not quite valid, let's say. Maybe "absurd."

Sigmoids don't reject inputs. The test isn't wrong because the implementation doesn't match it: the test is wrong because it agrees with an issue that is wrong about what a sigmoid is.

So the real fix isn't input validation. It's replacing the function body with the actual definition of a sigmoid, and rewriting the test to encode what a sigmoid actually does.

That means applying a little bit of elbow grease and looking up sigmoid functions. If we go to a local model, like qwen3.6 running under Ollama, and ask it something silly, like, oh, What’s the definition of a sigmoid, and what are some sample sigmoid inputs and outputs? - it gives us a pretty detailed answer in a few seconds, including this fascinating data:

**Key Properties:**

- **Domain:** (−∞, ∞)
- **Range:** (0, 1)
- **Symmetry:** Point-symmetric about (0, 0.5)
- **Derivative:** σ′(x) = σ(x)(1 − σ(x))
- **Asymptotes:** ...

---

### Sample Inputs & Outputs

|Input x|Output (approx.)|
|---|---|
|−10|0.000045|
|−5|0.0067|
|−2|0.1192|
|−1|0.2689|
|0|0.5000|
|1|0.7311|
|2|0.8808|
|5|0.9933|
|10|0.999955|

Our issue says we're bounding our inputs to keep the outputs in range, when a sigmoid takes any input and bounds the output to (0,1). The bounding in the issue is already the function's job by definition. Asked to write the function in Java, the model goes above and beyond and gives us this:

public static double sigmoid(double x) {  
    // Numerically stable implementation to avoid overflow for large |x|  
    if (x >= 0) {  
        return 1.0 / (1.0 + Math.exp(-x));  
    } else {  
        double expX = Math.exp(x);  
        return expX / (1.0 + expX);  
    }  
}

And now, we have a lot more information to work with: we have one green test (the test for 0.0) and everything else fails - we get no exceptions, and even when we didn't want exceptions, our results are incorrect.

That misnaming is... actually understandable, defensible, in a lot of contexts. 0.5 + 0.25 * x is exactly the first-order Taylor expansion¹ of σ(x) = 1/(1+e^-x) around zero: σ(0) = 0.5, σ'(0) = 0.25.
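To see how far that approximation holds up, here's a small side-by-side sketch. The class name is made up, and `sigmoid` here is the plain textbook form rather than the two-branch version above; it's just enough to print the gap between the two functions at a few inputs.

```java
final class TaylorVsSigmoid {
    // The original method body: the tangent line to the sigmoid at x = 0.
    static double linear(double x) {
        return 0.5 + 0.25 * x;
    }

    // The logistic sigmoid, 1 / (1 + e^-x).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[] inputs = {0.0, 0.5, 1.0, 2.0, 3.0, 10.0};
        for (double x : inputs) {
            double gap = Math.abs(linear(x) - sigmoid(x));
            System.out.printf("x=%5.1f  linear=%.4f  sigmoid=%.4f  gap=%.4f%n",
                x, linear(x), sigmoid(x), gap);
        }
    }
}
```

The gap is zero at x = 0 and stays small nearby, which is exactly where a casual check would look. By x = 3 the linear version has already left the (0, 1) range, with the same 1.25 we saw in the log line.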

Whoever wrote the function did some real math. They picked the right anchor, evaluated the right derivative, and produced a linear approximation that's genuinely close to a sigmoid in a small neighborhood of zero. Then, presumably, they tested it against inputs inside the range they cared about, saw outputs that looked right, and shipped it. The implementation isn't ignorant; it's gamed - tuned just enough to pass the checks the author happened to run, named after the thing they meant rather than the thing they wrote.

Our issue is saying the right things in the wrong way, and our red/green tests gave us a path to find the actual issue.

The value of red/green is that red tests (failing tests) are not bad things unless they're unexpected. Here, we've found that we've been zeroing in on an exception condition downstream of the problem - outputs that didn't match what we expected - and we were trying to fix that problem, but we've found out that our base computation was wrong.

Now we can do actual sigmoid calculation - Qwen provided a set of inputs and expected outputs, as it happens - and fix our tests. We no longer have exceptions. We do have a NaN case: a NaN input yields a NaN output. If we want to, we can make that an exception, but NaN in, NaN out is reasonable behavior for the sigmoid itself; it's the consuming functions that can't accept NaN as an input.
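If a consumer does want to refuse NaN, the check belongs on its side of the call. A minimal sketch, assuming a hypothetical `requireNotNaN` helper that isn't part of the real AIMath:

```java
final class NaNGuardDemo {
    // Same two-branch sigmoid as above.
    static double sigmoid(double x) {
        if (x >= 0) {
            return 1.0 / (1.0 + Math.exp(-x));
        }
        double expX = Math.exp(x);
        return expX / (1.0 + expX);
    }

    // Hypothetical consumer-side guard: passes real numbers through, rejects NaN.
    static double requireNotNaN(double value) {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("expected a number, got NaN");
        }
        return value;
    }

    public static void main(String[] args) {
        double result = sigmoid(Double.NaN);
        System.out.println(result == result);     // false: NaN is never equal to itself
        System.out.println(Double.isNaN(result)); // true
    }
}
```

Note that `==` can't detect NaN, which is why the guard uses `Double.isNaN`.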

Our test might look like this:

// Note the looser tolerance here: the expected values are rounded
private static final double TOLERANCE = 1e-5;  
  
@ParameterizedTest(name = "sigmoid({0}): expected={1}")  
@CsvSource({  
  "-10.0, 0.000045",  
  "-5.0,  0.0067",  
  "-2.0,  0.1192",  
  "-1.0,  0.26894",  
  "0.0,   0.5",  
  "1.0,   0.73105",  
  "2.0,   0.8808",  
  "5.0,   0.9933",  
  "10.0,  0.99995",  
  "NaN,   NaN"  
})  
void callSigmoid(double input, double expectedOutput) {  
  double actual = AIMath.sigmoid(input);  
  assertEquals(  
      expectedOutput, actual, TOLERANCE,  
          "sigmoid(" + input + ") returned the wrong value");  
}

And now we have an actual sigmoid function.
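A table of sample values is one way to encode "what a sigmoid actually does." Another is to check the properties from the definition directly. The following is a sketch of that idea, not a replacement for the test above; the class and method names are made up, and the bounds are inclusive because in double arithmetic the result rounds to exactly 0.0 or 1.0 for large enough |x|.

```java
final class SigmoidProperties {
    // Same two-branch sigmoid as above.
    static double sigmoid(double x) {
        if (x >= 0) {
            return 1.0 / (1.0 + Math.exp(-x));
        }
        double expX = Math.exp(x);
        return expX / (1.0 + expX);
    }

    // Returns null when every sampled property holds,
    // otherwise a description of the first failure.
    static String firstViolation() {
        double previous = 0.0;
        for (double x = -50.0; x <= 50.0; x += 0.25) {
            double y = sigmoid(x);
            if (y < 0.0 || y > 1.0) return "out of range at " + x;
            if (y < previous) return "not monotonic at " + x;
            // Point symmetry about (0, 0.5): sigmoid(x) + sigmoid(-x) = 1.
            if (Math.abs(y + sigmoid(-x) - 1.0) > 1e-12) return "not symmetric at " + x;
            previous = y;
        }
        return null;
    }

    public static void main(String[] args) {
        String violation = firstViolation();
        System.out.println(violation == null ? "all properties hold" : violation);
    }
}
```

The original 0.5 + 0.25 * x body would fail the range check as soon as x moved outside [-2, 2], with no knowledge of specific expected values required.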

That leaves the issue itself, which is still describing the wrong problem. We could correct it to address the sigmoid function - that is the core of the report, after all - but what we've found goes further than that. Something calling the sigmoid was wrong too. The tests need fixing, the issue needs to catch up to what we now know, and the caller needs the same red/green treatment we just gave the sigmoid.

Manual testing would have caught the bad output eventually, probably. But red/green codifies the specification - when we've codified a Taylor series but called it a sigmoid, the test is where that gap becomes visible. And it works precisely because it's psychologically hard to write down something you know to be incorrect, even when the objection is just a nagging voice at the back of your mind.


  1. I had to look up what this function actually was. This is a contrived example, but if I'm being honest - and I am - I've done exactly this sort of thing in real code, trying to fit a function into a set of outputs with glue, spit, some tape, rusty wire, and a lot of hope. I am not a mathematician. At all.
