Discussion:
Unit Testing: from theory to a real case
pozz
2024-08-27 10:52:21 UTC
I have read a lot about unit testing, but unfortunately I usually work on
single-developer projects under stressful time constraints, so I have never
created full tests for an entire project in the past. This means I'm a
newbie in this aspect of software development.

I know the importance of testing, but we have to admit that it increases
the cost of software development a lot, at least at the beginning. We don't
always have the ability to pay that price.

Every time I start writing some tests, I eventually think I'm wasting my
precious time. Most probably that's because I'm not able to create valid tests.

So I'm asking for your help with a real case.

First of all, there is a great deal of confusion in my mind about the subtle
differences between mocks, stubs, fakes, dummies and so on. Anyway, I think
these names are not so important, so let's move on.

These days I'm working on a calendar scheduler module. The client of
this module can configure up to N events, which can be:
- single (one shot)
- weekly (for example, on Monday and Saturday of every week)
- monthly (for example, on days 3, 5 and 15 of every month)
- yearly (for example, on day 7 of January, February and March)
Weekly, monthly and yearly events have a starting time and *could* have
a maximum number of repetitions (or they could repeat forever).

The interface is very simple. I have some functions to initialize the
configuration of an event (a simple C struct):

void calev_config_init_single(CalendarEventConfig *config, time_t timestamp,
                              CalendarEventActions *actions);
void calev_config_init_weekly(CalendarEventConfig *config, time_t timestamp,
                              uint8_t weekdays, unsigned int nrep,
                              CalendarEventActions *actions);
void calev_config_init_monthly(CalendarEventConfig *config, time_t timestamp,
                               uint32_t mdays, unsigned int nrep,
                               CalendarEventActions *actions);
void calev_config_init_yearly(CalendarEventConfig *config, time_t timestamp,
                              uint16_t months, unsigned int nrep,
                              CalendarEventActions *actions);

I have a function that initializes the module with some pre-programmed
events:

void calendar_init(CalendarEventConfig *list_events, size_t num_events);

I have a function, called every second, that triggers actions on
occurrences:

void calendar_task(void);

So, the client of the calendar module usually does the following:

CalendarEventConfig events[4];
calev_config_init_...(&events[0], ...
calev_config_init_...(&events[1], ...
calev_config_init_...(&events[2], ...
calev_config_init_...(&events[3], ...
calendar_init(events, 4);
while (1) {
    calendar_task();  // every second
    ...
}

The calendar module depends on some other modules. First of all, it asks
for the current time as a time_t. And it calls the make_actions() function,
with certain parameters, when an event occurrence expires.

I know how to fake the time, replacing the system time with a fake time.
And I know how to create a mock to check make_actions() calls and
parameters.
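
For concreteness, here is roughly the shape of my fake and mock (a minimal
sketch in plain C; the calendar_get_time() seam and the make_actions()
signature are simplified for illustration, since the real module could
obtain the time and pass its parameters differently):

#include <time.h>

/* Fake time source: the test decides what "now" is. */
static time_t fake_now;

void set_time(time_t t) { fake_now = t; }

/* The calendar module is assumed to ask this hook for the time
   instead of calling time(NULL) directly. */
time_t calendar_get_time(void) { return fake_now; }

/* Recording mock for make_actions(): remembers how many times it was
   called and with which argument, so tests can assert on both. */
typedef struct CalendarEventActions CalendarEventActions;  /* opaque here */

static int make_actions_calls;
static const CalendarEventActions *make_actions_last;

void make_actions(const CalendarEventActions *actions)  /* signature assumed */
{
    make_actions_calls++;
    make_actions_last = actions;
}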

Now the problem is... which tests to write?

I started writing some tests, but after completing 30 of them, I'm
starting to think my work is not valid.

I was tempted to write tests in this way:

TEST(TestCalendar, OneWeeklyEvent_InfiniteRepetition)
{
  CalendarEventConfig cfg;
  calev_config_init_weekly(&cfg, parse_time("01/01/2024 10:00:00"),
     MONDAY | SATURDAY, 0, &actions);

  set_time(parse_time("01/01/2024 00:00:00"));  // It's Monday
  calendar_init(&cfg, 1);

  set_time(parse_time("01/01/2024 10:00:00"));  // First occurrence
  mock().expectOneCall("make_actions")...
  calendar_task();

  set_time(parse_time("06/01/2024 10:00:00"));  // It's Saturday
  mock().expectOneCall("make_actions")...
  calendar_task();

  set_time(parse_time("08/01/2024 10:00:00"));  // It's Monday again
  mock().expectOneCall("make_actions")...
  calendar_task();

  mock().checkExpectations();
}

However, it seems there are many sub-tests inside the
OneWeeklyEvent_InfiniteRepetition test (the first occurrence, the second
and the third).
Tests should have a single assertion and should test one very specific
behaviour. So I split this test into:

TEST(TestCalendar, OneWeeklyEventInfiniteRepetition_FirstOccurrence)
TEST(TestCalendar, OneWeeklyEventInfiniteRepetition_SecondOccurrence)
TEST(TestCalendar, OneWeeklyEventInfiniteRepetition_ThirdOccurrence)
What else? When to stop?

Now for the weekly event with only 5 repetitions.
TEST(TestCalendar, OneWeeklyEvent5Repetitions_FirstOccurrence)
TEST(TestCalendar, OneWeeklyEvent5Repetitions_SecondOccurrence)
TEST(TestCalendar, OneWeeklyEvent5Repetitions_SixthOccurrence_NoActions)

The combinations and possibilities are very numerous. calendar_init() can be
called with only 1 event, with 2 events and so on. And the behaviour in
these cases must be tested, because the module might behave well with 1
event but not with 4 events.

The events can be passed to calendar_init() in a random (not
chronological) order. I should test this behaviour too.

There could be one-shot, weekly with infinite repetitions, weekly with a
few repetitions, monthly... yearly, with certain days in common...

calendar_init() can be called when the current time is already past the
starting timestamp of all events. In some cases there could still be future
occurrences (infinite repetitions), and in others the event may be
completely expired (limited repetitions).

I'm confused. How do I approach this testing problem scientifically? How
do I avoid the proliferation of tests? Which tests are really important,
and how do I write them?
Don Y
2024-08-29 20:56:39 UTC
Post by pozz
I have read a lot about unit testing, but unfortunately I usually work on
single-developer projects under stressful time constraints, so I have never
created full tests for an entire project in the past. This means I'm a newbie
in this aspect of software development.
Fix it now or fix it later -- when you have even LESS time (because customers
are using your defective product).

Testing should start when you define the module, continue while you are
implementing it (you will likely notice "conditions" that could lead to
bogus behavior as you are writing them!), and when you consider it "done".

Thinking about testing when you draft the specification helps you
challenge your notion of the suitability of such a module for the task(s)
at hand as you imagine use cases (and MISuse cases).
Post by pozz
I know the importance of testing, but we have to admit that it increases the
cost of software development a lot, at least at the beginning. We don't
always have the ability to pay that price.
If you assume there are two types of "software" -- stuff that TRIES to work
and stuff that HOPES to work, then the cost of the latter can be a lot less...
because you really don't care *if* it works! Apples; Oranges.
Post by pozz
These days I'm working on a calendar scheduler module. The client of this
module can configure up to N events, which can be:
- single (one shot)
- weekly (for example, on Monday and Saturday of every week)
- monthly (for example, on days 3, 5 and 15 of every month)
- yearly (for example, on day 7 of January, February and March)
Weekly, monthly and yearly events have a starting time and *could* have a
maximum number of repetitions (or they could repeat forever).
Testing aims to prove that:
- your specification for the module accurately reflects its need (suitability)
- the module actually implements the specification (compliance)
- the module is well-behaved in "all" possible scenarios, even when misused
- changes to the module haven't compromised past performance (regression)

It also gives you an idea of how your "process" is working; if you are
finding *lots* of bugs, perhaps you should be testing more aggressively
earlier in the process (there is a tendency to NOT want to make lots of
changes/fixes to code that you've convinced yourself is "almost done")

And, it provides exemplars that you can use to evaluate performance.
Post by pozz
The calendar module depends on some other modules. First of all, it asks for
the current time as a time_t. And it calls the make_actions() function, with
certain parameters, when an event occurrence expires.
Treat each as an independent, testable entity. This makes it easier to
design test cases and easier to isolate anomalous behavior(s).
Post by pozz
I'm confused. How do I approach this testing problem scientifically? How do I
avoid the proliferation of tests? Which tests are really important, and how
do I write them?
Make a concerted effort thinking of how to *break* it. E.g., If you try to
schedule an event for some time in the past, how should it react? Should it
immediately "trigger" the event? Should it silently dismiss the event?
Should it throw an error?

What if "the past" was just half a second ago and you've been unlucky
enough that your task was delayed a bit so that the clock ticked off
another second before you got a chance to schedule your event AHEAD of
time?

If there are multiple steps to scheduling an event (e.g., creating a structure
and then passing it on to a scheduler), consider if one of the steps might
(intentionally!) be bypassed and how that might inject faulty behavior into
your design. E.g., if you do all of your sanity checks in the "create
structure" step, BYPASSING that step and passing a structure created
by some other means (e.g., const data) avoids that sanity checking; will
the scheduler gag on possibly "insane" data introduced in such a manner?

Can a client become confused as to which structures are "still active"
vs. "already consumed"? If an active structure is altered, can that
lead to an inconsistent state (e.g., if the scheduler has acted on *part*
of the information but is still relying on the balance to complete the action)?

Can a client safely repurpose an event specification? Or, once created,
does the scheduler "own" it? Is there some safety window in which such
alterations won't "confuse" the scheduler, outside of which the scheduler
may have already taken some actions on the assumption that the event IS
still scheduled?

What happens if someone changes the current *time*? Do all events that are
now "past" instantly trigger? Are they dismissed? Do they move forward or
backwards in time based on the delta introduced to the current time?

[This is a common flaw in folks trying to optimize such subsystems. There is
usually a need for relative events AND absolute events as an acknowledgement
that "time" changes]

These interactions with the rest of the system (clients) can help you
think about the DESIRED functionality and the actual use patterns. You
may discover your implementation strategy is inherently faulty rendering the
*specification* defective.
pozz
2024-08-30 08:18:22 UTC
Post by Don Y
[...]
These interactions with the rest of the system (clients) can help you
think about the DESIRED functionality and the actual use patterns. You
may discover your implementation strategy is inherently faulty rendering the
*specification* defective.
Thank you for your reply, Don. Those are valuable words, which I have read
and heard many times. However, I have trouble translating them into real tests.

You write: test for this, test for that, what happens if the client
uses the module in the wrong way, what happens when the system clock
changes a little or a lot, what happens when the task misses the exact
timestamp of an event.

I was trying to write tests for *all* of those situations, but it seemed
to me a very, VERY, *VERY* big job. The implementation of the calendar
module took me a couple of days; the tests seem an infinite job.

I have four types of events, and for each test I should check the correct
behaviour for each type.

What happens if the timestamp of an event has already expired when it is
added to the system? I should write 4 tests, one for each type.

AddOneShotEventWithExpiredTimestamp_NoActions
AddWeeklyEventWithExpiredTimestamp_NoActions
AddMonthlyEventWithExpiredTimestamp_NoActions
AddYearlyEventWithExpiredTimestamp_NoActions

What does "expired timestamp" mean? Suppose the event timestamp is
"01/01/2024 10:00:00". This timestamp could have expired by a few
seconds, a few minutes, one day, or months or years. Maybe the module
performs well when the system time is on a different date, but badly if the
timestamp expired on the same day, for example "01/01/2024 11:00:00" or
"01/01/2024 10:00:01".
Should I add:

AddOneShotEventWithExpiredTimestamp1s_NoActions
AddOneShotEventWithExpiredTimestamp1m_NoActions
AddOneShotEventWithExpiredTimestamp1h_NoActions
AddOneShotEventWithExpiredTimestamp1d_NoActions
AddWeeklyEventWithExpiredTimestamp1s_NoActions
AddWeeklyEventWithExpiredTimestamp1m_NoActions
AddWeeklyEventWithExpiredTimestamp1h_NoActions
AddWeeklyEventWithExpiredTimestamp1d_NoActions
AddMonthlyEventWithExpiredTimestamp1s_NoActions
AddMonthlyEventWithExpiredTimestamp1m_NoActions
AddMonthlyEventWithExpiredTimestamp1h_NoActions
AddMonthlyEventWithExpiredTimestamp1d_NoActions
AddYearlyEventWithExpiredTimestamp1s_NoActions
AddYearlyEventWithExpiredTimestamp1m_NoActions
AddYearlyEventWithExpiredTimestamp1h_NoActions
AddYearlyEventWithExpiredTimestamp1d_NoActions

That's 16 tests for just a single trivial scenario. If I continue this
way, I will end up with thousands of tests. I don't think this is the way
to do testing, is it?
pozz
2024-08-30 08:37:31 UTC
Post by pozz
[...]
What happens if the timestamp of an event has already expired when it is
added to the system? I should write 4 tests, one for each type.
AddOneShotEventWithExpiredTimestamp_NoActions
AddWeeklyEventWithExpiredTimestamp_NoActions
AddMonthlyEventWithExpiredTimestamp_NoActions
AddYearlyEventWithExpiredTimestamp_NoActions
That is true only if the repeated events (weekly, monthly, yearly) have a
limited number of repetitions *and* the system time is past the last
repetition when they are added to the system.
Otherwise, even if the system time is past the event timestamp (the first
occurrence), other repetitions could happen in the future, so the events
should be added to the list of active events.

So there isn't just one test for the weekly event; there are more:

WeeklyEventWith10Repetitions_SystemTimeIsOverAllOfThem_IgnoreEvent
WeeklyEventWith10Repetitions_SystemTimeIsOverOnlyTwoOfThem_AddEvent
WeeklyEventWithInfiniteRepetitions_SystemTimeIsOverSomeOfThem_AddEvent

This is for weekly events, but the same applies to monthly and yearly ones.

And what if the addition of events (the call to calendar_init) is made
with multiple events and not only one? Maybe the module behaves
correctly when just one event is added (expired or not), but badly
when two or more events are added.

Maybe it behaves well when all the events are completely expired, but
badly when one is partially expired and another is totally expired.

The combinations that we could imagine are infinite.
Don Y
2024-08-30 12:21:12 UTC
Post by pozz
You write: test for this, test for that, what happens if the client uses
the module in the wrong way, what happens when the system clock changes a
little or a lot, what happens when the task misses the exact timestamp of an
event.
I was trying to write tests for *all* of those situations, but it seemed to me
a very, VERY, *VERY* big job. The implementation of the calendar module took me
a couple of days; the tests seem an infinite job.
Because there are lots of ways your code can fail. You have to prove
that it doesn't fail in ANY of those ways.

unsigned int multiply(unsigned int multiplicand, unsigned int multiplier) {
    return 6;  /* deliberately bogus: always returns 6 */
}

works well for the test cases:
  2,3
  3,2
  6,1
  1,6
but not so well for:
  8,5
  17,902
  1,1
etc.
Post by pozz
I have four types of events, and for each test I should check the correct
behaviour for each type.
What happens if the timestamp of an event has already expired when it is added
to the system? I should write 4 tests, one for each type.
AddOneShotEventWithExpiredTimestamp_NoActions
AddWeeklyEventWithExpiredTimestamp_NoActions
AddMonthlyEventWithExpiredTimestamp_NoActions
AddYearlyEventWithExpiredTimestamp_NoActions
Chances are, there is one place in your code that is aware of the fact that
the event is scheduled for a PAST time. So, you only need to create one test.
(Actually, two -- one that proves the behavior for a time *almost* NOT past and
another for a time JUST past.)
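
In terms of your own helpers (set_time(), parse_time(), the CppUTest mock),
that pair might look like this -- a sketch that assumes the dismiss-expired
behavior your test names imply:

TEST(TestCalendar, OneShotJustPast_NoActions)
{
  CalendarEventConfig cfg;
  time_t t = parse_time("01/01/2024 10:00:00");
  calev_config_init_single(&cfg, t, &actions);

  set_time(t + 1);              // "just past": one second late
  calendar_init(&cfg, 1);
  calendar_task();              // no make_actions call expected

  mock().checkExpectations();
}

TEST(TestCalendar, OneShotExactlyDue_Triggers)
{
  CalendarEventConfig cfg;
  time_t t = parse_time("01/01/2024 10:00:00");
  calev_config_init_single(&cfg, t, &actions);

  set_time(t - 1);              // "almost not past": not due yet
  calendar_init(&cfg, 1);

  set_time(t);                  // now exactly due
  mock().expectOneCall("make_actions");
  calendar_task();

  mock().checkExpectations();
}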

Your goal (having already implemented the modules) is to exercise each
path through the code.

whatever() {
   ...
   if (x > y) {
      // do something
   } else {
      // do something else
   }
   ...
}

Here, there are only two different paths through the code:
- one for x > y
- one for !(x > y)
So, you need to create test cases that will exercise each path.

To verify your "x > y" test, you would want to pick an x that is
just detectably larger than y. And, another case where x is as
large as possible WITHOUT exceeding y. You can view this as defining
the "edge" between the two routes.

If, for example, you picked x = 5 and x = 3 as your test cases
(where y = 4), then you WOULD exercise both paths. But, if you had
mistakenly coded this as
   if (x >= y) {
      // do something
   } else {
      // do something else
   }
you wouldn't be able to detect that fault, whereas using x = 5 and x = 4
would cause you to wonder why "do something else" never got executed!
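
A minimal, self-contained illustration of that edge pair (plain C, with a
stand-in whatever() so it compiles on its own):

#include <assert.h>

/* Stand-in for the branch under test: reports which path was taken. */
static int whatever(int x, int y) {
    return (x > y) ? 1 : 0;   /* 1 = "do something", 0 = "do something else" */
}

int main(void) {
    assert(whatever(5, 4) == 1);  /* just detectably larger: x > y edge */
    assert(whatever(4, 4) == 0);  /* as large as possible without exceeding y */
    return 0;
}

Had whatever() been mistakenly written with >=, the second assert would
fail, exposing the off-by-one in the comparison.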
Post by pozz
What does "expired timestamp" mean? Suppose the event timestamp is...
A time that is "in the past". If it is time 't' now, what happens if the
client specifies an event to happen at time t-1? Should you *immediately*
activate the event (because NOW it is t > t-1)? Or should you discard it
because it was SUPPOSED to happen 1 second ago?

What if it's t-495678? Is there a different type of action you expect if the
time is "a long time ago" vs. "just recently"?

Do events happen at *instants* in time? Or, in CONDITIONS of time?

If they happen at instants, then you have to ensure you can discern one
instant from another.
"01/01/2024 10:00:00". This timestamp could be expired for a few seconds, a few
minutes or one day or months or years. Maybe the module performs well when the
system time has a different date, but bad if the timestamp expired in the same
day, for example "01/01/2024 11:00:00" or "01/01/2024 10:00:01".
AddOneShotEventWithExpiredTimestamp1s_NoActions
AddOneShotEventWithExpiredTimestamp1m_NoActions
AddOneShotEventWithExpiredTimestamp1h_NoActions
AddOneShotEventWithExpiredTimestamp1d_NoActions
AddWeeklyEventWithExpiredTimestamp1s_NoActions
AddWeeklyEventWithExpiredTimestamp1m_NoActions
AddWeeklyEventWithExpiredTimestamp1h_NoActions
AddWeeklyEventWithExpiredTimestamp1d_NoActions
AddMonthlyEventWithExpiredTimestamp1s_NoActions
AddMonthlyEventWithExpiredTimestamp1m_NoActions
AddMonthlyEventWithExpiredTimestamp1h_NoActions
AddMonthlyEventWithExpiredTimestamp1d_NoActions
AddYearlyEventWithExpiredTimestamp1s_NoActions
AddYearlyEventWithExpiredTimestamp1m_NoActions
AddYearlyEventWithExpiredTimestamp1h_NoActions
AddYearlyEventWithExpiredTimestamp1d_NoActions
That's 16 tests for just a single trivial scenario. If I continue this way, I
will end up with thousands of tests. I don't think this is the way to do
testing, is it?
You declare what scenario you are testing for as a (commentary) preface
to the test stanza.

If you are testing to ensure "NoActions" is handled correctly, then
you look to see how many ways the "NoActions" criteria can tickle
the code.

If there is only ONE place where "NoActions" alters the flow through
the code, then you only need one test (actually, two as you need
to cover "SomeAction" to show that "NoAction" is different).

In a different test scenario, you would test that 1s, 1m, 1h, 1d,
etc. are all handled correctly IF EACH OF THOSE PASSES THROUGH YOUR
CODE OVER DIFFERENT PATHWAYS.

And, elsewhere, you might test to see that "repeated" events
operate correctly.

You "prove" that one scenario is handled correctly and then
don't need to reexamine those various tests again in any other
scenario UNLESS THEY ALTER THE PATH THROUGH THE CODE.
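
If the 1s/1m/1h/1d ages do NOT take different paths, one table-driven test
can stand in for your 16 named ones -- a sketch, again reusing your
set_time()/parse_time() helpers and the mock:

TEST(TestCalendar, ExpiredOneShot_NoActions_WhateverTheAge)
{
  static const long ages[] = { 1, 60, 3600, 86400 };  // 1s, 1m, 1h, 1d
  time_t t = parse_time("01/01/2024 10:00:00");

  for (unsigned i = 0; i < sizeof(ages) / sizeof(ages[0]); i++) {
    CalendarEventConfig cfg;
    calev_config_init_single(&cfg, t, &actions);
    set_time(t + ages[i]);      // "now" is past the event by ages[i]
    calendar_init(&cfg, 1);
    calendar_task();            // any make_actions call would be unexpected
  }

  mock().checkExpectations();
}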

Your understanding of how the code would LIKELY be crafted lets
you determine some of these tests before you've written ANY
code. E.g., I suggested "expired events" because I am reasonably
sure that SOMEWHERE your code is looking at "event time" vs. "now"
so you would need to test that comparison.

Your knowledge of how the code is *actually* crafted lets you
refine your test cases to cover specifics of YOUR implementation.

Note that test cases that are applied to version 1 of the code should
yield the same results in version 305, even if the implementation
changes dramatically. Because the FUNCTIONALITY shouldn't be
changing.

So, you can just keep adding test cases to your test suite;
you don't ever need to remove any.

[If a test STOPS working, you have to ask yourself how you have
BROKEN/changed the function of the module]

For example, I could implement a "long" integer multiplication routine
by adding the multiplicand to an accumulator a number of times
dictated by the multiplier. I can create test cases for this
implementation.

Later, I could revise the routine to use a shift-and-add approach.

BUT, THE ORIGINAL TEST CASES SHOULD STILL PASS! I might,
however, have to add some other tests to identify failures in
the shift logic in this new approach (e.g., if I only examined
the rightmost 26 of the bits in the multiplier, then a large
value multiplier would fail in this approach but could pass
in the repeated addition implementation). This would be
evident to me in looking at the code because there would be
a different path through the code when the "last bit" had been
checked.
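
For what it's worth, the two implementations might look like this (a sketch;
the unsigned long widths and overflow behavior are glossed over):

/* Version 1: repeated addition. */
unsigned long multiply_v1(unsigned long multiplicand, unsigned long multiplier) {
    unsigned long acc = 0;
    while (multiplier-- > 0)
        acc += multiplicand;
    return acc;
}

/* Version 2: shift-and-add. The original test cases must still pass,
   but large multipliers now exercise NEW paths (the shift logic),
   so they deserve NEW test cases. */
unsigned long multiply_v2(unsigned long multiplicand, unsigned long multiplier) {
    unsigned long acc = 0;
    while (multiplier != 0) {
        if (multiplier & 1)
            acc += multiplicand;
        multiplicand <<= 1;
        multiplier >>= 1;
    }
    return acc;
}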
pozz
2024-08-30 16:00:39 UTC
Post by Don Y
Post by pozz
You write: test for this, test for that, what happens if the
client uses the module in the wrong way, what happens when the system
clock changes a little or a lot, what happens when the task misses the
exact timestamp of an event.
I was trying to write tests for *all* of those situations, but it
seemed to me a very, VERY, *VERY* big job. The implementation of the
calendar module took me a couple of days; the tests seem an infinite job.
Because there are lots of ways your code can fail.  You have to prove
that it doesn't fail in ANY of those ways.
So you're confirming it's a very tedious and long job.
unsigned int multiply(unsigned int multiplicand, unsigned int multiplier) {
    return 6;  /* deliberately bogus: always returns 6 */
}
works well for the test cases:
  2,3
  3,2
  6,1
  1,6
but not so well for:
  8,5
  17,902
  1,1
etc.
This is a simpler case. Just test a couple of different results and
you're almost sure it works for the normal case (positive integers).
Post by pozz
I have four types of events, and for each test I should check the correct
behaviour for each type.
What happens if the timestamp of an event has already expired when it
is added to the system? I should write 4 tests, one for each type.
AddOneShotEventWithExpiredTimestamp_NoActions
AddWeeklyEventWithExpiredTimestamp_NoActions
AddMonthlyEventWithExpiredTimestamp_NoActions
AddYearlyEventWithExpiredTimestamp_NoActions
Chances are, there is one place in your code that is aware of the fact that
the event is scheduled for a PAST time.  So, you only need to create one
test.
(actually, two -- one that proves one behavior for time *almost* NOT past and
another for time JUST past)
I read that tests shouldn't be written for the specific implementation,
but should be generic enough to work well even if the implementation
changes.
Your goal (having already implemented the modules) is to exercise each
path through the code.
whatever() {
   ...
   if (x > y) {
      // do something
   } else {
      // do something else
   }
   ...
}
- one for x > y
- one for !(x > y)
So, you need to create test cases that will exercise each path.
Now I really know there are only two paths in the current
implementation, but I'm not sure this will stay the same in the future.
What happens if a developer changes the code to:

if (event_type == SINGLE) {
    if (event_timestamp < current_time) {
        // Event expired, ignore it
    } else {
        add_event_to_active_events();
    }
} else if (event_type == MONTHLY) {
    if (event_timestamp <= current_time) {
        ...
    } else {
        ...
    }
} else ...

The paths could multiply.
To verify your "x > y" test, you would want to pick an x that is
just detectably larger than y.  And, another case where x is as
large as possible WITHOUT exceeding y.  You can view this as defining
the "edge" between the two routes.
Yes, and the test cases proliferate like ants... sigh :-(
If, for example, you picked x = 5 and x = 3 as your test cases
(where y = 4), then you WOULD exercise both paths.  But, if you had
mistakenly coded this as
   if (x >= y) {
      // do something
   } else {
      // do something else
   }
you wouldn't be able to detect that fault, whereas using x = 5 and x = 4
would cause you to wonder why "do something else" never got executed!
Post by pozz
What does "expired timestamp" mean? Suppose the event timestamp is...
A time that is "in the past".  If it is time 't' now, what happens if the
client specifies an event to happen at time t-1?  Should you *immediately*
activate the event (because NOW it is t > t-1)?  Or should you discard it
because it was SUPPOSED to happen 1 second ago?
What if it's t-495678?  Is there a different type of action you expect if the
time is "a long time ago" vs. "just recently"?
Do events happen at *instants* in time?  Or, in CONDITIONS of time?
If they happen at instants, then you have to ensure you can discern one
instant from another.
Yes, yes, yes, please add tests, tests, tests.

My trouble isn't inventing new tests for normal, edge and corner
cases, but limiting them to what is really necessary; otherwise,
software development gets stuck on testing.
Post by pozz
"01/01/2024 10:00:00". This timestamp could be expired for a few
seconds, a few minutes or one day or months or years. Maybe the module
performs well when the system time has a different date, but bad if
the timestamp expired in the same day, for example "01/01/2024
11:00:00" or "01/01/2024 10:00:01".
AddOneShotEventWithExpiredTimestamp1s_NoActions
AddOneShotEventWithExpiredTimestamp1m_NoActions
AddOneShotEventWithExpiredTimestamp1h_NoActions
AddOneShotEventWithExpiredTimestamp1d_NoActions
AddWeeklyEventWithExpiredTimestamp1s_NoActions
AddWeeklyEventWithExpiredTimestamp1m_NoActions
AddWeeklyEventWithExpiredTimestamp1h_NoActions
AddWeeklyEventWithExpiredTimestamp1d_NoActions
AddMonthlyEventWithExpiredTimestamp1s_NoActions
AddMonthlyEventWithExpiredTimestamp1m_NoActions
AddMonthlyEventWithExpiredTimestamp1h_NoActions
AddMonthlyEventWithExpiredTimestamp1d_NoActions
AddYearlyEventWithExpiredTimestamp1s_NoActions
AddYearlyEventWithExpiredTimestamp1m_NoActions
AddYearlyEventWithExpiredTimestamp1h_NoActions
AddYearlyEventWithExpiredTimestamp1d_NoActions
That's 16 tests for just a single trivial scenario. If I continue
this way, I will end up with thousands of tests. I don't think this is
the way to do testing, is it?
You declare what scenario you are testing for as a (commentary) preface
to the test stanza.
If you are testing to ensure "NoActions" is handled correctly, then
you look to see how many ways the "NoActions" criteria can tickle
the code.
If there is only ONE place where "NoActions" alters the flow through
the code, then you only need one test (actually, two as you need
to cover "SomeAction" to show that "NoAction" is different).
In a different test scenario, you would test that 1s, 1m, 1h, 1d,
etc. are all handled correctly IF EACH OF THOSE PASSES THROUGH YOUR
CODE OVER DIFFERENT PATHWAYS.
And, elsewhere, you might test to see that "repeated" events
operate correctly.
You "prove" that one scenario is handled correctly and then
don't need to reexamine those various tests again in any other
scenario UNLESS THEY ALTER THE PATH THROUGH THE CODE.
Your understanding of how the code would LIKELY be crafted lets
you determine some of these tests before you've written ANY
code.  E.g., I suggested "expired events" because I am reasonably
sure that SOMEWHERE your code is looking at "event time" vs. "now"
so you would need to test that comparison.
Your knowledge of how the code is *actually* crafted lets you
refine your test cases to cover specifics of YOUR implementation.
Note that test cases that are applied to version 1 of the code should
yield the same results in version 305, even if the implementation
changes dramatically.  Because the FUNCTIONALITY shouldn't be
changing.
OK, but if you create tests knowing how you will implement the
functionality (execution paths), it's possible they will not be
sufficient when the implementation changes at version 305.
So, you can just keep adding test cases to your test suite;
you don't ever need to remove any.
[If a test STOPS working, you have to ask yourself how you have
BROKEN/changed the function of the module]
For example, I could implement a "long" integer multiplication routine
by adding the multiplicand to an accumulator a number of times
dictated by the multiplier.  I can create test cases for this
implementation.
Later, I could revise the routine to use a shift-and-add approach.
BUT, THE ORIGINAL TEST CASES SHOULD STILL PASS!  I might,
however, have to add some other tests to identify failures in
the shift logic in this new approach (e.g., if I only examined
the rightmost 26 of the bits in the multiplier, then a large
value multiplier would fail in this approach but could pass
in the repeated addition implementation).  This would be
evident to me in looking at the code because there would be
a different path through the code when the "last bit" had been
checked.
I agree with you. Indeed, the beautiful theory that tests shouldn't take
the real implementation into account is, IMHO, false.

I know a test case should verify functionality that is independent of the
implementation, but the number and type of test cases could change
*depending* on the current implementation.

As a limiting case, suppose you have a function that calculates the
square of a number in the range 0-15, so the result can be returned in an
unsigned char. The return value is undefined if the parameter is greater
than 15.

unsigned char square(unsigned char num);

Before implementing the function I can imagine the following test cases:

assert(square(0) == 0)
assert(square(1) == 1)
assert(square(2) == 4)
assert(square(15) == 225)

Now the developer writes the function this way:

unsigned char square(unsigned char num) {
    if (num == 0) return 0;
    if (num == 1) return 1;
    if (num == 2) return 4;
    if (num == 3) return 9;
    if (num == 4) return 16;
    if (num == 5) return 35;
    if (num == 6) return 36;
    if (num == 7) return 49;
    ...
    if (num == 15) return 225;
}

My tests pass, but the implementation is wrong. To avoid this, when
writing tests I should add so many test cases that I get a headache.
Don Y
2024-08-30 19:33:05 UTC
Post by pozz
Post by pozz
You write: test for this, test for that, what happens if the client
uses the module in the wrong way, what happens when the system clock changes a
little or a lot, what happens when the task misses the exact timestamp of an
event.
I was trying to write tests for *all* of those situations, but it seemed to
me a very, VERY, *VERY* big job. The implementation of the calendar module
took me a couple of days; the tests seem an infinite job.
Because there are lots of ways your code can fail.  You have to prove
that it doesn't fail in ANY of those ways.
So you're confirming it's a very tedious and long job.
It is "tedious" because you consider it as "overhead" instead of an
integral part of the job.

For a painter, cleaning his brushes is "tedious". But, failing to clean
them means they are ruined from their use! Putting fuel into a vehicle
is "tedious" -- but, failing to do so means the vehicle stops moving.
Cashing a paycheck is tedious; "why can't they pay me in cash???"

As to length/duration, it "feels" long because it is "boring" and not
perceived as a productive aspect of the job.

I spend 40% of my time creating specifications, 20% writing the code
to meet those specifications and the remaining 40% of my time TESTING
my code to prove that it meets those specifications.

If I don't know what I am supposed to write (lack of a specification),
then how do I *know* what to write? Do I just fumble around and hope
something gels from my efforts? And, having written "it", how do I
know that it complies with ALL of the specifications?

I suspect you're just starting to write code on day one without
any plan/map as to where you'll be going...
Post by pozz
Chances are, there is one place in your code that is aware of the fact that
the event is scheduled for a PAST time.  So, you only need to create one test.
(actually, two -- one that proves one behavior for time *almost* NOT past and
another for time JUST past)
I read that tests shouldn't be written for the specific implementation, but
should be generic enough to work well even if the implementation changes.
The tests that you write before you write any code should cover the
operation of the module without regard to its actual implementation
(BECAUSE YOU HAVEN'T WRITTEN ANY CODE YET!)

*As* you settle on a particular implementation, your "under the hood"
examination of the code will reveal issues that could present bugs.
So, you should be ADDING test cases to deliberately tickle those
issues. These are just "select, special cases". To an outside observer,
they don't appear "special" as they ALSO meet the specification of the
module, in general.

But, to the implementor with knowledge of the internals, they highlight
special conditions where the code (if incorrect) could fail. They provide
reassurance that those "special conditions" *in* the implementation have
been handled correctly.

Symbolic execution algorithmically identifies these "edges" in the code
and tests them (but, you likely can't afford that luxury in your build
environment).
Post by pozz
Your goal (having already implemented the modules) is to exercise each
path through the code.
whatever() {
    ...
    if (x > y) {
       // do something
    } else {
       // do something else
    }
    ...
}
- one for x > y
- one for !(x > y)
So, you need to create test cases that will exercise each path.
Now I really know there are only two paths in the current implementation, but
I'm not sure this will stay the same in the future.
Then you *add* MORE test cases to tickle the special cases in THAT
implementation. Of course, the old test cases should *still* pass,
so there is no reason to remove them! (This is the essence of regression
testing -- to ensure you haven't "slid backwards" with some new change.)
Post by pozz
Note that test cases that are applied to version 1 of the code should
yield the same results in version 305, even if the implementation
changes dramatically.  Because the FUNCTIONALITY shouldn't be
changing.
OK, but if you create tests knowing how you will implement the functionality
(execution paths), it's possible they will not be sufficient when the
implementation changes at version 305.
Yes. You've written NEW code so have to add tests to cover any potential
vulnerabilities in your new implementation!

If generating test cases was "inexpensive", you would test every possible
combination of inputs to each function and verify the outputs. But, that
would be a ridiculously large number of tests!

So, you pick *smarter* tests that tickle the aspects of the implementation
that are likely to be incorrectly designed/implemented. This reduces the
number of tests and still leaves you confident in the implementation
because you expect the code to be "well behaved" *between* the special
cases that you've chosen.
Post by pozz
  assert(square(0) == 0)
  assert(square(1) == 1)
  assert(square(2) == 4)
  assert(square(15) == 225)
  unsigned char square(unsigned char num) {
    if (num == 0) return 0;
    if (num == 1) return 1;
    if (num == 2) return 4;
    if (num == 3) return 9;
    if (num == 4) return 16;
    if (num == 5) return 35;
    if (num == 6) return 36;
    if (num == 7) return 49;
    ...
    if (num == 15) return 225;
  }
My tests pass, but the implementation is wrong. To avoid this, when writing
tests I should add so many test cases that I get a headache.
If the SPECIFICATION for the module only defines how it is supposed to behave
over the domain {0, 1, 2, 15}, then your initial set of tests is appropriate.
Note that it can't handle a range outside of [0..255] due to the return type
that *you* have chosen. So, applying a test of 16 or larger will give you
a FAILing result -- and cause you to wonder why the code doesn't work
(ans: because you constrained the range in your choice of return type).

This is how testing can catch mistakes.
Stefan Reuther
2024-08-30 16:23:21 UTC
Post by pozz
I know the importance of testing, but we have to admit that it increases
the cost of software development a lot, at least at the beginning. We don't
always have the ability to pay that price.
Not investing at the start means: pay it later. In longer debugging
sessions, or more bugs. Or more effort when you start testing.

That's part of the reason why testing is perceived as a burden, not as a
help: you have this big pile of code and no idea where to start. It has
not been designed for testing, so you sit there: how the fuck am I going
to test this beast, simulate time, simulate file access, etc.?
Post by pozz
First of all, there is a great deal of confusion in my mind about the subtle
differences between mocks, stubs, fakes, dummies and so on. Anyway, I think
these names are not so important, so let's move on.
[...]
Post by pozz
The interface is very simple. I have some functions to initialize the
[...]
Post by pozz
I have a function, called every second, that triggers actions on occurrences:
  void calendar_task(void);
If I were to design this interface from scratch, with testing in mind,
it would look like:

void calendar_task(struct Calendar* p, time_t t);

One: the calendar should be an object, not global state. This means I
can create as many calendars as I want for testing. Making this possible
usually immediately improves the flexibility of your code, because you
can of course also create multiple calendars in production. New
feature: the product can now support multiple users' calendars!

Two: do not have the function ask the system for the time. Instead, pass
in the time. So, no more need to fake the time as global state.

At least for me, it makes testing feel quite natural: each of the tests
I build is a tiny system setup. Not some Frankenstein monster with
replaced system calls and linker tricks.

(One more change I would most likely make to the interface: have it tell
me the time of the next event, so I do not have to wake up every second,
but only when an event is due.)
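
A sketch of what that might look like (calendar_next_event() and the opaque
struct are placeholders, not an established API):

#include <time.h>

/* The calendar is an object, not global state, and "now" is a parameter. */
struct Calendar;  /* opaque; created and populated elsewhere */

/* Run whatever actions are due at time 'now'. */
void calendar_task(struct Calendar *cal, time_t now);

/* Tell the caller when the next event is due, so it can sleep until
   then instead of polling every second. */
time_t calendar_next_event(const struct Calendar *cal, time_t now);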
Post by pozz
Now the problem is... which tests to write?
Depends on your requirements (black-box), and your implementation
(white-box).

A calendar with zero events.

A calendar with one event, per type.

A calendar with multiple events of same or different type.

Some border cases that you happen to find in your implementation (e.g. a
one-shot event that has long passed, or a repeated event with interval
"every 0 seconds").
Post by pozz
The combinations and possibilities are very numerous. calendar_init() can be
called with only 1 event, with 2 events and so on. And the behaviour in
these cases must be tested, because the module might behave well with 1 event
but not with 4 events.
Does this really make a difference in your implementation? Isn't it
just a for loop, with the cases "zero" and "one or more"? The insertion
logic probably distinguishes between "one" and "two or more".
Post by pozz
I'm confused. How do I approach this testing problem scientifically? How
do I avoid the proliferation of tests?
Are you measuring coverage (e.g. lcov)? That should tell you your blind
spots.
Post by pozz
Which tests are really important, and how do I write them?
The simple every-day tests make sure you do not accidentally break
something simple.

The boundary case tests make sure that bringing your software to its
boundaries does not break it (e.g. "nobody can hack it").

I wouldn't say one is more important than the other. But both help me
sleep better.



Stefan
Dave Nadler
2024-09-01 18:30:26 UTC
Post by pozz
...I'm a newbie in this aspect of software development.
I know the importance of testing, but we have to admit that it
increases the cost of software development a lot, at least at the
beginning. We don't always have the ability to pay that price.
That is backwards. Unit testing, done appropriately, REDUCES the time
and cost to market. You do need to create sensible and appropriately
focused tests. You might find this helpful:
https://nadler.com/papers/ESC-111paper_Nadler_corrected.pdf