QA, reproducibility, TDD


Software Engineering

(for Intelligent Distributed Systems)

Module “Principles and Methods”

A.Y. 2023/2024

Giovanni Ciatto (reusing material made by Danilo Pianini)


Compiled on: 2024-05-07 — printable version


Quality assurance

Would you drive a car that has not successfully passed its quality control?


Like any engineered product, software should be subject to quality assurance control.


Quality assurance (in SE) is the set of activities and practices aimed at ensuring that a software product works and is of good quality.

  • what does “works” mean for software?
  • what does “good quality” mean for software?

Quality assurance: “works”

What does “works” mean for software?

Insight: software works when it meets the requirements

  • Recall that software requirements should come with clear acceptance criteria
    • testing as the activity of verifying that the software meets the acceptance criteria

Quality assurance: “good quality”

What does “good quality” mean for software?

Insight: software is good when it is

easy for developers to evolve or maintain

  • Recall that good software should have many quality attributes

    • reproducible $\approx$ repeatable, with predictable outcomes
    • sustainable $\approx$ it’s possible to satisfy requirements in a timely fashion, with controllable costs and effort
    • evolvable $\approx$ it’s possible to adapt the product to new requirements in a sustainable way
    • maintainable $\approx$ it’s possible to fix, improve, or just keep the product alive in a sustainable way
    • scalable $\approx$ it’s possible to grow the product in terms of size, complexity, and features in a sustainable way
  • How to translate these attributes into quality assurance practices?

    • as we will see, testing may also serve this purpose

Testing: criteria

Verify that the software meets quality criteria.

  • Functional criteria:
    • Does it do what we expect it to do?
      • Does the software produce the expected results?
  • Non-functional criteria:
    • Does it do it the way we want?
      • Is it secure?
      • Is the performance acceptable?

Automated vs. manual

Running an application manually is a form of testing: exploratory testing.

  • Done without a plan

If there is a plan that can be followed step-by-step, then there is a program that can do it for you

  • If a program can do it for you, then it should do it for you

Testing scope (pt. 1)

Like any engineering product, software can be tested at different levels of abstraction

  • Unit testing: test single software components

    • Does this class (or function, or module) behave as expected?
    • For a car: is the tire working correctly?
      • e.g. are shape, pressure, etc. as expected?
  • Integration testing: test an entire subsystem, i.e. the interplay among multiple components

    • Class A uses class B and C. Are they working together as expected?
    • For a car: if we attach the wheels to the engine via the transmission, does it work as expected?
      • e.g. we turn on the engine, does the wheel spin?
  • End-to-end (or acceptance) testing: test an entire system (may involve aesthetics/usability criteria)

    • Is this whole application functional, when used from the UI?
      • implies that all components are correctly integrated
    • For a car: is it usable by a person to drive in the real world?
      • e.g. we turn on the engine, does the car move?
      • e.g. can the user change direction via the steering wheel?
      • e.g. is the speed indicator reactive to the actual speed? is the unit of measure what the user expects?

Testing scope (pt. 2)

A well-maintained engineering product must have tests at all granularity levels

  • But why?
    • after all, if the end-to-end test passes…
    • … then all the unit and integration tests should pass as well, right?

  • Yes, but:
    • tests are not only about verifying that the software works
    • they are particularly useful to understand why it doesn’t work

Automated tests as sentinels

  • Creating automated test procedures makes the activity of testing very cheap (in terms of effort)

    • this allows developers to test the software often and early
  • Being cheap, automated tests can serve as canaries in coal mines

    • i.e. sentinels for the (early) detection of problems
  • Test failures are precious during development

    • they help in localising the source of the problem

The more granular the tests, the easier it is to spot and fix problems

Reproducibility

Would you be comfortable with a car that passes the crash test 99.9% of the time, but inexplicably fails in the remaining 0.1% of cases?

Reproducibility is central for testing

(true for any engineering, but in particular for software)

  • Tests should always provide the same results when run on the same system
    • tests that “work sometimes but sometimes not” are called flaky tests
    • of course, running the same test procedure on different systems may produce different results
      • as well as different versions of the same system
  • Tests should be self-contained (they should not depend on the results of previous tests)
  • Testing procedures should be deterministic ($\approx$ no randomness); a minimal sketch follows this list
    • unpredictable events / scenarios (e.g. user inputs, lack of Internet connection) should be simulated
      • one cannot predict when events will occur, but one must predict what sorts of events / scenarios may occur
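
For instance, here is a minimal sketch (with made-up content, not taken from the calculator project) of how randomness can be tamed to keep a test reproducible: fixing the seed makes the “random” behaviour repeatable, so assertions about it are stable across runs.

import random
import unittest


class TestShufflingIsReproducible(unittest.TestCase):
    # hypothetical example: the SUT relies on randomness (here, random.shuffle)

    def test_same_seed_same_outcome(self):
        items_a = list(range(10))
        items_b = list(range(10))
        # fixing the seed makes the pseudo-random shuffle repeatable
        random.seed(42)
        random.shuffle(items_a)
        random.seed(42)
        random.shuffle(items_b)
        # the two shuffles produce exactly the same order, so the assertion never flakes
        self.assertEqual(items_a, items_b)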

Technicalities of writing tests

(we will focus on Python, but the concepts are general)

  1. the source code can now be conceived as composed of two parts:

    • the main code: where the actual software is implemented
    • the test code: where the tests for the actual software are implemented
  2. the test code is usually placed in a separate folder, and it is usually named tests/ (or test/)

  3. the dependencies of the project are now of two sorts:

    • the main dependencies: the libraries required by the main code
    • the development (“dev”) dependencies: the libraries required by the quality assurance procedures
      • there exist several libraries which support testing, e.g. unittest (included in Python), or pytest (third-party)
  4. developers may now want to launch not only the software, but also the tests

    • ad-hoc terminal commands or IDE plugins are available for this purpose

Updated project structure

root_directory/
├── main_package/               # main package (i.e. directory for the main code)
│   ├── __init__.py
│   ├── sub_module.py
│   └── sub_package/ 
│       ├── __init__.py 
│       └── sub_sub_module.py 
├── tests/                      # directory for the test code
│   ├── test_module_1.py
│   ├── ...
│   └── test_module_N.py 
├── .python-version
├── README.md
├── requirements-dev.txt        # file to list *development* dependencies
└── requirements.txt            # file to list *main* dependencies

Important conventions:

  1. all the test code should be placed in a directory named tests/ (or test/)

  2. the test code should be put into .py files whose name starts with test_

  3. requirements.txt is for the main dependencies, requirements-dev.txt is for the dev dependencies

requirements.txt example:

Kivy>=2.3.0

requirements-dev.txt example:

-r requirements.txt
pytest>=8.1.0

Nomenclature about testing

  • System under test (SUT): the component of the software that is being tested

    • e.g. a class, a function, a module
  • Test case: a class that contains the test functions for a specific SUT

    • each test case corresponds to one or more testing procedures for the same SUT
    • in case of multiple procedures, all must share the same set-up and tear-down activities
      • i.e. activities to be performed before or after each testing procedure from the same test case
  • Test suite: a collection of test cases, commonly related to similar SUTs

    • it commonly consists of a module, e.g. a test_*.py file
  • Assertion: a boolean (i.e. either True or False) check about the SUT

    • if the assertion is True, the assertion passes, and the test proceeds
    • if the assertion is False, the test fails, and it is interrupted
  • Test procedure: a sequence of actions and assertions about some SUT

    • it succeeds if all the assertions are True and no unexpected error occurs
    • it fails otherwise

Writing tests in Python

We adopt unittest, a built-in library for writing tests in Python

  • it is inspired by the JUnit library for Java
  • it is not the only one: pytest is a popular alternative (but it needs to be installed)
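
For a quick taste of the difference, here is a minimal sketch of how the same kind of checks could look in pytest (assuming a hypothetical my_system module exposing the MySystem class used in the unittest example below): pytest uses plain functions, fixtures, and bare assert statements.

import pytest

from my_system import MySystem  # hypothetical module exposing the SUT


@pytest.fixture
def sut():
    # plays the same role as unittest's setUp: a fresh SUT for each test
    return MySystem()


def test_initial_condition(sut):
    # pytest relies on plain assert statements instead of assert* methods
    assert sut.my_attribute == 123


def test_do_something_bad(sut):
    # expected exceptions are checked with pytest.raises
    with pytest.raises(ValueError):
        sut.do_something_bad()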

Anatomy of a test suite in unittest

Let’s assume this is the test_my_system.py test suite (full code here)

⬇️

import unittest
# (MySystem, the SUT used below, is assumed to be imported from the main code)


# first test case
class TestMySystemUnderOrdinaryConditions(unittest.TestCase):

    # initialization activities (most commonly, just initialises the SUT)
    def setUp(self):    
        # activities to be performed BEFORE EACH test procedure
        self.sut = MySystem() # sut instantiation

    # test procedure 1
    def test_initial_condition(self):
        self.assertEqual(self.sut.my_attribute, 123) # assertion (my_attribute is initially 123)
        self.assertEqual(self.sut.other_attribute, "foo") # assertion (other_attribute is initially "foo")
        self.assertTrue(self.sut.is_ready()) # assertion (function is_ready returns True)

    # test procedure 2
    def test_do_something(self):
        self.sut.do_something() # legitimate action
        self.assertEqual(self.sut.my_attribute, 124) # assertion (my_attribute is 124 after do_something)
        self.assertEqual(self.sut.other_attribute, "bar") # assertion (other_attribute is "bar" after do_something)
        self.assertFalse(self.sut.is_ready()) # assertion (function is_ready returns False after do_something)

    # test procedure 3
    def test_do_something_bad(self):
        with self.assertRaises(ValueError): # assertion (do_something_bad raises ValueError)
            self.sut.do_something_bad() # illegitimate action

    # you can put as many test procedures as you want

    # cleaning up activities (most commonly omitted, i.e. nothing to do)
    def tearDown(self):
        # activities to be performed AFTER EACH test procedure
        self.sut.shutdown() # legitimate action


# second test case
class TestMySystemUnderSpecialConditions(unittest.TestCase):
    # put other test procedures here
    pass


# you can put as many test cases as you want

Technicalities of unittest tests suites

  • Many assertion functions, cf.: https://docs.python.org/3/library/unittest.html#assert-methods

  • Many options to customise/parametrise your test suites, cf. https://docs.python.org/3/library/unittest.html

  • How to run tests:

    • from the terminal: python -m unittest discover -v -s tests
      • where -v stands for verbose (i.e. more detailed output)
      • where -s stands for start directory (i.e. the directory where the tests are, in this case tests)
    • from an IDE: usually there is a dedicated button
    • from VS Code: there is a dedicated section which requires configuration
  • Effect of running all tests with subcommand discover:

    • all the test_*.py files in the tests/ directory (and its sub-directories) are loaded
      • all sub-classes of unittest.TestCase from those files are instantiated
        • all the functions from those classes that start with test_ are executed
          1. the setUp function is executed before each test function
          2. the tearDown function is executed after each test function
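
To give a flavour of the assertion methods mentioned above, here is an illustrative sketch (not part of the calculator project) in which every assertion passes:

import unittest


class TestAssertionExamples(unittest.TestCase):

    def test_common_assertions(self):
        self.assertEqual(2 + 2, 4)                 # equality
        self.assertNotEqual("a", "b")              # inequality
        self.assertTrue(3 > 2)                     # truthiness
        self.assertIsNone(None)                    # identity with None
        self.assertIn(3, [1, 2, 3])                # membership
        self.assertAlmostEqual(0.1 + 0.2, 0.3)     # float comparison with tolerance
        with self.assertRaises(ZeroDivisionError): # expected exception
            1 / 0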

Hands-on (pt. 1)

Playing a bit with unittest

Restoring dev dependencies

  1. Fork the following repository: https://github.com/unibo-dtm-se/testable-calculator

  2. Clone the forked repository on your machine

  3. Open VS Code into the testable-calculator directory

    • let’s use VS Code’s integrated terminal from now on
  4. Restore both dependencies and dev-dependencies

    • pip install -r requirements-dev.txt

Hands-on (pt. 2)

Playing a bit with unittest

Running tests via the terminal

  1. Run the tests via the terminal
    • Minimalistic: python -m unittest discover -s tests

      ..............
      ---------------------------------------------------------------------- 
      Ran 14 tests in 0.478s
      
      OK
      

      (each dot represents a successful test procedure… not really clear, right?)

    • Verbose: python -m unittest discover -v -s tests (notice option -v)

      test_cli_with_invalid_expression (test_cli.TestCalculatorCli.test_cli_with_invalid_expression) ... ok
      test_cli_with_single_expression (test_cli.TestCalculatorCli.test_cli_with_single_expression) ... ok
      test_cli_with_sliced_expression (test_cli.TestCalculatorCli.test_cli_with_sliced_expression) ... ok
      [...]
      test_expression_insertion (test_model.TestCalculatorUsage.test_expression_insertion) ... ok
      
      ----------------------------------------------------------------------
      Ran 14 tests in 0.447s
      
      OK
      

      (one test per line: clearer)

Hands-on (pt. 3)

Playing a bit with unittest

Running tests via VS Code

Before:

VS Code before

After:

VS Code after

(if you cannot find the Test section, look at the next slide)

What if you cannot find the Test section? (pt. 1)

Missing Test section in VS Code

You probably have an old version of VS Code, and you should update it


⬇️ Meanwhile, you can follow this workaround ⬇️

What if you cannot find the Test section? (pt. 2)

  1. Go to the Extensions section of VS Code

    • you can do this by clicking on the Extensions icon in the Activity Bar on the side of the window
  2. In the search bar of the Extensions section, type python tests

    • the first result should be the Python extension by Microsoft

    • the second result should be the Python Test Explorer extension by Little Fox Team

      Extensions to install in VS Code

What if you cannot find the Test section? (pt. 3)

  1. Click on the Install button of both extensions
    • while installing, VS Code may look like this

      Installing required extensions in VS Code

What if you cannot find the Test section? (pt. 4)

  1. Once the installation is complete, you should see the Test section in the Activity Bar on the side of the window

    Test section in VS Code

Hands-on (pt. 4)

Playing a bit with unittest

Inspecting a real unit test

  1. Have a look at the tests/test_model.py file and listen to the teacher’s explanation

⬇️

import unittest
from calculator import Calculator


# test case testing what the effect of each method of the Calculator class is
# when executed on a fresh new Calculator instance
class TestCalculatorMethods(unittest.TestCase):
    def setUp(self):
        # here we create one "virgin" instance of the Calculator class (our SUT)
        self.calculator = Calculator()

    def test_initial_expression_is_empty(self):
        # here we ensure the expression of a virgin Calculator is empty 
        self.assertEqual("", self.calculator.expression)

    def test_digit(self):
        # here we ensure that the digit method effectively appends one digit to the Calculator expression
        self.calculator.digit(1)
        self.assertEqual("1", self.calculator.expression)

    def test_plus(self):
        # here we ensure that the plus method effectively appends one "+" symbol to the Calculator expression
        self.calculator.plus()
        self.assertEqual("+", self.calculator.expression)

    def test_minus(self):
        # here we ensure that the minus method effectively appends one "-" symbol to the Calculator expression
        self.calculator.minus()
        self.assertEqual("-", self.calculator.expression)
    
    def test_multiply(self):
        # here we ensure that the multiply method effectively appends one "*" symbol to the Calculator expression
        self.calculator.multiply()
        self.assertEqual("*", self.calculator.expression)
    
    def test_divide(self):
        # here we ensure that the divide method effectively appends one "/" symbol to the Calculator expression
        self.calculator.divide()
        self.assertEqual("/", self.calculator.expression)


# test case testing the usage of the Calculator class
class TestCalculatorUsage(unittest.TestCase):
    def setUp(self):
        # here we create one "virgin" instance of the Calculator class (our SUT)
        self.calculator = Calculator()

    def test_expression_insertion(self):
        # here we simulate the insertion of a simple expression, one symbol at a time...
        self.calculator.digit(1)
        self.calculator.plus()
        self.calculator.digit(2)
        # ... and we ensure the expression is as expected
        self.assertEqual("1+2", self.calculator.expression)

    def test_compute_result(self):
        # here we simulate the insertion of an expression "as a whole", 
        # by setting the expression attribute of a virgin Calculator
        self.calculator.expression = "1+2"
        # ... and we ensure the compute_result method evaluates the expression as expected
        self.assertEqual(3, self.calculator.compute_result())

    def test_compute_result_with_invalid_expression(self):
        # here we simulate the insertion of an invalid expression "as a whole"...
        self.calculator.expression = "1+"
        with self.assertRaises(ValueError) as context:
            # ... and we ensure the compute_result method raises a ValueError in such situation
            self.calculator.compute_result()
        # ... and we also ensure that the exception message carries useful information
        self.assertEqual("Invalid expression: 1+", str(context.exception))

Hands-on (pt. 4)

Playing a bit with unittest

Failing tests

  1. Try to run tests via the terminal and via VS Code

    • notice that in VS Code you can run tests selectively
  2. Let’s now simulate the scenario where tests are failing (e.g. due to buggy code)

    • edit the Calculator in file calculator/__init__.py to introduce a bug
      • e.g. change the __init__ function as follows:
        def __init__(self):
            self.expression = "0" # bug: the expression is not initially empty
        
  3. Run the tests again: many tests should now fail

    • notice how the tests failure is reported in the terminal and in VS Code
    • try to spot the source of the problem, from the error reports

Hands-on (pt. 5)

Playing a bit with unittest

Testing the GUI (+ integration with model)

  1. Have a look at the tests/test_gui.py file and listen to the teacher’s explanation:
  2. Notice that tests are based on a custom base class (namely CalculatorGUITestCase), which adds
    • custom actions (e.g. press_button(button_name))
    • custom assertions (e.g. assert_display(expected_text))
    • custom setup and teardown activities (showing / closing the GUI)

⬇️

import unittest
from calculator.ui.gui import CalculatorApp

# this is not a test case! 
# it is a way to add custom actions, assertions, initialisation/clean-up activities to other test cases
class CalculatorGUITestCase(unittest.TestCase):

    # default initialization activity (create & start the GUI, i.e. our SUT)
    def setUp(self):
        self.app = CalculatorApp()  # create the GUI
        self.app._run_prepare()     # start the GUI

    # re-usable action: presses a button on the GUI, given the button's text
    def press_button(self, button_text):
        self.app.find_button_by(button_text).trigger_action()

    # re-usable assertion: checks the text displayed on the GUI is equal to the provided one
    def assert_display(self, expected_text):
        self.assertEqual(self.app.display.text, expected_text)   

    # default cleaning-up activity (stop the GUI)
    def tearDown(self):
        self.app.stop()

Hands-on (pt. 5)

Playing a bit with unittest

Testing the GUI (+ integration with model)

  1. Have a look at the tests/test_gui.py file and listen to the teacher’s explanation:
  2. Notice that tests are based on a custom base class (namely CalculatorGUITestCase), which adds

    • custom actions (e.g. press_button(button_name))
    • custom assertions (e.g. assert_display(expected_text))
    • custom setup and teardown activities (showing / closing the GUI)
  3. In particular, have a look at the TestExpressions test case, and listen to the teacher’s explanation

⬇️

# this is a test case! (based upon the aforementioned base class)
class TestExpressions(CalculatorGUITestCase):

    # test procedure: inserting and evaluating a simple integer expression
    def test_integer_expression(self):
        # insert symbols "1", "+", "2"
        self.press_button("1")
        self.press_button("+")
        self.press_button("2")
        # check the display shows "1+2"
        self.assert_display("1+2")
        # press the "=" button
        self.press_button("=")
        # check the display shows "3"
        self.assert_display("3")

    # test procedure: inserting and evaluating a simple float expression
    def test_float_expression(self):
        self.press_button("1")
        self.press_button(".")
        self.press_button("2")
        self.press_button("+")
        self.press_button("2")
        self.assert_display("1.2+2")
        self.press_button("=")
        self.assert_display("3.2")

Interesting things to notice (pt. 1)

  • To enable testing the GUI, the CalculatorApp class’s public API has been extended with further functionalities:
    • find_button_by(text): a function returning the button widget with the given text
    • display: an attribute referencing the display widget (it’s now public)

Before:

class CalculatorApp {
    - _calc: Calculator
    - _display: Label
    + build(): BoxLayout
    + on_button_press(button: Button)
}

After:

class CalculatorApp {
    - _calc: Calculator
    + display: Label
    + build(): BoxLayout
    + on_button_press(button: Button)
    + find_button_by(text: str): Button
    - _browse_children(container): Iterable[Widget]
}

New entries

  • find_button_by(text: str): is necessary to simulate button presses in the tests
  • _browse_children(container): is a private functionality, necessary to implement find_button_by
  • display: is necessary to make assertions about the displayed text in the tests

Interesting things to notice (pt. 2)

How these novel functionalities are implemented in practice is not that relevant, but here it is:

class CalculatorApp(App):
    # returns a generator over all the widgets directly or indirectly contained in the given container
    def _browse_children(self, container):
        yield container
        if hasattr(container, 'children'):
            for child in container.children:
                yield from self._browse_children(child)
    
    # returns the first widget in the GUI which 1. is a button and 2. whose text is equal to the given one
    def find_button_by(self, text) -> Button:
        for widget in self._browse_children(self.root):
            if isinstance(widget, Button) and widget.text == text:
                return widget
            
    def build(self):
        # ... (unchanged)
        self.display = Label(text="0", font_size=24, size_hint=(3, 1))
        # ... (unchanged)

    # the rest of the class is unchanged

Interesting things to notice (pt. 3)

Take-away: when writing post-hoc tests (i.e., after the main code has already been written), it is often necessary to extend the public API of the SUT to make its internal state and functioning observable and controllable from the outside, and therefore testable

  • to avoid this situation, it’s paramount to write tests before the main code is written
    • we’ll see how

Interesting things to notice (pt. 4)

If you read them at the adequate abstraction level, each test case is telling a story about the SUT

  • e.g. TestCalculatorMethods is telling the story of the Calculator class

    • it’s telling the story of what a Calculator object looks like when it is freshly instantiated
    • it’s telling the story of how a Calculator object behaves when it is used to build an expression
    • it’s telling the story of how a Calculator object behaves when it is used to evaluate an expression
  • e.g. TestExpressions is telling the story of the CalculatorApp class (i.e. the GUI)

    • it’s telling the story of what the GUI looks like when it is freshly instantiated
    • it’s telling the story of how the GUI behaves when it is used to build an expression
    • it’s telling the story of how the GUI behaves when it is used to evaluate an expression

Take-away: the story you can picture in your mind when reading a test is a way to describe the test plan that the designer of the test suite was envisioning when writing the tests

Exercise

Write your own test suite

  1. Focus on the test_gui.py file

  2. Add one more test case for the GUI, say TestLayout, which ensures that:

    • the GUI has a display, and its initial text is 0
    • the GUI contains all the buttons for the digits, the operations, and the special commands
      • namely: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *, /, =, C, .
  3. Hints:

    • you may want to reuse the CalculatorGUITestCase class and its functionalities
    • you may want to add one more custom assertion, say assert_button_exists, to CalculatorGUITestCase
      • in order to check that a button with a given text is present in the GUI

One possible solution is in the next slide

⬇️ (please resist the temptation, and try to solve the exercise before looking at the solution) ⬇️

One possible solution

(also available on branch exercises/01-test-layout of the testable-calculator repository)

class CalculatorGUITestCase(unittest.TestCase):
    # rest of the class is unchanged 

    def assert_button_exists(self, button_text):
        self.assertIsNotNone(self.app.find_button_by(button_text))


class TestLayout(CalculatorGUITestCase):
    buttons_to_test = {
        'C',
        '7', '8', '9', '/',
        '4', '5', '6', '*',
        '1', '2', '3', '-',
        '.', '0', '=', '+',
    }

    def test_initial_display(self):
        self.assert_display("0")

    def test_buttons(self):
        for button_text in self.buttons_to_test:
            with self.subTest(button=button_text):
                self.assert_button_exists(button_text)
  • what’s the purpose of subTest?
    • it eases the debugging of the test suite in case of multiple failures
      • try launching tests from VS Code’s UI

Test plan

  • Testing should be planned for in advance

  • A good test plan can guide the development, and should be ready early in the project

  • To plan a test, one might try to convert the requirements’ acceptance criteria into test cases

    • this is mostly true for system and, to some extent, integration tests
  • To plan unit tests, one might try to create test cases covering each aspect of the public API of the SUT

    • e.g. for each public class, a test case may cover all public functions and attributes of that class

When designing cars, the crash testing procedure, the engine test bench, and so on are prepared well before the car prototype is ready!
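
For instance, here is a sketch of how an acceptance criterion could be captured as a test before any implementation exists. The requirement is invented (“pressing C clears the current expression”), and the clear() method is an assumption, i.e. not (necessarily) part of the project’s Calculator class:

import unittest
from calculator import Calculator


class TestClearFeature(unittest.TestCase):

    def setUp(self):
        self.calculator = Calculator()  # the SUT

    def test_clear_resets_the_expression(self):
        self.calculator.digit(1)
        self.calculator.plus()
        self.calculator.clear()  # assumed method, to be implemented later
        self.assertEqual("", self.calculator.expression)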

Test-Driven Development (TDD)

The practice of:

  • converting requirements into (executable) test cases
  • preparing tests before development
  • defining the expected behavior via test cases
  • tracking all development by always testing all cases

Key-point: in TDD, tests are not only a form of validation, but also a form of specification

Development cycle in TDD

  1. Capture a requirement into an executable test
  2. Run the test suite, the new test should fail
  3. Fix the code so that the new test passes
  4. Re-run the whole test suite, all tests should pass
  5. Improve the quality as needed (refactor, style, duplication…)
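
Continuing the hypothetical clear() example sketched in the test-plan slide: after step 2 (the new test fails, because clear() does not exist), step 3 consists of the minimal change that makes it pass, e.g. something like the following (simplified, not the project’s actual class), and step 4 re-runs the whole suite.

class Calculator:
    # only the part relevant to the new test is shown

    def __init__(self):
        self.expression = ""

    def clear(self):
        # minimal implementation satisfying the previously failing test
        self.expression = ""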

Exercise

Adding features to the Calculator via TDD

Customers ask for new features in the calculator:

  1. possibility to write expressions with parentheses (e.g. (1+2)*3)
  2. possibility to write expressions with the square root function (e.g. sqrt(4))
  3. possibility to write expressions with the power function (e.g. 2**3)
  1. Extend the model’s test suite (i.e. file test_model.py which aims at testing the Calculator class)

    • this implies extensions for the Calculator class’s public API should be envisioned (not realised)
    • use your imagination to invent reasonable extensions
  2. Extend the GUI’s test suite (i.e. file test_gui.py which aims at testing the CalculatorApp class, i.e. the GUI)

    • this implies novel buttons and actions should be envisioned (not realised) for the GUI
    • use your imagination to invent reasonable buttons and their effects
  3. Launch your tests: it’s OK if novel tests fail at this stage

    • let’s just ensure they fail for the correct reasons (missing methods, or missing buttons)

One possible solution

(also available on branch exercises/02-tdd-before-impl of the testable-calculator repository)

Test suite for the model (i.e. test_model.py)

# other test cases are unchanged

class TestComplexExpressions(unittest.TestCase):
    def setUp(self):
        self.calculator = Calculator()

    def test_expression_with_parentheses(self):
        self.calculator.open_parenthesis()
        self.calculator.digit(1)
        self.calculator.plus()
        self.calculator.digit(2)
        self.calculator.close_parenthesis()
        self.calculator.multiply()
        self.calculator.digit(3)
        self.assertEqual("(1+2)*3", self.calculator.expression)
        self.assertEqual(9, self.calculator.compute_result())

    def test_expression_with_sqrt(self):
        self.calculator.digit(1)
        self.calculator.plus()
        self.calculator.square_root()
        self.calculator.open_parenthesis()
        self.calculator.digit(1)
        self.calculator.digit(1)
        self.calculator.minus()
        self.calculator.digit(2)
        self.calculator.close_parenthesis()
        self.assertEqual("1+sqrt(11-2)", self.calculator.expression)
        self.assertEqual(4, self.calculator.compute_result())

    def test_expression_with_pow(self):
        self.calculator.open_parenthesis()
        self.calculator.digit(1)
        self.calculator.plus()
        self.calculator.digit(1)
        self.calculator.close_parenthesis()
        self.calculator.power()
        self.calculator.digit(3)
        self.assertEqual("(1+1)**3", self.calculator.expression)
        self.assertEqual(8, self.calculator.compute_result())

One possible solution

(also available on branch exercises/02-tdd-before-impl of the testable-calculator repository)

Test suite for the GUI (i.e. test_gui.py)

class TestExpressions(CalculatorGUITestCase):
    # other test methods are unchanged

    def test_expression_with_parentheses(self):
        self.press_button("(")
        self.press_button("1")
        self.press_button("+")
        self.press_button("2")
        self.press_button(")")
        self.press_button("*")
        self.press_button("3")
        self.assert_display("(1+2)*3")
        self.press_button("=")
        self.assert_display("9")

    def test_expression_wit_sqrt(self):
        self.press_button("sqrt")
        self.press_button("4")
        self.press_button(")")
        self.assert_display("sqrt(4)")
        self.press_button("=")
        self.assert_display("2.0")

    def test_expression_with_pow(self):
        self.press_button("2")
        self.press_button("**")
        self.press_button("3")
        self.assert_display("2**3")
        self.press_button("=")
        self.assert_display("8")

Exercise (continued)

  1. Now it’s time to implement the new features

    • Goal: make all the tests pass
  2. In any case, once you are done, commit & push

One possible solution is on the exercises/02-tdd-after-impl branch of the testable-calculator repository

  • feel free to inspect it, after you attempted to produce your own solution

On the cost of testing

Developing without testing is unsustainable


Yet many software projects have no or minimal tests, as:

Common misconception: We do not have time (or money) for testing


Beware: testing saves time in the long run; not testing is a cost!

  • Untested software components are likely sources of technical debt

Important notion

Technical debt is a concept in software development that reflects the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer.

  • Not writing tests ASAP greatly increases the technical debt

    • editing the code becomes harder and harder, as the codebase grows
      • because any new change may break something else, and without tests that break would go unnoticed
  • Development is never really finished in real software projects

    • (short-sighted) management is unlikely to allocate time for improving the code, or writing tests
      • because that has no immediate visible return for the customer
  • TDD practices help in keeping the technical debt under control

What happens when there’s too much technical debt?

we never have the money to do it right but somehow we always have the fucking money to do it twice

— UserInputSucks (@UserInputSucks) May 27, 2019

What if a project is not using TDD since the very beginning?

Decreasing preference order:

  1. Ideal situation: always writing tests during design, before implementation

  2. Common situation: design and implement, then write tests

  3. Barely tolerable situation: design and implement, only add tests upon bugs

    • (see next slide)
  4. Very bad situation: never write tests

Tackling bugs and regressions

When a new bug (or a regression, namely a feature that was working and is now compromised) is discovered, resist the temptation to “fix” the issue right away

  • A fix without a test could be insufficient
  • The “fix” could break another feature (create a regression)

A more robust approach:

  1. Reproduce the issue in a minimal context
  2. Create a new test case that correctly fails
  3. Fix the issue, and make sure that the test now passes
  4. Ensure that all other tests still pass

Motivations:

  • the new test case prevents the issue from being mistakenly re-introduced
  • developing the test case before the fix helps the debugging process
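
As an illustration, here is a sketch of such a regression test for an invented bug report (“evaluating 1/0 crashes the application instead of signalling an invalid expression”); whether the real Calculator behaves this way is an assumption, the point is the shape of the test:

import unittest
from calculator import Calculator


class TestDivisionByZeroRegression(unittest.TestCase):

    def test_division_by_zero_is_reported_as_invalid(self):
        calculator = Calculator()
        calculator.expression = "1/0"
        # written BEFORE the fix: it must fail at first, and pass once the bug is fixed
        with self.assertRaises(ValueError):
            calculator.compute_result()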

Testing software before it is ready: boundaries

Problem: how is it possible to test code that does not exist?

  • More in general: how to design a testbed for an engineering product that is not prototyped yet?

  • Clean boundaries: the component must have a well-defined interface with the rest of the world.

    • in software, it means that the component has a well-defined Application Programming Interface (API).
    • our artifact must be modularized correctly
      • (this also helps with development, simplicity and maintenance)
  • Clear scope: well engineered (software) components usually do one thing well.

    • test plans are conceived to test that the one thing is performed correctly.
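
A minimal sketch of what a “clean boundary” may look like in Python (hypothetical names, unrelated to the calculator project): the component under test only depends on an explicit interface, which can later be implemented for real, or replaced by a test double (see the next slides).

from abc import ABC, abstractmethod


class WeatherService(ABC):
    # explicit boundary: whatever provides temperatures must expose this API
    @abstractmethod
    def current_temperature(self, city: str) -> float: ...


class Thermostat:
    # the component we want to test: it only knows the WeatherService interface
    def __init__(self, weather: WeatherService):
        self._weather = weather

    def should_turn_heating_on(self, city: str) -> bool:
        return self._weather.current_temperature(city) < 18.0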

Testing software before it is ready: missing components

  • We can now design our tests,
    • but how to run them if the components surrounding the tested one are not ready?
  • How to test a new suspension system if the “surrounding” car is not ready (not even fully designed) yet?

  • How to test that our new rocket engine works as expected with no rocket?

  • How to test that our multi-engine rocket works as expected without payload?

Testing software before it is ready: test doubles

The trick: simulate components that are not ready yet!

When writing software, components required for the execution that are not ready yet can be simulated if their API has been clearly defined

The simulated components are called test doubles

  • dummy: a (usually unimplemented) placeholder (e.g., unused mandatory argument)

    • a weight put on the suspension
  • stub: partly implemented dummy

    • a system applying variable weight to the suspension
  • spy: a stub that tracks information of the way it is being used

    • a dynamometer recording the suspension behavior under different conditions
  • mock: a spy that expects to be used in a certain way, and fails if the expectation is unmet

    • a smart dynamometer that interrupts testing if the suspension behavior is not nominal
  • fake: a fully implemented version of the component unsuitable for production

    • a car prototype
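
Here is a minimal sketch of a stub-like and a mock-like double in Python, using the standard unittest.mock module and the hypothetical Thermostat/WeatherService example from the “boundaries” slide (the Thermostat class is repeated so that the snippet is self-contained):

import unittest
from unittest.mock import Mock


class Thermostat:
    # same hypothetical component as in the "boundaries" sketch
    def __init__(self, weather):
        self._weather = weather

    def should_turn_heating_on(self, city):
        return self._weather.current_temperature(city) < 18.0


class TestThermostatWithDoubles(unittest.TestCase):

    def test_stub_provides_canned_answers(self):
        # stub: returns a canned temperature, no real weather service is needed
        weather = Mock()
        weather.current_temperature.return_value = 5.0
        self.assertTrue(Thermostat(weather).should_turn_heating_on("Bologna"))

    def test_mock_verifies_interactions(self):
        # mock: we also verify HOW the collaborator was used
        weather = Mock()
        weather.current_temperature.return_value = 30.0
        sut = Thermostat(weather)
        self.assertFalse(sut.should_turn_heating_on("Bologna"))
        weather.current_temperature.assert_called_once_with("Bologna")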

Test doubles and development cost

Why should the team “waste” time creating doubles instead of just writing the thing?

  • doubles are cheaper: dedicated libraries make doubles implementation extremely quick

  • doubles are simpler: they only encode the behaviour required to check the relevant part of the SUT.

    • The probability of them being bugged is lower.
    • Debugging is easier.

Takeaways

  • Test-driven development is a practice that can help in keeping the technical debt under control

    • the idea is “design the testing procedure before development”
  • Tests act as a form of validation (ex-post), specification (ex-ante), and as sentinels (along the way)

  • Designing and implementing tests is a project-in-the-project

    • it requires time and effort, and it is a cost, but it adds value in the long run
      • by making / keeping development sustainable
  • Patterns and strategies exist to design / implement tests, e.g. test doubles

    • QA engineers / specialists are often dedicated to this task
  • Time for testing should be allocated in the project plan

Checking (un)tested components: coverage (pt. 1)

Code coverage is a set of metrics that measure how much of the source code of a program has been executed when testing.

Common metrics:

  • function coverage: did the control flow go through this function?
  • branch coverage: did the control flow try both branches of this condition?
  • line-of-code coverage: did the control flow go through this line during tests?
    • most common, usually combined with branch coverage
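
A tiny sketch of why line coverage alone can be misleading (hypothetical code, not from the calculator project): the single test below executes every line of absolute_value, hence 100% line coverage, yet the branch in which the condition is False is never exercised; branch coverage (e.g. via coverage run --branch ...) would reveal this.

import unittest


def absolute_value(x):
    # hypothetical function, used only to illustrate the metrics
    if x < 0:
        x = -x
    return x


class TestAbsoluteValue(unittest.TestCase):

    def test_negative_input(self):
        # executes every line of absolute_value, but only the True branch of the condition
        self.assertEqual(3, absolute_value(-3))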

Exercise (pt. 1)

Compute test coverage in Python

  1. Notice the coverage (dev) dependency in the requirements-dev.txt file

    • it is a tool that measures line-of-code coverage in Python, documentation here
    • you should have restored dependencies already, if not pip install -r requirements-dev.txt
  2. Run the tests while measuring coverage: coverage run -m unittest discover -v -s tests/

    • same command as for running tests, but with coverage run instead of python
    • if not working, try python -m coverage run -m unittest discover -v -s tests/
  3. Check the coverage report in the terminal: coverage report -m

    • the output should be similar to the one below:

      Name                     Stmts   Miss  Cover   Missing
      ------------------------------------------------------
      calculator\__init__.py      38      3    92%   13, 39, 48
      calculator\ui\cli.py        21      3    86%   16-17, 27
      calculator\ui\gui.py        57      8    86%   55-57, 61, 63, 65, 69, 76     
      tests\test_cli.py           15      0   100%
      tests\test_gui.py           30      0   100%
      tests\test_model.py         38      0   100%
      ------------------------------------------------------
      TOTAL                      199     14    93%
      

      pretty obscure, isn’t it?

Exercise (pt. 2)

Compute test coverage in Python

  1. Let’s try to create a more pleasant report, in HTML format: coverage html

    • this command will create a htmlcov folder in the current directory
    • the htmlcov/index.html file is a static Web page, reporting the coverage of your project
  2. Open the htmlcov/index.html file in your browser (any of the following may work)

    • start .\htmlcov\index.html
    • open htmlcov/index.html
    • xdg-open htmlcov/index.html
    • double click on the htmlcov/index.html file in your Explorer / Finder / Dolphin or Nautilus etc

Exercise (pt. 4)

Compute test coverage in Python

  1. You should see an overview similar to the terminal one

    HTML coverage report, overview

Exercise (pt. 5)

Compute test coverage in Python

  1. If you click on a file, you may get a line-by-line report of test coverage

    HTML coverage report, line-by-line details

BEWARE

About test coverage

  • the actual information coverage provides is which code is untested (or only partly tested)!

  • we know nothing about the quality of the tests on the covered parts, only that the control flow goes through them

  • Useful metric, but it cannot be the only metric to evaluate testing

    • you may have 100% coverage and still have a very bad test suite (few requirements covered)
    • you may have 80% coverage and still have a good test suite (all requirements covered)
    • for sure, the lower the coverage, the worse the test suite
  • Use coverage as a hint for reasoning about what to test next

Exercise

  1. Use coverage to spot the untested parts of the testable-calculator project

  2. Add tests which cover the untested parts

  3. After you reach 100% coverage (or close to it), ask yourself:

    • did I cover all the requirements?
    • are there any edge cases I did not cover?

Quality Assurance

“It works” is not good enough

(besides, the very notion of “it works” is debatable)

  • Software quality should be continuously assessed
  • The assessment should be automatic whenever possible
  • QA should be integrated in the workflow!
    • Quality violations should be considered as errors

Quality Assurance: levels

  • Syntactical correctness
  • Style and coherence
  • Flawed programming patterns
  • Violations of the DRY principle
  • Runtime Testing

Quality Assurance: syntactical correctness

Syntactical correctness is the first level of quality assurance:

“Is the code well-formed, i.e. readable for a computer?”

  • In compiled languages, the compiler checks for syntactical correctness

    • raising an error if the code is not syntactically correct
    • most commonly, linking errors are detected by the compiler too
      • e.g. calling a function which is not correctly imported
  • Python is an interpreted language, so there is no compiler

    • but the CPython interpreter (the most frequently used) supports translation into bytecode
    • similarly to compiled languages, translation into bytecode implies syntax and linking checks
  • Syntactical correctness can be checked in Python by means of:

    • the standard compileall module ($\approx$ included in Python by default)
      • triggered by python -m compileall CODE_DIRECTORY_1 CODE_DIRECTORY_2 ...

Example

Checking for syntactical correctness in Python

  1. In the testable-calculator project, run python -m compileall calculator tests

    • you may notice an output similar to the following one:

      Listing 'calculator'...
      Listing 'calculator\\ui'...
      Listing 'tests'...
      

      which means that all .py files in those directories were checked, and they are syntactically correct

      • if there were any errors, they would be listed here
  2. Try to artificially add some syntax error in some Python file

    • e.g. remove a closing parenthesis, or a colon
  3. Run python -m compileall calculator tests again

    • this time, the log should make you aware of where the error is

Quality Assurance: static analysis

Code analysis without execution is called static analysis.

Static analysis tools are often referred to as linters (especially those providing auto-formatting tools)

Idiomatic and standardized code:

  • reduces complexity
  • improves understandability
  • prevents style-changing commits with unintelligible diffs
  • lowers the maintenance burden and related costs
  • simplifies code reviews
  • improves security

Static analysis: flawed programming patterns

Identification and reporting of patterns known to be problematic

  • Early-interception of potential bugs
  • Enforce good programming principles
  • Improves performance
  • Reduces complexity
  • Reduces maintenance cost

Quality Assurance: static analysis in Python

  • Mypy: static analysis for bug detection (requires annotations)
  • Pyflakes: effective programming, excluding style
  • Pylint: reverse engineering via Pyreverse, style (enforces PEP8), and effective programming
  • Bandit: security scanner
  • Prospector: tool collection. Includes Pylint and Pyflakes, adds PEP257-compliance checks for comments/docstrings, complexity, packaging (via pyroma), secrets leaking (via dodgy), and unused code (via vulture) checks.

Example (pt. 1)

Static analysis in Python with mypy

  1. Notice the mypy (dev) dependency in the requirements-dev.txt file

    • it is a static analysis tool for Python, documentation here
    • you should have restored dependencies already, if not pip install -r requirements-dev.txt
  2. Run mypy on the testable-calculator project

    • mypy calculator tests
    • you may notice an output similar to the following one:
      calculator\__init__.py:13: error: Unsupported operand types for + ("str" and "int")  [operator]
      calculator\__init__.py:44: error: Parameterized generics cannot be used with class or instance checks  [misc]
      calculator\__init__.py:44: error: Argument 2 to "isinstance" has incompatible type "<typing special form>"; expected "_ClassInfo"  [arg-type]
      calculator\__init__.py:46: error: Returning Any from function declared to return "int | float"  [no-any-return]
      Found 4 errors in 1 file (checked 6 source files)
      

Example (pt. 2)

Static analysis in Python with mypy

  1. What are those errors?

    • the calculator works, and it passes tests, so they are not errors in the usual sense
    • they are flawed programming patterns, i.e. potential bugs
      • or at least potential sources of headaches
  2. Listen to the teacher’s explanation about the meaning of those errors

    • and how to fix them
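
As an illustration only (the actual calculator code may differ), the first error typically points at mixing types in a concatenation; an explicit conversion makes the intent clear both to mypy and to human readers:

# before: mypy rejects the concatenation of "str" and "int"
def append_digit(expression: str, digit: int) -> str:
    return expression + digit        # error: Unsupported operand types for +

# after: converting the digit makes the types line up
def append_digit_fixed(expression: str, digit: int) -> str:
    return expression + str(digit)   # OK: str + str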

Quality Assurance: violations of the DRY principle

DRY: Don’t Repeat Yourself

  • General advice: never copy-paste your code

    • if you need to copy-paste something, you probably need to refactor something
  • Instead of copy-pasting code, write a parametric function/class/module which can be re-used

    • improves understandability
    • reduces maintenance cost
    • simplifies code reviews

Before (reliance on copy-paste):

def test_my_gui(self):
    self.sut.find_button_by("1").trigger_action()
    self.sut.find_button_by("2").trigger_action()
    self.sut.find_button_by("3").trigger_action()
    self.sut.find_button_by("4").trigger_action()
    self.sut.find_button_by("5").trigger_action()

After refactor (no more duplication):

def press_button(self, text):
    self.sut.find_button_by(text).trigger_action()

def test_my_gui(self):
    for i in range(1, 6):
        self.press_button(str(i))

Multi-language tool: Copy/Paste Detector (CPD) (part of PMD)

Additional checks and reportings

There exist a number of recommended services that provide additional QA and reports.

Non-exhaustive list:

  • Codecov.io

    • Code coverage
    • Supports Jacoco XML reports
    • Nice data reporting system
  • Sonarcloud

    • Multiple measures, covering reliability, security, maintainability, duplication, complexity…
  • Codacy

    • Automated software QA for several languages
  • Code Factor

    • Automated software QA