Automating Software Quality
Why discuss it?
“There is a big difference between saying, ‘eat an apple a day’ and actually eating an apple every day”
No particular tooling is mandated, but some recommendations and examples are given where appropriate.
What is continuous integration?
Simply put: build the software every time code is changed. The reality is somewhat more complex. Like many other concepts, CI brings together a series of common-sense practices to produce something that works well.
What about the integration bit?
‘Continuous’ is obvious: builds occur every time code is checked in. ‘Integration’ needs more explanation.
So let’s start with the contentious bit:
An integration is a merge. It occurs every time two developers work on the same source tree but don’t accept/see each other’s changes.
Common practice is to work in isolation on ‘task’ branches, but this gives a FALSE sense of security. During development (and testing) the developer has a constant baseline to work against, so work continues apace. Divergent changes only show up during integration: errors are DELAYED, not removed, and the later in the software process errors are found, the more costly they are to resolve. Martin Fowler asserts that the time to merge changes (and so the cost of the merge) rises exponentially with the time the code branches spend apart, which sounds reasonable. Thus DON’T branch. Continuous integration demands that all development happens on a single branch, and is a set of practices that make sure this can occur in a controlled fashion.
This is clearly unrealistic, but it’s a good aim. The following guidance should be considered when branching.
1. Commit code every day; uncommitted code does not exist.
2. Each commit should contain a complete unit of work.
3. No commit should EVER break the build.
4. Each commit should be of a size to be peer review-able.
5. Each commit should contribute to a current or future production release.
Where the above rules cannot be followed, branches should be used. Consider the following examples:
I. A speculative change that may or may not make it into production. Put it on a branch until its fate is established; this avoids polluting trunk.
II. A long-running disruptive change (e.g. a compiler upgrade) that cannot be committed in small chunks without breaking the build.
Realise that EVERY branch costs developer time so branch little and merge often.
Release Candidate Branch
Just prior to a release, a branch should be established to stabilise the code. This branch is known as the release candidate branch, and it should be created as late as possible. The only code changes permitted on the release candidate branch are bug fixes required for the release; no new features should be added. If new features are required, the branch should be abandoned and re-established. Changes to the release candidate branch should only be made by merges from the trunk: the fix is made on trunk and merged UP to the branch.
Pollution of the release candidate branch
A common critique of the single trunk approach is that code destined for later releases will be released ahead of time, so release candidate branches are often made earlier to ensure that such code ‘pollution’ is avoided. Indeed, branches are often created specifically to avoid this ‘pollution’. I would assert that this pollution is good: it reduces the testing burden and delivers higher quality software faster, though the development group should be aware of the constraints they are working in and must adjust to fit.
Consider an example.
A development team is working on a web browser called aluminium 🙂
DeveloperA is adding Flash support.
DeveloperB is adding JavaScript support.
Clearly both these features hit the renderer. Naively, a task branch would be created for each and, as the code is completed, it would be merged to trunk/main/head ready for release.
Consider the testing effort. DeveloperA has to test on his branch; when the tests pass he needs to retest on trunk/main/head to validate that the merge did not break his code. A release can now occur. When we come to release the JavaScript work, this again must be tested on the branch and then again on main, and we must of course also regression test the Flash work to ensure that the JavaScript changes have not affected the code base. All told, the application must be UAT tested 5 times to release these 2 features.
If you flatten the work above onto a single branch then the merges occur each day. In this situation the Flash testing is done in the same code base as the JavaScript work. Aspects of the JavaScript code exist on the release candidate, but the acceptance tests pass, so the JavaScript code is not affecting the correct running of the application. Problems in the first example will certainly also exist in the second, but they will be found early when they are cheap to fix. Testing will NOT be duplicated. DeveloperB may have to ‘hide’ aspects of his code so that the functionality is not released half completed; this can be achieved simply by suppressing menus, or by using a compiler pragma to remove aspects of the code unless certain properties (DEV=true) are present.
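As a rough illustration of the ‘hide it’ idea: Java has no compiler pragma, so this sketch assumes a run-time check of a DEV system property instead. The class and menu names are made up for the aluminium example and do not come from any real code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not from a real browser.
final class FeatureToggle {

    /** True only when the JVM was started with -DDEV=true. */
    static boolean devMode() {
        return Boolean.getBoolean("DEV");
    }

    /** Menu entries to show; the unfinished entry is suppressed outside DEV. */
    static List<String> menuItems() {
        List<String> items = new ArrayList<>();
        items.add("Flash settings");            // finished, always released
        if (devMode()) {
            items.add("JavaScript settings");   // half completed, developers only
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(menuItems());
    }
}
```

Run with `java -DDEV=true FeatureToggle.java` to see the hidden entry; without the property the unfinished work ships in the build but stays out of sight.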
Build
What constitutes a build?
Clean
Get Source
Compile application
Run tests
Inspect software
Build release package
Deploy release package as if to production
Build in CI is more than might be considered within ‘traditional development’. The process above is followed for every software change, not just for ‘special’ release builds. If you do a build every hour you can be pretty certain it will work when you need to do your release. If you only ever execute the release scripts once a month, GOOD LUCK!
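The stages listed above can be sketched as an ordered pipeline that stops at the first failure. This is only a toy model: the BuildPipeline class is hypothetical, and each stage is a placeholder that returns true where a real build would call the compiler, test runner and so on.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a build as an ordered list of stages.
final class BuildPipeline {
    // LinkedHashMap keeps the stages in the order they were added.
    private final Map<String, BooleanSupplier> stages = new LinkedHashMap<>();

    BuildPipeline stage(String name, BooleanSupplier step) {
        stages.put(name, step);
        return this;
    }

    /** Runs stages in order; returns the first failing stage name, or null if all pass. */
    String run() {
        for (Map.Entry<String, BooleanSupplier> e : stages.entrySet()) {
            System.out.println("Running: " + e.getKey());
            if (!e.getValue().getAsBoolean()) {
                return e.getKey();   // break early, later stages never run
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String failed = new BuildPipeline()
                .stage("clean", () -> true)        // placeholders: a real stage
                .stage("get source", () -> true)   // would do the actual work
                .stage("compile", () -> true)
                .stage("run tests", () -> true)
                .stage("inspect", () -> true)
                .stage("package", () -> true)
                .stage("deploy", () -> true)
                .run();
        System.out.println(failed == null ? "BUILD GREEN" : "BUILD BROKEN at " + failed);
    }
}
```

The point of the design is that the same sequence runs for every change, so the release path is exercised constantly rather than once a month.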
Why?
1. Gives confidence in the release process, reduces fear, and enables smaller development cycles and quicker time to market.
2. Improves the confidence of developers to make changes (especially hard changes that may break things).
3. BREAKS EARLY!
If the benefit (to the developer) of CI could be most easily spelt out, it would be in taking back control of the source tree: the ability to make changes with confidence that their effects can be managed. The remainder of the document picks out the stages of the build and demonstrates how, by automating them and adopting appropriate development practices, we can gain confidence in change.
Clean, Get Source
Automated builds must occur on a clean machine. Access to this machine should be tightly controlled, and changes to it should be subject to version control and audit. Each build should start from clean, to ensure that builds can be repeated and that no side effects from previous builds are carried forward.
Compile Application
The builds on the build server should be identical to the builds that occur on developers’ machines. The application should compile with no errors or warnings. Any warning on the build machine should fail the build; warnings that are acceptable should be acknowledged with a compiler pragma that suppresses them for the relevant line of code.
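In Java the nearest thing to a suppressing pragma is the @SuppressWarnings annotation. The sketch below assumes a build that runs javac with -Xlint:all -Werror so that any warning fails the build; the LegacyAdapter class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: acknowledge one acceptable warning on the narrowest scope.
final class LegacyAdapter {

    /** Stand-in for an old API that only promises an Object. */
    static Object legacyLookup() {
        List<String> names = new ArrayList<>();
        names.add("alice");
        return names;
    }

    static List<String> names() {
        // The cast below is unchecked. We know legacyLookup() returns a
        // List<String>, so the warning is acknowledged here and nowhere else.
        @SuppressWarnings("unchecked")
        List<String> names = (List<String>) legacyLookup();
        return names;
    }

    public static void main(String[] args) {
        System.out.println(names());
    }
}
```

Putting the annotation on the single declaration, rather than on the whole class, keeps every other warning in the file live.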
Run Tests
Why test? The answer seems obvious but the reality is more interesting. In simple terms, testing is performed to ensure that an application does what we think it does, but the benefits for a programmer are more profound:
Make sure software works
Make sure software keeps working (which of course enables change)
Show other developers how to exercise your code (and how not to); tests are often a great source of documentation.
Test harness to enable debugging of subsections of code.
Validate bugs have been resolved (and stay resolved)
Types of test
Unit tests
The smallest testable part of an application. The best unit test holds all functions/features/resources not being tested constant. Where dependencies exist (high coupling), either refactor to remove them or use fakes/mocks to eliminate them. Unit tests should be very fast, with thousands running in a few seconds; this is important to ensure that all tests are run at each build. Isolate expensive modules with fakes/mocks.
Mocks / Fakes
These objects implement the same interface as a ‘real’ object and are used to isolate unit tests from other aspects of the system. Fakes are concrete objects that deliver a canned response. Mock objects are active objects that carry their own assertions (methodX must be called before methodY) to portray the ‘mocked’ object more realistically. Mocks/fakes can be hand crafted or built with one of the many reflection-based APIs (e.g. EasyMock).
Integration tests
End-to-end testing of modules to deliver business value, often referred to as black box testing as little or no code is adjusted (or assumed) within the test. Will not be evaluated on every build but will be evaluated prior to each release.
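A hand-crafted fake and mock might look something like this. Everything here (PaymentGateway, Checkout and friends) is invented for illustration; a library such as EasyMock would generate the mock rather than have you write it.

```java
// Hypothetical sketch: a fake and a mock behind the same interface.
final class MockAndFakeDemo {
    public static void main(String[] args) {
        // Fake: just keeps the code under test running.
        System.out.println(new Checkout(new FakeGateway()).pay(100));

        // Mock: also checks how it was used.
        MockGateway mock = new MockGateway();
        new Checkout(mock).pay(100);
        mock.verify();
        System.out.println("mock expectations met");
    }
}

/** Collaborator that would be slow or external in real life. */
interface PaymentGateway {
    void open();
    boolean charge(int pence);
}

/** Fake: a concrete object that delivers a canned response. */
final class FakeGateway implements PaymentGateway {
    public void open() { }
    public boolean charge(int pence) { return true; }
}

/** Mock: carries its own assertion that open() precedes charge(). */
final class MockGateway implements PaymentGateway {
    private boolean opened;
    private boolean charged;

    public void open() { opened = true; }

    public boolean charge(int pence) {
        if (!opened) {
            throw new IllegalStateException("charge() called before open()");
        }
        charged = true;
        return true;
    }

    /** Fails unless charge() was actually called. */
    void verify() {
        if (!charged) {
            throw new IllegalStateException("charge() was never called");
        }
    }
}

/** The code under test: it only ever sees the interface. */
final class Checkout {
    private final PaymentGateway gateway;

    Checkout(PaymentGateway gateway) { this.gateway = gateway; }

    boolean pay(int pence) {
        gateway.open();
        return gateway.charge(pence);
    }
}
```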
Regression tests
A special set of integration tests designed to ensure that the business value delivered by software does not change over time. Will not be evaluated on every build but will be evaluated prior to each release.
Test Driven development
TDD cycle
Add test
Run all tests and see failure
Write code to make test pass
Run tests to see pass
Refactor
Why do this?
KISS, YAGNI – develop ONLY what is needed, focus on the prize!
Three rules of test driven development
1. You are not allowed to write any production code unless it is to fix a failing test.
2. You are not allowed to write any more of a unit test than is sufficient to fail.
3. You are not allowed to write any more production code than is sufficient to cause a failing test to pass.
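To make the cycle concrete, here is the end state of a few turns of it, written without a test framework so it stands alone. The leap-year example and the class names are my own choice, not part of the rules above; in practice each check was added, seen to fail, and then made to pass before the next was written.

```java
// Hypothetical sketch: the tests came first, the production code followed.
final class LeapYearTest {
    public static void main(String[] args) {
        // Each check was added one at a time and failed before isLeap handled it.
        check(LeapYear.isLeap(2024), "2024 is a leap year");
        check(!LeapYear.isLeap(2023), "2023 is not a leap year");
        check(!LeapYear.isLeap(1900), "1900 is not a leap year (century rule)");
        check(LeapYear.isLeap(2000), "2000 is a leap year (400 rule)");
        System.out.println("all tests pass");
    }

    static void check(boolean ok, String what) {
        if (!ok) {
            throw new AssertionError(what);
        }
    }
}

/** Production code: no more than the tests above demand. */
final class LeapYear {
    static boolean isLeap(int year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }
}
```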
Coverage
Coverage is usually expressed as a series of measures:
Function coverage – % of functions executed.
Statement coverage – % of statements executed.
Condition coverage – % of branch choices evaluated.
Path coverage – % paths executed
Entry/Exit coverage – % of call/return evaluated.
Coverage is a measure of the quality of testing. Developers should strive for 100% coverage, but in reality it’s impossible to achieve: a module with n decisions can have up to 2^n paths, and loops can result in an unbounded number. An approximation of sufficient path coverage can be found by considering cyclomatic complexity.
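A small sketch of why path coverage runs away from you. The describe method is a made-up example with two independent decisions: two tests are enough to execute every statement, but it takes four to walk every path.

```java
// Hypothetical sketch: statement coverage vs path coverage.
final class PathCoverageDemo {

    /** Two independent decisions, so 2^2 = 4 distinct paths. */
    static String describe(boolean a, boolean b) {
        StringBuilder sb = new StringBuilder();
        if (a) { sb.append("A"); } else { sb.append("-"); }
        if (b) { sb.append("B"); } else { sb.append("-"); }
        return sb.toString();
    }

    /** Upper bound on paths for n independent two-way decisions (n below 63). */
    static long maxPaths(int decisions) {
        return 1L << decisions;
    }

    public static void main(String[] args) {
        // describe(true, true) and describe(false, false) touch every statement,
        // yet leave half of the paths unexercised.
        boolean[] values = {true, false};
        for (boolean a : values) {
            for (boolean b : values) {
                System.out.println(describe(a, b));
            }
        }
        System.out.println("20 decisions: up to " + maxPaths(20) + " paths");
    }
}
```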
Inspect Code
Apply rules to the code to ensure it complies with standards established by the development team leads. A variety of tools exist to perform static and dynamic code inspection, some of the measures are listed here:
Source Lines of Code (SLOC)
The number of lines of code. This is a good measure of effort but a terrible measure of functionality, as a good programmer will often implement more functionality with fewer lines of code.
“Measuring programming progress by lines of code is like measuring aircraft build progress by weight” – Bill Gates.
There are logical and physical SLOC figures based on programming style:
for (int i = 0; i < 10; i++) System.out.println("Count: " + i);
vs
for (int i = 0; i < 10; i++)
{
    System.out.println("Count: " + i);
}
Cyclomatic Complexity
Cyclomatic complexity counts the number of linearly independent paths through a piece of code. For example:
if (x > 10){.. } else{.. }
would have a complexity of 2. Cyclomatic complexity is a measure that directly affects the quality of code: less complex code is easier to maintain and test. The complexity figure is valuable to QA as it gives an indication of the number of tests that should be executed. There does exist a minimal complexity for a given language/algorithm, but it’s rare that any code is expressed at that complexity, so programmers can frequently improve quality just by looking to reduce this measure.
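One common way to put a number on this is McCabe’s formula, M = E - N + 2P, taken over the control-flow graph (E edges, N nodes, P connected components). The sketch below just applies the formula to hand-counted graphs; the class name and the way the graphs are counted are my own illustration, not the output of any particular tool.

```java
// Hypothetical sketch: McCabe's M = E - N + 2P on hand-counted graphs.
final class Cyclomatic {

    static int complexity(int edges, int nodes, int connectedComponents) {
        return edges - nodes + 2 * connectedComponents;
    }

    public static void main(String[] args) {
        // Straight-line code: a single node and no edges.
        System.out.println("straight line: " + complexity(0, 1, 1));

        // if/else: nodes are condition, then-block, else-block, exit;
        // edges are cond->then, cond->else, then->exit, else->exit.
        System.out.println("if/else: " + complexity(4, 4, 1));

        // Simple loop: nodes are condition, body, exit;
        // edges are cond->body, body->cond, cond->exit.
        System.out.println("loop: " + complexity(3, 3, 1));
    }
}
```

For single-entry, single-exit code built from two-way decisions this comes out the same as ‘number of decisions plus one’, which is usually the quicker way to count it by eye.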
Cohesion & Coupling
Cohesion is a measure of how strongly-related and focused the responsibilities of a software module are.
Coupling describes how one module interacts with another; there is low coupling if the interaction is via a well-known interface without dependence on internal state.
Code that has high cohesion and low coupling is easy to maintain and understand:
Changes in one module should not cause ripples into other modules
Modules are easy to understand/develop in isolation.
Modules can be easily re-used.
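A minimal sketch of the two ideas together, using invented names loosely based on the browser example: History is cohesive (everything in it concerns browsing history), and HistoryMenu is loosely coupled to it because it sees only the small VisitCounter interface, never the internal list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: high cohesion, low coupling.
final class CouplingDemo {
    public static void main(String[] args) {
        History history = new History();
        history.visit("example.org");
        System.out.println(new HistoryMenu(history).label());
    }
}

/** The well-known interface: the only thing other modules rely on. */
interface VisitCounter {
    int visits();
}

/** Cohesive module: every member is about browsing history. */
final class History implements VisitCounter {
    private final List<String> urls = new ArrayList<>();   // internal state, never exposed

    void visit(String url) { urls.add(url); }

    public int visits() { return urls.size(); }
}

/** Depends on VisitCounter, so History can change its storage freely. */
final class HistoryMenu {
    private final VisitCounter counter;

    HistoryMenu(VisitCounter counter) { this.counter = counter; }

    String label() { return "History (" + counter.visits() + ")"; }
}
```

Swapping the list inside History for a database, or handing HistoryMenu a fake counter in a unit test, causes no ripples in the other module.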
Build Release package, Deploy release package
Donut Rule
NO ONE ever breaks the build. The term used by many CI/XP developers is ‘in the green’, referring to the green bar that is shown when all tests pass. Every time the build is broken it should be fixed at once and the guilty developer should purchase sugar covered goodies for the rest of the team!
Acknowledgements
VERY little of the above is my own work. I read a lot and most of what’s above is the words of men/women smarter than me.
“Every time the build is broken it should be fixed at once and the guilty developer should purchase sugar covered goodies for the rest of the team!”
One of the risks with placing a potential cost on committing code is that it will incentivise developers to minimise the number of commits. Which is at odds with the “commit early, commit often” philosophy.
Making a broken build the team priority is productive. Particularly since the usual cause of a broken build is a miscommunication between two parts of the team. A process which leaves one half begging the other half to alter their code to comply with a change is a process which will lead to tears when pressure arrives.
However fining people isn’t likely to improve the mood over commits. It leads to arguments over who is actually to blame (it’s not always the person who commits the change), it leads to recriminations and aggravation and eventually intransigence — people will end up refusing (in creative ways) doing work which leads to them being fined.
Yes, it might encourage people to have proper integration meetings and so on, but those are things which get dropped when you put pressure on a dev team. Adding more pressure to them is probably not the answer.
It might encourage local test builds. However if you can do a full test build locally in a sensible time, you don’t NEED the master test build system. And it’s unlikely that every developer will be able to build every version of every component on every platform — so basically each checkin turns into a gamble that something they couldn’t test will be OK, with a cost associated otherwise.
Your developers may not relish the idea of subsidising their inability to do full test builds with fines which come out of their pocket.
And then you have to think about what you do with the people who simply flat out refuse to pay the fine. A decent HR department is going to tell you that applying any sort of administrative sanction for “not being a team player by failing to buy snacks for the team” is not going to look good the minute anyone tries to say it at an industrial tribunal…
~Katie.