Quantitative Legionella risk: scoring you can defend

Put a number on a hazard and people stop arguing with it. That is the quiet appeal of going quantitative with Legionella risk: a score feels more defensible than a paragraph of judgement. It is also where assessments come unstuck, because a number inherits every weakness of the evidence and assumptions feeding it while looking far more certain than either.

If you are turning a building survey into a list of risk-ranked actions, the question worth asking is not “how do I score this” but “could someone who has never seen this site reconstruct why each score is what it is”. An auditor, a new responsible person inheriting the file at handover, and an investigator after an incident will all ask exactly that. Build the assessment so the answer is yes, and the scoring largely takes care of itself.

Two different things both called “quantitative”

The word covers two methods that sit a long way apart, and conflating them is the first mistake.

The first is full quantitative microbial risk assessment, or QMRA. It models the chain from the concentration of organisms in the water, through aerosol generation and inhaled dose, to a modelled probability of infection. It is real science with a real place: cooling-tower clusters, large district systems, research into a specific exposure route. It is also data-hungry and assumption-heavy, with far more machinery than the decisions about most ordinary UK buildings need.

The second is what most duty holders mean when they say “quantitative”: semi-quantitative risk scoring inside an otherwise conventional assessment. Each finding from the survey is rated for how likely it is to let Legionella proliferate or reach people, and for how serious the outcome would be, and those two ratings combine into a priority. BS 8580-1 is the code of practice that sits behind this kind of structured Legionella risk assessment in the UK [3], and L8, HSE’s Approved Code of Practice, makes clear that the assessment itself is a legal duty rather than an optional survey [1].

The honest position is that semi-quantitative scoring borrows QMRA’s confident vocabulary without QMRA’s rigour. That is fine, as long as nobody pretends otherwise.

The risk matrix, described closely enough to draw it

The workhorse of semi-quantitative scoring is a grid you could sketch on the back of a survey sheet. It has two axes.

Down the left edge runs likelihood: how favourable the system and its use make growth and exposure. Low at the bottom — hot water genuinely hot, cold genuinely cold, outlets in daily use, no obvious stored or stagnant water. Rising to high at the top — tepid stored water, long dead legs, an intermittently used shower that sprays straight to head height.

Along the bottom runs consequence: how severe the outcome would be if exposure occurred. Low on the left — a generally fit population, a tap that fills a bucket rather than misting. Rising to high on the right — a care setting, immunosuppressed occupants, an aerosolising outlet beside vulnerable people.

Every survey finding becomes a point on that grid, with each axis rated in words or from one to five if you prefer figures. Where the point lands sets a band (green, amber or red), and each band maps to an action and a timescale: red means act now, amber means schedule, green means monitor and record. The findings register that accompanies the grid adds two further columns: the named owner of the action, and the trigger that would force a re-score. Someone who has never seen your building should be able to read a single row left to right (finding, likelihood, consequence, band, owner, review trigger) and understand exactly what you decided and why. A grid that only the assessor can interpret has failed before the ink dries.
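As a rough illustration, the grid and one register row can be sketched in a few lines of Python. This assumes one-to-five scales combined by multiplying likelihood by consequence, with hypothetical cut-offs of 15 for red and 8 for amber. Those thresholds, and the names `Finding`, `band` and `register_row`, are illustrative assumptions, not values taken from BS 8580-1 or HSE guidance.

```python
from dataclasses import dataclass

# Hypothetical cut-offs on a 1-5 x 1-5 grid, scored as likelihood x consequence.
# NOT taken from BS 8580-1 or HSE guidance: a competent person should set
# and review the real ones for the specific system.
RED_FROM = 15    # product of 15 or more: red
AMBER_FROM = 8   # product of 8 to 14: amber; below 8: green

ACTION = {"red": "act now", "amber": "schedule", "green": "monitor and record"}


@dataclass(frozen=True)
class Finding:
    finding: str
    likelihood: int      # 1 (low) to 5 (high)
    consequence: int     # 1 (low) to 5 (high)
    owner: str
    review_trigger: str


def band(likelihood: int, consequence: int) -> str:
    """Place a finding on the grid and return its colour band."""
    for axis in (likelihood, consequence):
        if not 1 <= axis <= 5:
            raise ValueError(f"axis score {axis} is outside the 1-5 scale")
    score = likelihood * consequence
    if score >= RED_FROM:
        return "red"
    if score >= AMBER_FROM:
        return "amber"
    return "green"


def register_row(f: Finding) -> str:
    """One register row, readable left to right by someone who never saw the site."""
    b = band(f.likelihood, f.consequence)
    cells = [f.finding, f"L={f.likelihood}", f"C={f.consequence}",
             f"{b}: {ACTION[b]}", f.owner, f.review_trigger]
    return " | ".join(cells)


row = Finding("Void-wing shower, dead leg measured warm", 4, 4,
              "Estates manager", "wing reoccupied or pipework altered")
print(register_row(row))
```

Scoring the void-wing shower at 4 for likelihood and 4 for consequence gives 16, so the row prints as red with its action, owner and re-score trigger alongside.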

Where quantitative assessments quietly fail

The numbers introduce their own failure modes, and the impressive-looking ones are the most dangerous.

False precision. Multiply a likelihood of 3 by a consequence of 4 and you get a tidy 12 — but both inputs were judgement calls dressed as measurements. Decimal places make it worse. The scale exists to sort findings into act-now, schedule and monitor, not to claim two-significant-figure accuracy about a building.
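One way to see the false precision is to ask what the tidy 12 becomes if either judgement is off by a single point. The sketch below reuses the same hypothetical cut-offs (15 red, 8 amber) and a hypothetical `wobble` parameter, and lists every reachable product with the band it would fall in.

```python
from itertools import product


def band(score: int) -> str:
    # Same hypothetical cut-offs as a 1-5 x 1-5 grid: 15+ red, 8-14 amber.
    return "red" if score >= 15 else "amber" if score >= 8 else "green"


def plausible_bands(likelihood: int, consequence: int, wobble: int = 1) -> dict[int, str]:
    """Every product reachable if either judgement is off by up to `wobble`
    points (kept inside the 1-5 scale), mapped to the band it would land in."""
    ls = range(max(1, likelihood - wobble), min(5, likelihood + wobble) + 1)
    cs = range(max(1, consequence - wobble), min(5, consequence + wobble) + 1)
    scores = sorted({lk * cq for lk, cq in product(ls, cs)})
    return {s: band(s) for s in scores}


print(plausible_bands(3, 4))
```

For a likelihood of 3 and a consequence of 4, the reachable scores run from 6 (green) to 20 (red), which is why the band, not the number, is what the scale is for.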

Aggregation that hides the outlier. Average a site to “amber overall” and a single red outlet — the unused shower in the void wing — vanishes into the mean. Score outlets and assets, not whole buildings, and let the worst case drive the action rather than the average.
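A short sketch of the aggregation trap, using invented per-outlet scores for a hypothetical site and the same illustrative cut-offs. The outlet names and numbers are made up for the example.

```python
from statistics import mean


def band(score: float) -> str:
    # Same hypothetical cut-offs: 15+ red, 8-14 amber, below 8 green.
    return "red" if score >= 15 else "amber" if score >= 8 else "green"


# Hypothetical likelihood x consequence products for one site, scored per outlet.
outlet_scores = {
    "kitchen tap": 9,
    "ground-floor WC basin": 8,
    "cleaner's sink": 6,
    "staff shower": 9,
    "void-wing shower": 16,
}

site_average = mean(list(outlet_scores.values()))
worst_outlet = max(outlet_scores, key=outlet_scores.get)
worst_score = outlet_scores[worst_outlet]

print(f"site average {site_average:.1f}: {band(site_average)}")
print(f"worst case {worst_outlet} at {worst_score}: {band(worst_score)}")
```

Here the five outlets average 9.6, which reads as amber, while the void-wing shower scores 16 and is red. Driving the action from `worst_score` rather than `site_average` is the whole point.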

Stale inputs. A score is a snapshot of a use pattern. The moment a wing closes, a tenant changes, or pipework is altered, half your likelihood ratings are out of date. A matrix calculated once and filed is decoration, not control.
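To keep a score tied to the use pattern it describes, each finding can carry the events that would invalidate it. This sketch assumes a simple dated event log. `ScoredFinding`, `needs_rescore`, the event names and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoredFinding:
    ref: str
    scored_on: date
    triggers: frozenset[str]   # events that would force a re-score


def needs_rescore(findings: list[ScoredFinding], events: list[tuple[date, str]]) -> list[str]:
    """Refs of findings invalidated by a trigger event logged after they were scored."""
    return [
        f.ref
        for f in findings
        if any(name in f.triggers and when > f.scored_on for when, name in events)
    ]


findings = [
    ScoredFinding("F1 void-wing shower", date(2024, 1, 10),
                  frozenset({"wing closed", "pipework altered"})),
    ScoredFinding("F2 tenant kitchen tap", date(2024, 3, 1),
                  frozenset({"tenant changed"})),
]
events = [(date(2024, 2, 1), "wing closed"), (date(2024, 2, 15), "tenant changed")]
print(needs_rescore(findings, events))
```

Only F1 is flagged: the wing closed after it was scored, whereas F2 was scored after the tenant change it depends on.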

Scores with nothing behind them. Every cell should trace to something concrete: a temperature reading, a photograph of a dead leg, a sampling result, a maintenance record. A rating with no evidence beneath it is an opinion with a number stapled on, and it is the first thing an auditor will pull.

Anchoring. Assessors drift toward last year’s figure, or toward the result that demands least remedial work. Calibrating the bands in advance is the defence.

Making the score auditable

This is where the method earns its keep, and it is the part generic guidance skips. The output that matters is not the matrix on the cover page; it is the trail underneath it.

Tie each score to its evidence. Put the reading, the photo reference or the report number next to the rating so it can be checked rather than merely believed. Write the reasoning, not only the number: “likelihood high because this shower serves a room left empty for weeks and the dead leg downstream measured warm” tells a successor far more than “L=4”.
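A record structure can enforce this by refusing to accept a rating unless it has written reasoning and at least one piece of evidence. This is a minimal sketch. The `Score` and `Evidence` classes and the reference formats in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    kind: str   # e.g. "temperature reading", "photograph", "sampling result"
    ref: str    # where a reviewer can find it (reference format is hypothetical)


@dataclass(frozen=True)
class Score:
    finding: str
    likelihood: int
    consequence: int
    reasoning: str
    evidence: tuple[Evidence, ...]

    def __post_init__(self):
        # Refuse to record a number that has no trail behind it.
        if not self.evidence:
            raise ValueError(f"{self.finding}: no evidence recorded for this score")
        if not self.reasoning.strip():
            raise ValueError(f"{self.finding}: reasoning missing, a bare number is not enough")


shower = Score(
    finding="Void-wing shower",
    likelihood=4,
    consequence=3,
    reasoning="Serves a room left empty for weeks; downstream dead leg measured warm.",
    evidence=(Evidence("temperature reading", "survey sheet 12, line 4"),
              Evidence("photograph", "IMG-0231")),
)
```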

Version the assessment, too. When a score changes at review, record what changed and why, not just the new value. An investigator after a suspected case, or a panel at a contract handover, will want the history, not a single current page. If you run several buildings, that discipline is also what keeps a portfolio genuinely comparable rather than a stack of incompatible spreadsheets; managing risk across multiple sites lives or dies on it.
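An append-only history is one simple way to version scores: a re-score adds a dated revision with its reason and never overwrites the old value. The `ScoreHistory` class below is a hypothetical sketch, not a feature of any particular compliance tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Revision:
    on: date
    likelihood: int
    consequence: int
    why: str


class ScoreHistory:
    """Append-only score history for one finding: a re-score adds a revision
    and never overwrites the earlier ones."""

    def __init__(self, finding: str) -> None:
        self.finding = finding
        self._revisions: list[Revision] = []

    def rescore(self, on: date, likelihood: int, consequence: int, why: str) -> None:
        if not why.strip():
            raise ValueError("record what changed and why, not just the new value")
        if self._revisions and on < self._revisions[-1].on:
            raise ValueError("revisions must be added in date order")
        self._revisions.append(Revision(on, likelihood, consequence, why))

    @property
    def current(self) -> Revision:
        # Raises IndexError if the finding has never been scored.
        return self._revisions[-1]

    def history(self) -> tuple[Revision, ...]:
        return tuple(self._revisions)


h = ScoreHistory("Void-wing shower")
h.rescore(date(2024, 1, 10), 4, 3, "Initial survey: wing unoccupied, dead leg warm.")
h.rescore(date(2024, 6, 3), 2, 3, "Dead leg removed; wing reoccupied with daily use.")
for r in h.history():
    print(r.on, f"L={r.likelihood} C={r.consequence}", r.why)
```

The current page shows the later score, but an investigator can still see the earlier one and the reason it dropped.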

Finally, keep the records L8 expects — the assessment, the written control scheme and the monitoring — joined up, so a finding leads to a control, which leads to a verifiable, dated task [1] [2]. A score that connects to nothing anyone performed is just commentary.
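The joined-up chain can be checked mechanically. This sketch assumes each finding names the control it leads to and each task records the control it performs plus a completion date, then reports findings whose chain breaks. All names, controls and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Task:
    control: str
    due: date
    done_on: Optional[date] = None   # None until someone performs and dates it


def broken_chains(finding_controls: dict[str, str], tasks: list[Task]) -> list[str]:
    """finding_controls maps each finding ref to the control it leads to ("" if none).
    Return refs where the chain breaks: no control, or no completed, dated task."""
    performed = {t.control for t in tasks if t.done_on is not None}
    return [ref for ref, control in finding_controls.items()
            if not control or control not in performed]


finding_controls = {
    "F1 void-wing shower": "flush void-wing shower",
    "F2 tepid storage": "",
    "F3 calorifier": "check calorifier outlet temperature",
}
tasks = [
    Task("flush void-wing shower", date(2024, 5, 6), done_on=date(2024, 5, 6)),
    Task("check calorifier outlet temperature", date(2024, 5, 31)),
]
print(broken_chains(finding_controls, tasks))
```

F2 has no control at all and F3 has a control that nobody has yet performed and dated, so both are reported.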

Where the numbers stop

A scoring matrix is a prioritisation aid, not a verdict, and it pays to be clear about its limits.

It does not lower the legal bar. A finding scored amber is still a finding; the duty holder still has to act, and a comfortable number is no defence if the underlying control is missing.

It does not replace competence. The figures, bands and thresholds should be set and reviewed by a competent person who understands the specific system, not lifted from a template. QMRA in particular is specialist work, and a spreadsheet that borrows its language is not the same thing.

And it does not set your monitoring calendar. HSE guidance is clear that how often you test and monitor follows the system and the risk assessment, not a fixed interval chosen for tidiness [4]. The score should inform that frequency; it must never quietly override the judgement behind it.

Treat the matrix as a way to make decisions visible and arguable. The day it stops being questioned is the day it has stopped being useful — and that is the next thing to check on your own current assessment: pick three red and amber findings, and see whether the file alone tells you why they scored as they did.

FAQ

Is a numerical risk score actually required, or will a written assessment do?

The duty is to assess the risk competently and act on it; nothing in HSE guidance forces a numerical score [1]. Scoring is a tool for sorting and defending priorities, not a legal requirement in itself, and a well-reasoned narrative assessment can satisfy the duty perfectly well. Add a number only where it makes your decisions clearer — never to dress up a thin judgement.

Do I need full QMRA modelling for an ordinary building?

Almost never. Dose-response modelling suits specific, high-consequence exposure questions, such as investigating a cooling tower. For a standard hot and cold water system, a structured semi-quantitative assessment following BS 8580-1 [3] gives defensible prioritisation without the data and specialist input that QMRA demands.

How do two assessors reach the same score on the same finding?

Define the bands before anyone scores. Write down, in plain measurable language, what “high likelihood” and “high consequence” mean for your estate — a temperature range, a void period, an occupant group — so the rating follows the evidence rather than the assessor’s mood. That calibration is what stops the matrix sliding into personal opinion with a number attached.
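Written-down bands can be expressed as rules, so that two assessors looking at the same evidence reach the same rating and the same written reasons. The cut-offs below (50 °C hot, 20 °C cold, seven days unused) and the one-point-per-reason scheme are illustrative assumptions only. The real definitions should come from a competent person, with reference to HSG274 [2] and your own written scheme.

```python
from dataclasses import dataclass

# Illustrative, assumed calibration for a hypothetical estate. Real cut-offs
# must be written down by a competent person before anyone scores.
HOT_MIN_C = 50.0       # hot outlet below this counts against control
COLD_MAX_C = 20.0      # cold outlet above this counts against control
VOID_DAYS_HIGH = 7     # outlet unused for longer than this counts as stagnant


@dataclass(frozen=True)
class OutletEvidence:
    hot_c: float
    cold_c: float
    days_since_use: int
    aerosolising: bool


def likelihood(e: OutletEvidence) -> tuple[int, list[str]]:
    """Return a 1-5 likelihood plus the written reasons that produced it.
    Each reason adds one point to a base of 1 (an assumed scheme)."""
    reasons = []
    if e.hot_c < HOT_MIN_C:
        reasons.append(f"hot measured {e.hot_c} °C, below {HOT_MIN_C} °C")
    if e.cold_c > COLD_MAX_C:
        reasons.append(f"cold measured {e.cold_c} °C, above {COLD_MAX_C} °C")
    if e.days_since_use > VOID_DAYS_HIGH:
        reasons.append(f"unused for {e.days_since_use} days")
    if e.aerosolising:
        reasons.append("outlet produces aerosol")
    return min(5, 1 + len(reasons)), reasons


score, reasons = likelihood(OutletEvidence(hot_c=43.0, cold_c=22.5,
                                           days_since_use=21, aerosolising=True))
print(score, reasons)
```

The reasons list doubles as the written reasoning the assessment needs, so the rating and its justification come out together.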

Sources

[1] HSE, “Legionnaires’ disease. The control of legionella bacteria in water systems - Approved Code of Practice and guidance (L8)”. https://www.hse.gov.uk/pubns/books/l8.htm
[2] HSE, “Legionnaires’ disease: Technical guidance (HSG274)”. https://www.hse.gov.uk/pubns/books/hsg274.htm
[3] BSI, “BS 8580-1:2019 - Risk assessments for Legionella control. Code of practice”. https://knowledge.bsigroup.com/products/water-quality-risk-assessments-for-legionella-control-code-of-practice-1
[4] HSE, “Testing and monitoring your water system for legionella”. https://www.hse.gov.uk/legionnaires/testing-monitoring-water-system.htm
