Algorithmic Bias in UK Policing: When Poor Data Builds Dangerous Systems
For all the rhetoric surrounding “innovation in policing,” one uncomfortable truth keeps resurfacing: if the data is flawed, the system built on it will also be flawed. Recent revelations about racial bias in police‑deployed facial recognition technologies don’t just point to algorithmic shortcomings—they expose a foundational data problem that policing has not yet reckoned with.
The testing conducted by the National Physical Laboratory (NPL), combined with pointed criticism from the Information Commissioner’s Office (ICO), demonstrates that UK police forces have been deploying systems trained on skewed, insufficient, or outdated data. Worse, these issues were not proactively disclosed to regulators or oversight bodies, compounding the risks and eroding public trust.
The message is clear: better technology is impossible without better data hygiene.
The NPL’s independent testing revealed that the Cognitec FaceVACS‑DBScan ID v5.5 algorithm produced significantly higher false match rates for Black and Asian people than for white people: false positive rates of 5.5% and 4% respectively, versus just 0.04%. These disparities aren’t random. They’re indicative of a training data ecosystem that has failed to capture the full diversity of the public it’s meant to serve.
In other words, if the model is trained predominantly on lighter‑skinned faces, its accuracy will skew accordingly.
This is Data Science 101—but it is now a live operational problem in UK policing.
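To make the arithmetic concrete, here is a minimal sketch of how a per-group false positive check could be expressed. The record format and group handling are illustrative assumptions, not the NPL’s actual evaluation pipeline or any force’s code; only the reported rates come from the source.

```python
# Minimal sketch of a per-group false positive rate check.
# The (group, predicted_match, is_true_match) record format is an
# illustrative assumption, not the NPL's actual test harness.
from collections import defaultdict

def false_positive_rates(results):
    """Return the false positive rate per demographic group.

    results: iterable of (group, predicted_match, is_true_match) tuples.
    """
    false_positives = defaultdict(int)
    non_matches = defaultdict(int)
    for group, predicted_match, is_true_match in results:
        if not is_true_match:            # only true non-matches can produce false positives
            non_matches[group] += 1
            if predicted_match:
                false_positives[group] += 1
    return {g: false_positives[g] / non_matches[g] for g in non_matches}

def disparity_ratio(rates, group, reference_group):
    """How many times higher one group's false positive rate is than the reference group's."""
    return rates[group] / rates[reference_group]

# Using the rates the NPL reported (5.5%, 4% and 0.04%), the disparity
# works out at roughly 137x for Black subjects and 100x for Asian subjects.
```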
The Home Office itself admitted that the algorithm was “more likely to incorrectly include some demographic groups,” following NPL’s analysis. The implications are profound: datasets within the Police National Database (PND) and linked biometric repositories appear to carry demographic imbalances that have now manifested as quantifiable harm. These aren’t simply “algorithmic issues”—they reflect poor data stewardship.
Perhaps the most troubling revelation is not the bias itself, but the lack of transparency surrounding it.
The ICO stated unequivocally that the Home Office had not previously disclosed the historical bias present in these systems, despite regular engagement with the regulator. That means inaccuracies baked into the data, and consequently into the models, went unchallenged for far too long.
Inadequate data hygiene manifests in multiple ways: training sets that under‑represent parts of the population, records that are outdated or of unknown provenance, and labels that have never been independently validated. For any system that influences public-sector decision‑making, let alone one that can trigger arrests, this is unacceptable.
You wouldn’t accept uncalibrated equipment on an aircraft or outdated schematics in a nuclear facility. So why are we tolerating data practices in policing technologies that fall below basic safety‑critical standards?
Several policing bodies have advocated for rapid expansion of facial recognition technology into public spaces such as shopping centres, stadiums, and transport hubs. But as the Association of Police and Crime Commissioners (APCC) warned, the NPL’s findings “shed light on a concerning inbuilt bias,” stressing that technology has been deployed “without adequate safeguards in place”.
This raises the most important question of all: if these systems are demonstrably biased at their current scale, why would a wider rollout make them any safer? Algorithmic accuracy does not magically improve with deployment volume. In fact, scaling bad data only scales bad outcomes.
The risk is that policing begins to rely on models built on skewed, insufficient, or outdated data, the very problems the NPL’s testing has already exposed.
Worse, poorly governed data pipelines can cause bias to persist across algorithm generations, even if newer models claim better performance. Without robust data governance, you are effectively carrying yesterday’s errors into tomorrow’s systems.
If UK policing wants to embrace emerging technologies, it must first embrace data hygiene as a non‑negotiable foundation.
That means:
1. Mandatory Data Quality Standards for All Training Sets
Every dataset used to train or fine‑tune policing tools should meet quality benchmarks comparable to those used in safety-critical sectors.
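As a sketch only, benchmarks of this kind could be encoded as automated gates that a training set must pass before use. The thresholds below are placeholders chosen for illustration, not figures from any existing policing or safety-critical standard.

```python
# Hedged sketch of pre-training "quality gates". The thresholds are
# placeholders, not values from any real standard.
from datetime import date

MIN_RECORDS_PER_GROUP = 5_000   # placeholder floor for per-group coverage
MAX_RECORD_AGE_YEARS = 5        # placeholder recency requirement

def passes_quality_gates(group_counts, oldest_record, today=None):
    """Return which gates a candidate training set passes."""
    today = today or date.today()
    recency_ok = (today - oldest_record).days / 365.25 <= MAX_RECORD_AGE_YEARS
    coverage_ok = all(n >= MIN_RECORDS_PER_GROUP for n in group_counts.values())
    return {
        "recency": recency_ok,
        "per_group_coverage": coverage_ok,
        "fit_for_training": recency_ok and coverage_ok,
    }

# Example: a set whose oldest images date from 2015 fails the recency gate.
print(passes_quality_gates({"group_a": 60_000, "group_b": 4_200}, date(2015, 1, 1)))
```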
2. Regular Bias Audits and Dataset Refresh Cycles
Bias assessment cannot be a one‑off. Data must be continuously monitored, updated, and validated—especially as population demographics change.
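A recurring audit could, for instance, compare the demographic mix of the training data against a reference population and flag under-represented groups for a refresh. The group names, reference shares, and tolerance below are assumptions for illustration, not census figures.

```python
# Illustrative representation check for a recurring bias audit.
# Reference shares and tolerance are hypothetical values.
REFERENCE_SHARE = {"group_a": 0.82, "group_b": 0.10, "group_c": 0.08}

def under_represented(dataset_counts, reference=REFERENCE_SHARE, tolerance=0.8):
    """Flag groups whose dataset share falls below tolerance * expected share."""
    total = sum(dataset_counts.values())
    flagged = {}
    for group, expected in reference.items():
        actual = dataset_counts.get(group, 0) / total if total else 0.0
        if actual < expected * tolerance:
            flagged[group] = {"expected_share": expected, "actual_share": round(actual, 3)}
    return flagged

# A set that over-samples group_a flags the other groups for rebalancing:
print(under_represented({"group_a": 9_000, "group_b": 600, "group_c": 400}))
```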
3. Transparent Disclosure to Regulators and the Public
The ICO should not have to discover historical bias by accident. Missed disclosures undermine trust and hinder independent scrutiny.
4. Clear Data‑Provenance Tracking
Models should be accompanied by lineage information:
Where did the data come from? How old is it? How diverse is it? How was it labelled?
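One way to capture those four answers is a simple, machine-readable lineage record that travels with the model. The field names and example values here are purely illustrative; real deployments would align them with whatever documentation standard a force adopts.

```python
# Minimal lineage record answering the four questions above.
# All field names and example values are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetLineage:
    source: str                   # where did the data come from?
    collected: date               # how old is it?
    demographic_breakdown: dict   # how diverse is it? (group -> share)
    labelling_method: str         # how was it labelled?
    known_limitations: list = field(default_factory=list)

record = DatasetLineage(
    source="custody images, hypothetical Force X",
    collected=date(2019, 6, 1),
    demographic_breakdown={"group_a": 0.78, "group_b": 0.13, "group_c": 0.09},
    labelling_method="manual review, single annotator",
    known_limitations=["under-represents group_c", "no collection after 2019"],
)
```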
5. Purging and Retiring Unreliable Data
If data has been shown to contribute to skewed outcomes, it must be rebalanced, revalidated, or retired. It should never be quietly reused.
The temptation in policing is to see emerging technologies as shortcuts to efficiency. But algorithms reflect the data we feed them. When that data is flawed, the system becomes an engine for automated injustice. As we stand on the cusp of a national rollout of facial recognition, the priority cannot be “deploy fast, learn later.”
The priority must be this:
Fix the data.
Govern the data.
Understand the data.
Only then can we responsibly deploy the tools.
Anything less risks embedding inequity into the very infrastructure of modern policing.