Algorithmic Bias in UK Policing: When Poor Data Builds Dangerous Systems


For all the rhetoric surrounding “innovation in policing,” one uncomfortable truth keeps resurfacing: if the data is flawed, the system built on it will also be flawed. Recent revelations about racial bias in police‑deployed facial recognition technologies don’t just point to algorithmic shortcomings—they expose a foundational data problem that policing has not yet reckoned with.

The testing conducted by the National Physical Laboratory (NPL), combined with pointed criticism from the Information Commissioner’s Office (ICO), demonstrates that UK police forces have been deploying systems trained on skewed, insufficient, or outdated data. Worse, these issues were not proactively disclosed to regulators or oversight bodies, compounding the risks and eroding public trust.

The message is clear: better technology is impossible without better data hygiene.

Biased Inputs, Biased Outputs: The Data Problem at the Heart of Facial Recognition

The NPL’s independent testing revealed that the Cognitec FaceVACS‑DBScan ID v5.5 algorithm produced significantly higher false positive rates for Black and Asian people than for white subjects: 5.5% and 4% respectively, versus just 0.04% for white individuals. These disparities aren’t random. They’re indicative of a training data ecosystem that has failed to capture the full diversity of the public it’s meant to serve.

In other words, if the model is trained predominantly on lighter‑skinned faces, its accuracy will skew accordingly.

This is Data Science 101—but it is now a live operational problem in UK policing.
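
To make the arithmetic concrete, the sketch below shows the kind of per‑group error analysis that surfaces such disparities. It is a minimal illustration in Python: the records and group labels are invented placeholders, not NPL data, and a real evaluation involves far more methodology, but the core calculation (false positives divided by non‑matching comparisons, broken out by demographic group) is this simple.

```python
# Minimal sketch: per-group false positive rates for a face-matching system.
# The records below are illustrative placeholders, not NPL data; in practice
# each record would come from a controlled evaluation with known ground truth.
from collections import defaultdict

# Each evaluation record: (demographic_group, is_genuine_match, system_said_match)
records = [
    ("group_a", False, True),   # a false positive
    ("group_a", False, False),
    ("group_b", False, False),
    ("group_b", True,  True),
    # ... thousands more records in a real evaluation
]

false_positives = defaultdict(int)
non_matches = defaultdict(int)

for group, is_genuine, predicted_match in records:
    if not is_genuine:            # only non-matching pairs can produce false positives
        non_matches[group] += 1
        if predicted_match:
            false_positives[group] += 1

for group in sorted(non_matches):
    fpr = false_positives[group] / non_matches[group]
    print(f"{group}: false positive rate = {fpr:.2%}")
```

When the rates printed for different groups differ by two orders of magnitude, as the NPL found, the disparity is a property of the deployed system regardless of its headline accuracy.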

Following the NPL’s analysis, the Home Office itself admitted that the algorithm was “more likely to incorrectly include some demographic groups”. The implications are profound: datasets within the Police National Database (PND) and linked biometric repositories appear to carry demographic imbalances that have now manifested as quantifiable harm. These aren’t simply “algorithmic issues”; they reflect poor data stewardship.

Out‑of‑Date, Poorly Curated, and Opaque: A Recipe for Risk

Perhaps the most troubling revelation is not the bias itself, but the lack of transparency surrounding it.

The ICO stated unequivocally that the Home Office had not previously disclosed the historical bias present in these systems, despite regular engagement with the regulator. That means inaccuracies baked into the data, and consequently into the models, went unchallenged for far too long.

Inadequate data hygiene manifests in multiple ways:

  • Outdated training sets that don’t reflect real‑world demographic diversity.
  • Poor labelling quality, causing the algorithm to learn incorrect patterns.
  • Incomplete datasets that overrepresent some groups and underrepresent others.
  • Lack of version control or lifecycle management, making it harder to track and fix known issues.
  • No routine bias monitoring, meaning bias compounds quietly until exposed by external testing.

For any system that influences public-sector decision‑making—let alone ones that can trigger arrests—this is unacceptable.
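
Several of these failures are also cheap to detect long before deployment. The sketch below is a purely illustrative representation check on a training set; the group labels, reference shares and tolerance are assumptions, not figures from any police dataset.

```python
# Illustrative sketch: flag demographic imbalance in a training dataset
# before any model is trained on it. Group labels, reference shares and
# the tolerance are hypothetical placeholders.
from collections import Counter

def representation_report(dataset_groups, reference_shares, tolerance=0.10):
    """Compare the dataset's demographic mix against a reference population.

    dataset_groups:   iterable of group labels, one per training image
    reference_shares: dict of group -> expected share of the population
    tolerance:        allowed absolute deviation before a group is flagged
    """
    counts = Counter(dataset_groups)
    total = sum(counts.values())
    flags = {}
    for group, expected in reference_shares.items():
        actual = counts.get(group, 0) / total if total else 0.0
        if abs(actual - expected) > tolerance:
            flags[group] = (actual, expected)
    return flags

# Hypothetical usage:
flags = representation_report(
    dataset_groups=["group_a"] * 900 + ["group_b"] * 100,
    reference_shares={"group_a": 0.60, "group_b": 0.40},
)
for group, (actual, expected) in flags.items():
    print(f"{group}: {actual:.0%} of training data vs {expected:.0%} of population")
```

Checks like this do not prove a model is fair, but their absence means imbalance goes unmeasured until an external body measures it for you.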

You wouldn’t accept uncalibrated equipment on an aircraft or outdated schematics in a nuclear facility. So why are we tolerating data practices in policing technologies that fall below basic safety‑critical standards?

Scaling Deployment Without Scaling Data Quality Controls

Several policing bodies have advocated for rapid expansion of facial recognition technology into public spaces such as shopping centres, stadiums, and transport hubs. But as the Association of Police and Crime Commissioners (APCC) warned, the NPL’s findings “shed light on a concerning inbuilt bias,” stressing that the technology has been deployed “without adequate safeguards in place”.

This raises the most important question of all:

If the data isn’t clean, governed, diverse, or high‑quality, what exactly are we scaling?

Algorithmic accuracy does not magically improve with deployment volume. In fact, scaling bad data only scales bad outcomes.

The risk is that policing begins to rely on models built on:

  • biased historical custody images,
  • datasets reflecting legacy policing disproportionality,
  • and incomplete demographic representation.

Worse, poorly governed data pipelines can cause bias to persist across algorithm generations, even if newer models claim better performance. Without robust data governance, you are effectively carrying yesterday’s errors into tomorrow’s systems.

Good Data Hygiene Isn’t an IT Exercise—It’s an Ethical Obligation

If UK policing wants to embrace emerging technologies, it must first embrace data hygiene as a non‑negotiable foundation.

That means:

1. Mandatory Data Quality Standards for All Training Sets

Every dataset used to train or fine‑tune policing tools should meet quality benchmarks comparable to those used in safety-critical sectors.

2. Regular Bias Audits and Dataset Refresh Cycles

Bias assessment cannot be a one‑off. Data must be continuously monitored, updated, and validated—especially as population demographics change.
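
In practice, “not a one‑off” means an automated gate in the model lifecycle. The sketch below is an assumption about how such a gate might be wired, not a description of any existing policing pipeline: per‑group false positive rates are recomputed on fresh evaluation data each cycle, and a release is blocked if the gap between the best‑ and worst‑served groups exceeds an agreed ratio.

```python
# Hypothetical bias-audit gate: re-run on every release and on a fixed schedule.
# The maximum allowed disparity ratio is a placeholder policy choice; the example
# rates are of the same order as the figures reported from the NPL testing.

MAX_DISPARITY_RATIO = 2.0  # worst-served group may be at most 2x the best-served

def audit_passes(per_group_fpr: dict[str, float]) -> bool:
    """True only if the worst-served group's false positive rate is within
    the agreed ratio of the best-served group's rate."""
    worst = max(per_group_fpr.values())
    best = min(per_group_fpr.values())
    if best == 0:
        # false positives for one group but none for another is itself a disparity
        return worst == 0
    return worst / best <= MAX_DISPARITY_RATIO

latest_audit = {"white": 0.0004, "Black": 0.055, "Asian": 0.040}
if not audit_passes(latest_audit):
    raise SystemExit("Bias audit failed: disparity exceeds agreed threshold, blocking deployment.")
```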

3. Transparent Disclosure to Regulators and the Public

The ICO should not have to discover historical bias by accident. Missed disclosures undermine trust and hinder independent scrutiny.

4. Clear Data‑Provenance Tracking

Models should be accompanied by lineage information:
Where did the data come from? How old is it? How diverse is it? How was it labelled?
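
One lightweight way to answer those questions is to attach a machine‑readable provenance record to every training set and carry it forward with each model release. The fields below are an illustrative sketch, loosely in the spirit of published “datasheets for datasets” proposals, not a mandated schema.

```python
# Sketch of a provenance record carried alongside a training dataset.
# Field names and values are illustrative assumptions, not an agreed standard.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetProvenance:
    name: str
    version: str
    source: str                              # where the data came from
    collected_between: tuple[date, date]     # how old it is
    demographic_breakdown: dict[str, float]  # how diverse it is (share per group)
    labelling_method: str                    # how it was labelled
    known_issues: list[str] = field(default_factory=list)
    retired: bool = False                    # set when the data is withdrawn from use

example = DatasetProvenance(
    name="custody_images_subset",
    version="2.1.0",
    source="hypothetical internal archive",
    collected_between=(date(2012, 1, 1), date(2018, 12, 31)),
    demographic_breakdown={"group_a": 0.78, "group_b": 0.12, "group_c": 0.10},
    labelling_method="manual review, single annotator",
    known_issues=["underrepresents group_b", "pre-2015 images low resolution"],
)
```

A record like this, versioned alongside the model, also gives the next point somewhere concrete to live: a dataset shown to skew outcomes can be marked as retired rather than quietly reused.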

5. Purging and Retiring Unreliable Data

If data has been shown to contribute to skewed outcomes, it must be rebalanced, revalidated, or retired—never quietly reused.

Better Data First, Better Technology Second

The temptation in policing is to see emerging technologies as shortcuts to efficiency. But algorithms reflect the data we feed them. When that data is flawed, the system becomes an engine for automated injustice. As we stand on the cusp of a national rollout of facial recognition, the priority cannot be “deploy fast, learn later.”

The priority must be this:

Fix the data.
Govern the data.
Understand the data.
Only then can we responsibly deploy the tools.

Anything less risks embedding inequity into the very infrastructure of modern policing.
