This is probably not the most attention-grabbing title for an editorial, but for a while I worked at the UK’s National Physical Laboratory (NPL) in London, where part of the lab’s job is supporting standards work through the British Standards Institution (BSI). That experience left me with two impressions about metrology.
First, that repeatable measurement is expensive, because it requires controlled artefacts, calibrated instrumentation and highly trained staff. Second, if you want government to fund it, you have to keep making the case that metrology is not just a ‘nice to have’ overhead but the glue that makes international trade possible.
In the case of physical trade, as in the sale of goods for cash, we are looking at a simple exchange mechanism. The seller of the goods adheres to a unit of measure, say mass in kilograms (kg), and the buyer agrees a price per that unit ($ per kg).
When the goods arrive, the buyer can – all other attributes like quality, colour, and dimensions being equal – weigh¹ the goods and convert that number of kg into the dollar amount to be paid.
As long as both parties agree on the measuring regime for the weight, then the exchange is fair. You could achieve that by the buyer and seller agreeing a particular set of weighing scales to use in advance and then shipping the weighing scales with the goods. The seller uses the weighing scales at the time of packing, the scales travel with the goods, and then the buyer uses them again at the time of accepting the goods.
Or they could agree to a method of interoperability. This is when everyone measures the same thing, in the same way, and reports it so that results can be reproduced across borders. If they can’t agree on a method of interoperability, and they don’t want to ship weighing scales with the goods, each jurisdiction (or each seller) defines its own metrics, its own test conditions, and its own pass/fail criteria. You could still buy goods in this fragmented way, but you couldn’t buy them with confidence, and you couldn’t benchmark goods against other goods (or sellers) reliably.
The moment you decide not to ship the weighing scales with the goods and instead you choose the interoperability route, you have created a new problem. How do you ensure the seller’s kilogram is the same as the buyer’s kilogram?
This is where standards come into play.
A standard is a portable measurement recipe: it defines what you are measuring, how you measure it, the acceptable tolerances for measuring, and the reporting format so that someone else can reproduce the results. But the standard on its own is still just a recipe. Interoperability only becomes real when the recipe is backed up by testing.
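To make the recipe idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the class and function names, the 0.1% tolerance, the readings and the price are invented, and none of it is taken from a real standard. It shows a standard written down as data, and a test that checks whether the seller's and buyer's readings agree within the stated tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementStandard:
    """A hypothetical 'measurement recipe' (illustrative fields only)."""
    quantity: str               # what is being measured
    unit: str                   # how the result is reported
    method: str                 # how the measurement is taken
    relative_tolerance: float   # acceptable disagreement between two readings


def readings_agree(standard: MeasurementStandard,
                   seller_reading: float,
                   buyer_reading: float) -> bool:
    """True if the two readings differ by no more than the tolerance."""
    largest = max(abs(seller_reading), abs(buyer_reading))
    allowed = standard.relative_tolerance * largest
    return abs(seller_reading - buyer_reading) <= allowed


def disputed_amount(seller_reading: float,
                    buyer_reading: float,
                    price_per_unit: float) -> float:
    """Money at stake when the two parties measure different quantities."""
    return abs(seller_reading - buyer_reading) * price_per_unit


# Assumed example values: a 0.1% tolerance on mass, priced at $4 per kg.
MASS_STANDARD = MeasurementStandard(
    quantity="mass",
    unit="kg",
    method="calibrated scale (hypothetical procedure)",
    relative_tolerance=0.001,
)

if __name__ == "__main__":
    seller_kg, buyer_kg = 1005.0, 998.0
    print(readings_agree(MASS_STANDARD, seller_kg, buyer_kg))
    print(disputed_amount(seller_kg, buyer_kg, price_per_unit=4.0))
```

With these made-up numbers the two scales disagree by 7 kg, which is outside the tolerance and puts $28 in dispute. The recipe tells you what counts as agreement; only the test tells you whether you have it.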
In the physical world of secure documents, the mindset of testing and standards is mature. You can point to well-established frameworks for machine-readable documents and chip interfaces and expect that when a component or device ‘passes the test’ it will mean roughly the same thing across borders. That is why a government agency can issue an ID card, passport or banknote with confidence, in the knowledge that a document inspected elsewhere will behave predictably.
And this is where measuring biological characteristics becomes awkward, because biometrics don’t behave like weighing scales. They behave like statistical instruments. Their error rates move around with population, sensor type, capture conditions and operating thresholds.
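A short Python sketch may help show what "error rates move around" means in practice. The similarity scores below are invented, and the function names are my own. The two quantities, false match rate (FMR) and false non-match rate (FNMR), are the usual ones in biometric performance reporting: the share of impostor comparisons wrongly accepted, and the share of genuine comparisons wrongly rejected, at a given threshold.

```python
from typing import Sequence


def false_match_rate(impostor_scores: Sequence[float], threshold: float) -> float:
    """Fraction of impostor comparisons scoring at or above the threshold."""
    accepted = sum(score >= threshold for score in impostor_scores)
    return accepted / len(impostor_scores)


def false_non_match_rate(genuine_scores: Sequence[float], threshold: float) -> float:
    """Fraction of genuine comparisons scoring below the threshold."""
    rejected = sum(score < threshold for score in genuine_scores)
    return rejected / len(genuine_scores)


# Hypothetical similarity scores (higher means 'more alike').
GENUINE = [0.91, 0.85, 0.78, 0.66, 0.95, 0.72, 0.88, 0.59, 0.81, 0.93]
IMPOSTOR = [0.12, 0.35, 0.48, 0.22, 0.61, 0.05, 0.41, 0.30, 0.55, 0.70]

if __name__ == "__main__":
    for threshold in (0.5, 0.6, 0.7):
        fmr = false_match_rate(IMPOSTOR, threshold)
        fnmr = false_non_match_rate(GENUINE, threshold)
        print(f"threshold={threshold:.1f}  FMR={fmr:.1f}  FNMR={fnmr:.1f}")
```

With these made-up scores, raising the threshold from 0.5 to 0.7 lowers the FMR from 0.3 to 0.1 while raising the FNMR from 0.0 to 0.2. The same system, on the same data, gives different numbers depending on where you set the dial, and a different population or sensor would shift the scores themselves.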
In my earlier editorial ‘Why is Biometric Testing So Difficult?’ (IDN July 2025), I argued that biometric systems rarely ‘fail’ deterministically; they fail probabilistically, and their performance in testing depends on how the test is designed and interpreted. That is precisely why standards, and testing against those standards (independently, repeatably and with comparable reporting), are so important.
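One way to see how test design shapes the result is to ask what a test with no observed errors actually proves. The Python sketch below is illustrative only: the function names are my own, and it assumes the trials are independent, which real test campaigns only approximate. It computes the 95% upper bound on the true error rate after zero errors in a given number of trials, alongside the well-known "rule of three" approximation (3 divided by the number of trials).

```python
def upper_bound_zero_errors(trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the error rate when no errors were observed.

    Solves (1 - p) ** trials = 1 - confidence for p, assuming independent trials.
    """
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


def rule_of_three(trials: int) -> float:
    """Approximate 95% upper bound after zero errors: 3 / trials."""
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    return 3.0 / trials


if __name__ == "__main__":
    for n in (300, 3_000, 30_000):
        exact = upper_bound_zero_errors(n)
        approx = rule_of_three(n)
        print(f"{n} error-free trials: true rate could still be up to "
              f"{exact:.6f} (rule of three: {approx:.6f})")
```

Under these assumptions, 300 error-free trials only support a claim that the true error rate is below roughly 1%. A vendor's "zero errors in testing" therefore means very little until you know how many trials were run and how they were chosen.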
A new policy brief led by Europol under the EU Innovation Hub for Internal Security highlights that Europe currently lacks a strong, unified system to evaluate how well biometric technologies perform, relying instead on results from programmes run by the US National Institute of Standards and Technology (NIST).²
Public authorities across the globe increasingly treat NIST’s Face Recognition Vendor Test (FRVT) and fingerprint evaluations as de facto benchmarks for border management. However, the Europol brief notes that these testing regimes are not tailored to European operational environments or legal constraints.
The report emphasises the need for repeatable, standardised test protocols, such as ISO/IEC 19795 for reporting biometric accuracy and ISO/IEC 30107-3 for assessing presentation-attack detection, so that results can be reproduced and compared over time. It also stresses that meaningful evaluation depends on well-curated reference datasets whose provenance is documented, including demographic composition, sensor classes and capture conditions that reflect real operational environments.
In the policy brief, Europol expresses concern that Europe, and the rest of the world, runs the risk of becoming over-dependent on a single measurement lab. It looks at several options for setting up a centre of excellence in addition to NIST, to reduce that risk.
The most likely initial step would be to coordinate existing national labs and competent authorities into a harmonised test network, then add a central capability able to host EU-relevant datasets and run EU-wide test campaigns under common protocols.
Biometrics are now embedded in critical identity systems. If we want standards-based interoperability rather than a fragmented market of unverifiable manufacturer claims about product performance, we have to treat independent measurement of biometrics as part of national infrastructure, aligned to international standards.
1 - Apologies for mixing up the terms weight and mass. Treating weight (a force) as interchangeable with mass has fallen into popular use.
2 - www.europol.europa.eu/cms/sites/default/files/documents/policy-brief-biometric-evaluation.pdf