How Facial Recognition Fails the People It Targets Most

Tags: facial recognition, bias, civil rights, surveillance

In 2018, MIT researcher Joy Buolamwini published a paper that should have stopped an entire industry in its tracks.

She tested three commercial facial recognition systems from IBM, Microsoft, and Face++. For light-skinned men, the error rate was 0.8%. For dark-skinned women, the error rate was 34.7%.

That's not a rounding error. That's a system that gets it wrong roughly one time in three, for one specific group of people. The same systems that barely blink at a white man's face fail outright on a Black woman's face.

The paper was published. The companies acknowledged it. The industry noted it.

And then the systems got deployed anyway.


The Gender Shades Study: The Numbers That Were Ignored

Buolamwini's research, conducted with AI researcher Timnit Gebru, used the dermatologist-approved Fitzpatrick Skin Type scale to categorize subjects by skin tone. The methodology was rigorous. The findings were not ambiguous.

The gap wasn't just about skin tone in isolation, or gender in isolation. It was the intersection — dark-skinned women — that exposed how badly these systems fail. This is the crux of facial recognition bias: it isn't a single demographic disadvantage; the disadvantages compound.

The reason isn't mysterious. These systems are trained on datasets. Those datasets were overwhelmingly composed of lighter-skinned subjects — 79.6% for one major benchmark, 86.2% for another. You train a model on data that skews white and male, you get a model that works well on white and male faces.

What makes Buolamwini's own story significant is that she experienced this personally before she proved it academically. Facial recognition systems literally could not detect her face. She had to wear a white mask for the camera to register her. A researcher — in 2018, at MIT — had to wear a white mask to be seen by a computer. That's not a metaphor. That happened.

The companies patched their systems after the paper dropped. The underlying problem — biased training data producing biased systems — did not go away.


Three Real People. Three Wrongful Arrests. All Black.

The error rates aren't just statistics. They translate directly into police showing up at people's doors.

Robert Williams, a Detroit man, was arrested in January 2020 in front of his wife and two young daughters. Police had a blurry surveillance still from a store robbery. They ran it through facial recognition. The algorithm returned his expired driver's license photo as a possible match. Officers arrested him and detained him for thirty hours.

At one point during interrogation, detectives put the surveillance still in front of Williams and asked if it was him. He held the photo up next to his own face: "That's not me." It plainly wasn't. He was eventually released. No charges.

Nijeer Parks was arrested in New Jersey in 2019 after facial recognition matched him to a shoplifting suspect. DNA and fingerprint evidence collected at the scene pointed to someone else. Parks was arrested anyway. He spent ten days in jail and roughly $5,000 fighting the charges before they were dropped.

Porcha Woodruff was arrested in Detroit in February 2023. She was eight months pregnant. The charges were robbery and carjacking. The facial recognition match was wrong, and the error should have been obvious: the woman in the surveillance footage was not visibly pregnant. Woodruff spent eleven hours in custody before being released.

Every one of these cases: Black. Every one of these cases: facial recognition match used as primary basis for arrest. Every one of these cases: wrong.

These aren't isolated bugs. They're the predictable output of deploying systems with documented accuracy disparities on populations that are already over-policed.


The Government Confirmed It. Nothing Changed.

If the Gender Shades paper wasn't enough, the U.S. government ran its own study.

The National Institute of Standards and Technology (NIST) evaluated 189 facial recognition algorithms from companies around the world. The results, published in 2019, confirmed what Buolamwini had found — and went further.

For one-to-one matching (verifying that a photo matches a specific person), false positive rates for Asian and African American faces were often 10 to 100 times higher than for white faces, depending on the algorithm. For one-to-many matching (searching a database to find a potential match — the exact use case in law enforcement), African American women had the highest false positive rates of any group.
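The arithmetic of one-to-many search is worth making concrete. The sketch below uses invented false positive rates and an invented gallery size (none of these numbers come from the NIST report); the point is only that a fixed per-comparison error rate scales with the size of the database being searched, and a 10x disparity between groups scales right along with it.

```python
# Back-of-the-envelope sketch with invented numbers. Nothing here is
# calibrated to any real system or to the NIST report.

def expected_false_positives(per_comparison_fpr, gallery_size):
    """Expected number of wrong candidates returned when one probe photo
    is compared against every entry in a gallery."""
    return per_comparison_fpr * gallery_size

GALLERY = 1_000_000  # hypothetical mugshot database size

# Hypothetical per-comparison false positive rates, 10x apart:
fpr_low = 1e-6   # group the algorithm handles well
fpr_high = 1e-5  # group with a 10x higher error rate

low = expected_false_positives(fpr_low, GALLERY)
high = expected_false_positives(fpr_high, GALLERY)
print(f"low-error group:  ~{low:.0f} wrong candidate(s) per search")
print(f"high-error group: ~{high:.0f} wrong candidate(s) per search")
```

A per-comparison rate that looks negligible in a one-to-one check produces wrong candidates on nearly every search of a large gallery — and each wrong candidate is a person who can get a knock on the door.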

American Indian faces had some of the highest false positive rates of any demographic in mugshot-based searches.

NIST didn't find one or two bad actors. This was systemic, across algorithms, across vendors. The geographic exception was telling: algorithms developed in Asian countries performed significantly better on East Asian faces — because their training data reflected the demographics of their developers. The bias follows the data.

The NIST study landed. Law enforcement agencies cited it in policy discussions. And then kept deploying these systems.


The Feedback Loop Nobody Wants to Talk About

Here's what makes facial recognition bias particularly durable: it feeds itself.

The systems are built on biased training data. Biased systems generate more false matches in communities of color. Those false matches result in more law enforcement contact with those communities. More law enforcement contact generates more mugshots, more surveillance footage, more facial data from those communities — which gets fed back into training datasets, reinforcing the existing patterns.

Meanwhile, the populations with the lowest false match rates — white men — generate fewer wrongful contacts, less friction, less data from enforcement contexts.
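The loop described above can be sketched as a toy simulation. Every number below is invented; the model only illustrates the mechanism: existing records drive where searches happen, searches times a higher error rate produce more false matches, and false matches feed back as new records.

```python
# Toy feedback-loop model with invented numbers; illustrative only.

def run_loop(rounds, fpr, initial_records, searches_per_round=1000):
    """Each round, searches are allocated in proportion to each group's
    share of existing records; false matches create new records that are
    fed back into the pool."""
    records = dict(initial_records)
    for _ in range(rounds):
        total = sum(records.values())
        for group in records:
            searches = searches_per_round * records[group] / total
            false_matches = searches * fpr[group]
            records[group] += false_matches
    return records

result = run_loop(
    rounds=10,
    fpr={"A": 0.01, "B": 0.10},                # hypothetical 10x error disparity
    initial_records={"A": 500.0, "B": 500.0},  # identical starting representation
)
# Group B accumulates more records than group A despite starting equal,
# and the gap widens every round.
```

Even with identical starting representation, the group with the higher error rate ends up overrepresented in the record pool — which is exactly the data that gets fed back into training.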

The bias isn't just baked into the algorithm. It's baked into the data pipeline. And the deployment pipeline. And the enforcement decisions made downstream.

More than a quarter of local and state police forces currently use facial recognition. Half of federal law enforcement agencies use it. In Detroit in 2020, all 129 facial recognition searches conducted by police were run on images of Black people. Not some. All.

There is still no federal law in the United States that directly regulates the use of facial recognition technology.

The companies that build these systems are not required to disclose accuracy rates by demographic. The law enforcement agencies that use them are largely not required to track how often matches are wrong. The people who get wrongfully arrested because of a bad match have to hire lawyers and fight it themselves.

The accountability structure here is: none.


Why This Connects to What We Built

FaceTwin is a digital art project. You paste a friend's publicly available photo, the AI generates three altered face variants, and you send your friend a link to "FaceTrace" — a fake AI product that supposedly found real strangers who look like them. The reveal shows them exactly what happened: their own face, modified, passed back to them as "matches."

The premise of the project is the premise of real-world surveillance: the technology is trusted with real consequences even though it can't get basic accuracy right.

The thing is, people believe FaceTrace is real. Not everyone, but enough. The experience is convincing enough that real people see the "matches" and feel a cold recognition: oh, this is possible now.

That credulity is the actual problem. We've collectively decided to believe that facial recognition works well enough to act on. To arrest people on. To deny people housing or jobs on. The technology is imperfect by design, unreliable by demographic, and deployed everywhere by default — and we keep giving it authority it hasn't earned.

The project asks one question: have we already given up on privacy?

The facial recognition accuracy data suggests we've given up on something else too: the expectation that surveillance technology has to actually work before it's used to ruin someone's life.

If you want to think about what it means to have a face you can't change in a world that's building permanent records around it — or why there's still no federal facial recognition law despite years of documented harm — those posts are worth reading.


Frequently Asked Questions

What was the Gender Shades study and why does it matter?

Gender Shades was a 2018 MIT study by Joy Buolamwini and Timnit Gebru that tested commercial facial recognition systems from IBM, Microsoft, and Face++. It found error rates as low as 0.8% for light-skinned men and as high as 34.7% for dark-skinned women. It was the first large-scale study to document these disparities using a rigorous skin tone classification system. It matters because these commercial systems were already deployed at scale when the paper was published — and mostly stayed deployed afterward.

Has anything changed since these studies came out?

IBM and Microsoft updated their systems after Gender Shades was published, and some improved their scores on the original benchmark. But the underlying problem — that training data doesn't reflect demographic diversity — persists across the industry. NIST's 2019 study, covering 189 algorithms, found the same disparities across the field. No federal regulation has passed. Law enforcement adoption has expanded, not contracted.

Are there facial recognition systems that don't have these biases?

NIST found that algorithms developed in Asian countries performed significantly better on East Asian faces — sometimes better than on white faces — because the training data reflected those developers' demographics. This suggests the bias is not inevitable but is a direct product of who builds these systems and what data they use. A more representative dataset would produce a more accurate system. The industry has been slow to prioritize this.

What should I know about wrongful facial recognition arrests?

At least seven confirmed wrongful arrests in the United States have been linked to facial recognition misidentification, and nearly all of the people arrested were Black. The pattern: surveillance footage gets run through a facial recognition system, the system returns a possible match, and police treat that probabilistic match as a sufficient basis for arrest, without first requiring corroborating evidence. The ACLU and other civil liberties organizations have been documenting these cases and pushing for regulation.

What can I actually do about this?

Push for legislation. Several cities — San Francisco, Boston, Portland — have banned government use of facial recognition technology. Similar bans have been proposed at the federal level but haven't passed. Contact your representatives. Support organizations like the ACLU and the Algorithmic Justice League (founded by Joy Buolamwini). And be skeptical of any system — employer screening tool, law enforcement database, housing platform — that claims to use "AI matching" without disclosing accuracy rates by demographic. Ask the question. Make them answer it.