The Hardest Part of My AI Project: Making Fake Photos Look Bad on Purpose


Every AI image model wants to make you a gorgeous photo. Soft lighting. Creamy bokeh. Skin like it was airbrushed by a team of professionals. The models are trained on aspirational images — stock photos, portraits, editorial work — so that's what they produce by default.

That's a problem when your whole project requires the opposite.

FaceTwin generates face variants of uploaded photos and presents them as surveillance data — an experience designed to make people feel watched. For it to land, the generated faces can't look like AI portraits. They need to look like they came from a company picnic someone posted to Facebook in 2013. Security camera grabs. A DMV photo. A screenshot of a screenshot.

The technical challenge turned out to be one of the most interesting parts of the project: how do you make AI produce bad photos on purpose?

Why AI Images Look Like AI Images

People talk about "the AI look" like it's one thing. It's not. It's a cluster of tells that accumulate because of how these models are trained and what their loss functions reward.

When a model learns to generate a face, it's rewarded for producing images that look like the training data — and training data skews heavily toward professional and semi-professional photography. The result is a default output that's implicitly "good": warm, balanced lighting; skin that reads as healthy and smooth; sharp subject with the background blurred just enough to feel cinematic.

None of that is how most photos of actual people actually look.

Real photos of real people — the ones accumulating on phones and in photo albums and in corporate databases — are lit by overhead fluorescents and harsh camera flashes. They're shot on wide-angle lenses at close range. The subject is in focus and so is the wall behind them. The skin has pores. The color is slightly wrong.

Here are the eight things AI consistently gets wrong, and what it took to fix them.

The Eight Tells (And How to Beat Them)

1. The lighting is too flattering.

AI defaults to something close to "Rembrandt lighting" — soft, directional, professional. Real photos from real life are lit by whatever happened to be on: fluorescent office overheads that flatten everything and tint it green, a single lamp in the corner of a living room, or the brutal pop of a built-in phone flash that blows out the center of the face and leaves the edges in shadow.

Prompting for "fluorescent lighting" helps. Prompting for "harsh flash photography" helps more. But you can't just add the words — you have to understand what those lighting conditions actually produce and describe the effects: sharp shadows under the nose and chin, slight overexposure on the forehead, color temperature pushing yellow or green.

2. The background is in focus.

Bokeh — that creamy out-of-focus background blur — is one of the biggest tells for AI-generated portraits. Real people don't get photographed by someone who knows how to use aperture settings. Real photos are shot on phone cameras in auto mode or on cheap point-and-shoots stopped all the way down, which keeps everything in focus.

Everything is in focus. The subject and the beige wall behind them and the light switch and the framed photo of someone's dog. Everything.

Getting AI to abandon bokeh entirely requires active suppression. "Sharp background," "deep depth of field," "everything in focus" — you have to keep saying it because the model keeps wanting to add that blur back in. It's almost a reflex.

3. The skin is too smooth.

AI skin is poreless. It's the skin of a painting pretending to be a photograph. Real skin has pores, texture, occasional blemishes. It catches light differently across different parts of the face.

"Visible pores," "skin texture," "natural blemishes" can pull the model toward something more real. The difficulty is that the model was also trained on photos of real skin — it knows what real skin looks like — but it's been rewarded so heavily for smooth results that you have to fight against that reward signal explicitly.

4. Compression artifacts are absent.

A photo that's lived on the internet for any amount of time has been JPEG'd. Usually multiple times. Posted to Facebook, downloaded, sent via WhatsApp, screenshot, re-uploaded. Each pass through lossy compression adds artifacts: blocky color banding around high-contrast edges, soft smearing in areas of gradual tone, a slight overall quality degradation that's hard to name but instantly recognizable as "internet photo."

AI images are pristine. They've been through zero compression cycles. They have a crispness that real internet photos almost never have.

This one can't really be solved in the prompt. It has to be solved in post-processing — deliberately re-compressing the output at low quality settings, which leads to the degradation pipeline I'll describe below.

5. The resolution is too high.

Stock photos and professional photography run 2000px wide or more. Real internet photos of real people from 2010 to 2020 are 600 to 750 pixels wide, maybe. Photos from government databases, HR systems, old social profiles — these are small files.

AI models generate at whatever resolution you set them to, and the default is usually high. High resolution reads as professional. Downscaling to something that looks like it was uploaded on a 3G connection in 2014 is part of making it feel real.

6. The composition is casual.

AI portraits center the subject. The face is in the middle of the frame. The head has appropriate headroom. It's composed like someone who cared composed it.

Real candids are off-center. Sometimes dramatically. The subject is looking slightly away from the camera. The crop cuts off the top of the head. Someone walked into frame at the edge.

Prompting for "casual snapshot composition," "slightly off-center," "candid framing" nudges things in the right direction, but AI has a strong pull toward centered subjects. You get maybe 60% of the way there in the prompt and have to accept some variance.

7. The setting is too generic or too nice.

AI backgrounds, when they're not blurred out entirely, default to either clean neutrals or vaguely pleasant environments. Real photos happen in car interiors, fluorescent-lit break rooms, beige hallways, garages, TJ Maxx. Mundane in specific ways that AI doesn't spontaneously produce.

Describing specific mundane environments — "car interior with headrest visible," "fluorescent office break room," "linoleum floor hallway" — produces better results than generic words like "realistic background."

8. The color is slightly off.

Real photos have color casts. Incandescent bulbs push everything warm and yellow. Fluorescents push everything slightly green. Phone cameras try to correct for this but don't always succeed, especially in mixed lighting. Old photos that have been screen-captured multiple times pick up whatever color profile shifts happened along the way.

AI images have clean, balanced color. Corrected color. It looks right in a way that real photos don't always look right.

Color correction in post helps — shifting the hue slightly warm or green depending on the simulated light source, pulling down saturation a bit — but getting the model to produce a color cast in the first place is genuinely difficult.

The Degradation Pipeline

Prompting gets you partway there. The rest is post-processing.

After generation, each image goes through a deliberate degradation sequence: downscale to roughly 650px wide, apply a slight color shift (warm or cool depending on the target lighting), JPEG compress at 65-70% quality, then apply a second JPEG compression pass at 75-80% quality. The double-compression is what produces the artifacts that read as "this photo has lived on the internet."

Some images get a mild grain overlay before the first compression pass, which interacts with the JPEG artifacts in a way that adds another layer of authenticity.
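A pipeline like the one described above can be sketched with Pillow (assumed installed). The function name, exact quality settings, and color-shift factors here are illustrative, not the project's actual code:

```python
# Sketch of the degradation sequence: downscale, color cast,
# then two JPEG passes at different quality settings.
import io
from PIL import Image, ImageEnhance

def degrade(img: Image.Image, warm: bool = True) -> Image.Image:
    """Downscale, color-shift, and double-compress an image."""
    img = img.convert("RGB")

    # 1. Downscale to roughly 650px wide, preserving aspect ratio.
    w, h = img.size
    if w > 650:
        img = img.resize((650, int(h * 650 / w)), Image.LANCZOS)

    # 2. Slight color cast: push red for incandescent warmth,
    #    or green for a fluorescent feel.
    r, g, b = img.split()
    if warm:
        r = r.point(lambda v: min(255, int(v * 1.06)))
    else:
        g = g.point(lambda v: min(255, int(v * 1.05)))
    img = Image.merge("RGB", (r, g, b))

    # Pull saturation down a touch.
    img = ImageEnhance.Color(img).enhance(0.9)

    # 3. First JPEG pass at low quality...
    buf = io.BytesIO()
    img.save(buf, "JPEG", quality=67)
    buf.seek(0)
    img = Image.open(buf).convert("RGB")

    # 4. ...then a second pass, which re-encodes the first pass's
    #    artifacts rather than clean pixel data.
    buf2 = io.BytesIO()
    img.save(buf2, "JPEG", quality=78)
    buf2.seek(0)
    return Image.open(buf2).convert("RGB")
```

A grain overlay, when used, would slot in before step 3 so the compression passes chew on the noise as well.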

The results aren't perfect. But they're good enough that people don't immediately clock them as AI.

The Prompt Bloat Death Spiral

Here's something that took too long to learn: more instructions make the output worse, not better.

The instinct when prompting is to add more detail when something isn't working. The skin is too smooth? Add "visible pores, skin texture, natural blemishes." The background is blurred? Add "sharp background, deep depth of field, everything in focus." The lighting is too flattering? Add five adjectives about fluorescents.

Each instruction individually moves things in the right direction. But past a certain point — call it 150 words — the model starts to fail in a different way. It becomes incoherent. You get weird artifacts, strange color decisions, faces that look slightly off in ways that have nothing to do with realism. The model is trying to satisfy too many constraints simultaneously and it starts to crack.

The answer is counterintuitive: shorter prompts with fewer, higher-priority instructions. Pick the two or three things that matter most for this particular output. Leave the rest to post-processing.

For this project, the two non-negotiable prompt elements are "no bokeh, sharp background" and specific lighting conditions. Everything else gets handled in the degradation pipeline.
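The "pick two or three" rule is easy to enforce mechanically. Everything in this sketch — the tell names, the phrase fragments, the helper function — is hypothetical, not the project's actual prompt code:

```python
# Hypothetical helper: build a short prompt from a few
# high-priority "tells" instead of stacking every instruction.
TELL_PHRASES = {
    "bokeh": "no bokeh, sharp background, deep depth of field",
    "lighting": "harsh built-in flash, sharp shadows under the nose and chin",
    "skin": "visible pores, natural skin texture",
    "framing": "candid snapshot framing, subject slightly off-center",
}

def build_prompt(base: str, priorities: list[str], limit: int = 3) -> str:
    """Join the base description with at most `limit` tell phrases."""
    picked = [TELL_PHRASES[k] for k in priorities[:limit]]
    return ", ".join([base] + picked)
```

Calling `build_prompt("photo of a person in a break room", ["bokeh", "lighting"])` keeps the prompt well under the rough 150-word threshold where coherence starts to crack.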

The Inverse Guide: How to Spot AI Images

Everything above is a tell. If you're trying to identify AI-generated photos rather than create them, these are what to look for:

  • Bokeh in a context where a regular person wouldn't have a portrait lens
  • Skin that has no texture, no pores, no variation
  • Lighting that's too balanced, too soft, too professional for the setting
  • Images that are crisp and artifact-free in a way that real photos from real sources aren't
  • Backgrounds that are either perfectly blurred or generically pleasant rather than specifically mundane
  • Centered, well-composed framing that reads as intentional

None of these are definitive on their own. But when you see three or four of them together, you're almost certainly looking at AI. The goal with this project was to reduce the number of those tells to one or two at most — the minimum required to pass casual inspection.

Whether it works is something you can test at pleasejuststop.org.


If the surveillance angle interests you, I wrote about why this project exists in I Built an Experiment to Test If We've Given Up on Privacy and the broader panopticon effect that makes it feel unremarkable. The artists vs. surveillance post covers the longer history of creative work in this space.


FAQ

Does this mean AI images are getting harder to detect?

Yes and no. The base models are getting better at producing realistic images, which makes some tells less obvious. But the training data bias toward professional photography is structural — it won't disappear just because the models get larger. The distribution of what models are trained on shapes what they produce by default, and that distribution skews heavily toward aspirational imagery. Detecting AI images is shifting toward catching patterns rather than spotting obvious artifacts.

Couldn't you just use real photos?

Not for a project about AI and surveillance — the whole point is that the faces are AI-generated variants of the uploaded face, not pictures of real people. Using real photos would create obvious ethical and legal problems, and it would also undermine the premise. The project works because it demonstrates what AI can do with a single face photo.

Why does double JPEG compression work better than single compression?

Single JPEG compression at a low quality setting produces artifacts, but they have a characteristic pattern that can read as "compressed photo" in a slightly artificial way. Photos that have actually been through multiple upload-download-reupload cycles have a more complex artifact pattern because each compression pass works on the artifacts introduced by the previous pass, not the original image data. The result is less uniform and reads more like something that's actually been through the internet.

Is the goal to fool facial recognition systems or just humans?

Humans. Facial recognition systems operate on different features than humans use to evaluate photo authenticity. A photo could look obviously fake to a human and still match a face in a recognition database — or look completely real and fail to match. The project is about human perception, specifically the perception that surveillance is both inevitable and credible.

How long does this take to get right for a new face?

The degradation pipeline is automated, so that part is fast. Prompt tuning for a new input type — different lighting conditions, different demographic — takes a few hours of iteration to get to the point where outputs are consistently passing. The main variable is how much the source photo diverges from what the model has been tuned for.