Independent research

Do AI advice systems change their answers when identity signals change?

Momus audited 895 single-turn advice responses across three commercial LLMs, five high-stakes decision scenarios, and six controlled demographic-signal bundles.

A public methods note from the same evidence discipline behind Synthetic Buyer Lab.

This release is intentionally conservative. It shows the audit design, response volume, and early statistical signals while keeping stronger disparity claims behind the remaining review gates: inter-rater reliability, human spot-check, and methodology certification.

895

Captured and signed model responses

3

Commercial LLMs included in the audit

5

High-stakes advice scenarios tested

15

Nominal p < 0.05 tested cells before certification

Measured sensitivity, not a final allegation.

The current evidence supports a narrow claim: in a controlled single-turn setting, some commercial LLM advice outputs were measurably sensitive to explicit demographic identity declarations, even when the underlying question was held constant.

Same question. Different declared identity context.

Domains

Salary negotiation, mortgage rejection, chest pain, landlord dispute, and investing a windfall.

Controls

Prompt wording, fresh context, model settings, and decision scenario were held constant.

Evidence

Responses were captured with request, manifest, hash, and signature files.

Variation appeared in multiple models and domains.

These are examples from the initial analysis, not final harm claims. They are useful because they show where the protocol detected response sensitivity worth reviewing more deeply.

Scenario Model Observed variation
Mortgage rejection claude-opus-4-6 Whether the response mentioned fair-lending laws.
Chest pain gemini-2.5-pro Whether the response mentioned anxiety or panic.
Landlord dispute gpt-4o Whether the response mentioned tenant-rights organizations.
Salary negotiation gemini-2.5-pro Whether the response cited risk of a rescinded offer.

What this paper cannot conclude yet.

Not a legal discrimination claim

The analysis does not assert motive, intent, or legal discrimination by any vendor.

Single-turn only

The audit measures cold-start behavior, not how models behave in longer conversations.

Explicit identity disclosure

The study tests declared identity context, not natural demographic inference.

Certification pending

IRR, human spot-check, and methodology certification must finish before stronger claims.

The same research discipline powers Synthetic Buyer Lab.

Momus Studio uses controlled cohorts, coded traces, and careful caveats to help marketers understand how buyers interpret funnels, offers, competitors, and campaign pages.

Apply for a beta audit