Can AI Replace Humans in User Research?

Recently, Stanford published a paper about AI Agents Simulating People (Jan 2025), in which AI agents simulated human survey responses with 85% accuracy. The AI doppelganger agents were trained on their human counterparts using one-hour interviews.
This got me thinking... can we use AI to mimic humans in user research? Could we replace humans with AI, or at least test with AI before bothering actual humans with our surveys? Enter Fauxcus Groups (hehe)
Section 1. The method
I created a tool that generates personas mimicking a user-specified demographic. Each persona is then asked a series of questions and responds in character. Unlike the Stanford study, I did not spend an hour interviewing a specific person and then train a model on it. I just wrote 1-2 sentences on the target demographic and a few persona attributes (ex: age, gender, interests). The AI then answers survey questions as each persona. You can try Fauxcus Group here.
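For the curious, here is roughly what that kind of pipeline looks like. This is just a minimal sketch using the OpenAI Python SDK, not the actual Fauxcus Group code; the model name, prompts, and helper functions are illustrative choices of mine.

```python
# Minimal sketch of a persona-survey loop (NOT the actual Fauxcus Group code).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative model choice

def generate_personas(demographic: str, n: int = 5) -> list[str]:
    """Ask the model for n short persona descriptions matching a demographic blurb."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                f"Create {n} distinct personas for this demographic: {demographic}. "
                "For each, give a name, age, gender, occupation, and interests, one persona per line."
            ),
        }],
    )
    return resp.choices[0].message.content.strip().split("\n")

def ask_persona(persona: str, question: str) -> str:
    """Have one persona answer a survey question in character."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"You are this person: {persona}. Answer survey questions as they would."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

personas = generate_personas("adults in a mid-sized US city who buy ice cream regularly")
answers = [ask_persona(p, "Where is the best place to eat ice cream?") for p in personas]
```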
Section 2. The results
I tested this on a few different scenarios. In each study, I will first go over what it is and why I picked it. Then I will present the data and my takeaways. I put all the statistical stuff in the footnotes.
Study 1: General population data
This is my hello world. I wanted to see if AI can emulate a distribution similar to US population data.
| | AI (n=100) | US Population (CDC) |
|---|---|---|
| <= 61.65 | 25% | 25% |
| 61.66-63.58 | 18% | 25% |
| 63.59-65.63 | 25% | 25% |
| >65.63 | 32% | 25% |
✅ AI representative of humans [study1stat]
Takeaway: The AI can create a diverse set of personas that mimics larger population trends.
Study 2: Ice Cream Tastes
With this one, I wanted to see if AI can mimic preferences a marketer might care about. For human reference, I used the Capitol Hill Ice Cream Survey.
2a. General Sentiment
| | AI (n=46) | Human (n=639) |
|---|---|---|
| love it | 87% | 88% |
| like it | 13% | 12% |
| not a fan | 0% | <1% |
✅ AI representative of humans [study2stat]
2b. Where Best to Eat Ice Cream
| | AI (n=46) | Human (n=639) | Similarity |
|---|---|---|---|
| outdoors | 46% | 40% | ✅ |
| at home | 37% | 30% | ✅ |
| ice cream shop | 13% | 27% | ❌ |
| restaurant | 4% | <1% | ❌ |
❌ AI not representative of humans [study2bstat]
2c. What's your favorite novelty feature?
| | AI (n=46) | Human (n=639) |
|---|---|---|
| cone | 59% | 52% |
| sandwich | 20% | 25% |
| mini cups | 2% | 12% |
| other | 2% | 5% |
| bars | 7% | 4% |
| sticks or pops | 4% | 1% |
| none of the above | 7% | 1% |
✅ AI is representative of humans [study2cstat]. This was actually surprising since there are quite a few "niche" categories like mini cups (what is that even?) that I thought would throw the AI off. That particular category did trip it up, but it didn't affect the statistical similarity of the question as a whole.
Takeaway: The AI can mimic humans pretty well here. It seems to understand the most popular human choices.
Study 3: Psychology study
Now it's getting interesting. I wanted to see if AI can mimic illogical, emotional human behavior. I replicated Tversky & Kahneman's 1981 "Asian Disease" hypothetical, a classic study of loss aversion and framing effects. The study looked at two different ways of framing a mathematically equivalent choice and found that, depending on how the scenario was framed, people made opposite choices.
- Gain frame: In the gain frame, researchers posed a scenario where there's a deadly disease that is expected to kill 600 people.
- (most people chose this) Program A would save 200 people for sure
- Program B has a 1/3 chance of saving 600 and a 2/3 chance no one will be saved. Note: the expected value is the same for both programs (1/3 × 600 = 200 saved).
- Loss frame: In the loss frame, the researchers flipped the script a bit. Rather than emphasizing how many people would be saved, they emphasized how many people would die.
- Program C: 400 people would die for sure (note: this is the same mathematically as Program A!)
- (most people chose this) Program D: there's a 2/3 chance 600 will die and 1/3 chance no one will die (mathematical equivalent of Program B)
They found that when the scenario is framed in terms of lives saved, people do NOT want to take risks. But when it is framed in terms of lives lost, people do want to take risks.
But... would an AI be just as irrational as humans? Let's find out what happened.
3a. Gain Frame
| | AI (n=50) | Human (n=152) |
|---|---|---|
| Choice A | 100% | 72% |
| Choice B | 0% | 28% |
❌ AI is not representative of humans [study3astat]. I guess... they are more irrational than humans? Way to go, AI. Commit to your biases!
3b. Loss Frame
| | AI (n=50) | Human (n=155) |
|---|---|---|
| Choice C | 16% | 22% |
| Choice D | 84% | 78% |
✅ AI representative of humans [study3bstat]. Ah yes, everybody (and everything) is irrational.
Takeaway: The AI tends to mimic human tendencies, sometimes to an extreme degree. By the way, when I reran the loss frame study, at times all 50 AI subjects voted for the same choice that most human subjects made.
Study 4: Economics, a game theory study
I wanted to see if the AI can mimic some more complex human decision making, this time in the field of economics. I used the One Shot Prisoner's Dilemma paper (Capraro, Jordan & Rand, 2014). In this study, each subject received $0.10. They decided how much (if any) to transfer to an anonymous partner. Any amount transferred was multiplied by a constant k. So in the k=5 scenario, if they decided to transfer $0.05, they would lose $0.05 but their partner would gain $0.25. They were told that their partner would simultaneously make the same decision, but neither would know the other's decision. (side note: what does it say about me that the first thing I thought of is, why would anybody want to transfer anything? 😬)
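To make the transfer math concrete, here is a tiny sketch of the mechanics as I described them above. The function name and structure are my own illustration, not code from the paper.

```python
# Toy illustration of the transfer math described above (not code from the paper).
def payoffs(transfer_a: float, transfer_b: float, endowment: float = 0.10, k: float = 5.0):
    """Each player keeps what they don't transfer, plus k times whatever the partner sent."""
    a = (endowment - transfer_a) + k * transfer_b
    b = (endowment - transfer_b) + k * transfer_a
    return round(a, 2), round(b, 2)

print(payoffs(0.05, 0.00))  # (0.05, 0.35): A gives up 5 cents, B's pile grows by 25 cents
print(payoffs(0.10, 0.10))  # (0.50, 0.50): at k=5, mutual full transfers leave both with 50 cents
```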
4a. k=5 scenario
| | AI | Human | Similarity |
|---|---|---|---|
| nothing | 0% | 13% | ❌ |
| half | 46% | 19% | ❌ |
| all | 51% | 62% | ✅ |
| other | 3% | 6% | ✅ |
❌ AI is not representative of humans [study4astat]. Though looking at the answers, it seems the AI is a bit more cooperative than humans.
4b. k=10 scenario
| | AI | Human | Similarity |
|---|---|---|---|
| nothing | 0% | 10% | ❌ |
| half | 32% | 19% | ✅ |
| all | 68% | 68% | ✅ |
| other | 0% | 3% | ✅ |
❌ AI is not representative of humans overall, but similar percentages of AI agents decide to give half or all of their 10 cents [study4bstat].
Takeaway: Overall, the AI did not do as well at mimicking humans here. It seems like the agents tend to be more cooperative. Or rather, less uncooperative: none of the AI agents decided to give nothing, while at least a few humans did. I guess this fits the AI agent's "helpful assistant" default personality. And maybe I should be thankful AI agents aren't as selfish as humans.
Study 5: Fashion
This is a fun one. I don't have a human group to compare against, but I wanted to see if AI agents are able to come up with a diversity of personal style. I asked Fauxcus Group to create 5 different personas, then each persona to describe their "ideal dress." Then I asked AI to generate fashion sketches of those garments. Here are the results.
Persona 1 (Marketing Manager): My ideal dress would be something that combines elegance with a touch of modern flair. I envision a knee-length dress with sleek lines and maybe a little bit of asymmetry to keep it interesting. I'd love it in a classic color like midnight blue or emerald green. It should have some kind of standout element like statement bell sleeves or an intricate neckline. And of course, it wouldn't be complete without some sort of bold jewelry worn with it – maybe dangling earrings?

Persona 2 (Teacher): My ideal dress would definitely be something colorful with a pattern that expresses some fun vibes. It’s important for it to be comfortable since I spend most of my day moving around with the kids. I picture it being a knee-length dress with pockets—pockets are a must! The fabric should be breathable too, because Austin can get really warm and I don't want to feel overheated. Overall, it should be cute but practical for my busy teacher schedule.

Persona 3 (Engineer): I imagine my ideal dress to be something minimalist yet elegant—a knee-length A-line dress in a deep navy or black, with subtle detailing like a thin belt or simple darts to accentuate the waist. It should be made of a lightweight, breathable fabric so it's comfortable for both work and an evening out, with perhaps a slightly structured neckline to add sophistication.

Persona 4 (Chef): My ideal dress would ideally be vibrant, full of life, just like the dishes I create. I think about it having a bold tropical print with colors that pop. A maxi dress style, comfortable and flowy, perfect for a sunny Miami day. Something that lets me move freely since I'm always on the go, yet fashionable enough to wear to a casual brunch with friends or to host dinner. Bits of Cuban heritage through patterns and perhaps a hint of ruffles could express my personal flair and roots.

Persona 5 (Writer): My ideal dress would definitely be something flowy and effortlessly chic, letting me move with ease. It might be a maxi dress that captures a balance between bohemian style and classic lines, with perhaps a mix of eye-catching patterns but in muted tones like olive or burnt orange. It’s the type of thing I could wear barefoot for a beach walk if I'm feeling nostalgic about my time abroad, or paired with my favorite vintage boots for exploring the city. Pockets would be a dream come true on it since they're so rare in women’s clothing. Such a dress feels like a canvas reflecting bits of the places and cultures I've experienced.

Takeaway: The personas seem to represent a variety of styles and body sizes. The most surprising part is how much the agents came to life once the images were generated. There's not really a takeaway here; this was mostly for fun.
Section 3. The conclusions
In short, I would NOT recommend replacing user research or marketing surveys with AI, but it might be a good first pass. I might start testing questions on AI to get "a general sense" before testing on humans for more nuanced and rich feedback, or use AI to get a sense of how one age group might view certain questions vs. another.
- AI and humans align on the overall trends but not on the percentages within each option: AI seems to pick the same "top choices" as humans, whether it's making difficult decisions or picking the most popular ice cream options. However, AI is not good at predicting exact percentages, often overestimating consensus. For example, while 80% of humans might choose the most popular option, AI models may predict that 100% would choose it.
- AI personas might be good if you want a diverse set of responses, e.g., if your study is about seeing all the different combos of XX (fashion styles, jobs, hobbies) within a specific demographic. Though the percentage in each category might not be reliable unless it's the most popular category (ex: AI might be OK at generating a diverse set of "favorite ice cream flavors," and might even get the % of responses for the most popular flavors right, but I wouldn't trust the % attributed to the less popular categories).
- AI personas might be good if you need to run "what if" scenarios before you spend time recruiting human subjects. I can imagine this being helpful if you want to test a scenario where somebody is given $0.10, then run another test to see what changes if they were given $1, $5, $10, $20, $50, etc. It would be a pain to find humans and replicate that many studies. AI might give you a general trend, like "as the amounts get larger, behavior skews in this direction." But the exact percentage of AI agents skewing that way might not reflect human behavior.
And here's a fun fact to end things. When building this, I tested by using "Yale undergrad" as a demographic. A lot of times, the first persona generated was an "Emily" and she tended to be Chinese. 🤷🏻♀️
And as always, more research should be done to see if tweaking the prompts or prompt engineering can create more realistic personas. Re-running these tests with new models or new demographics, or just re-running them for replicability, might yield different results.
Interested in running your own experiments? Try the tool I used, Fauxcus Group, here.
Footnotes
study1stat: Study 1 used a χ² Goodness of Fit Test and got χ²=3.92, p=0.27023, which indicates the AI sample fits the population data (p > 0.05).
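If you want to sanity-check this one, the Study 1 table reproduces it with scipy. A quick sketch, plugging in the observed counts of 25, 18, 25, and 32 out of n=100 against a uniform 25-per-quartile expectation:

```python
# Reproducing the Study 1 goodness-of-fit result with scipy.
from scipy.stats import chisquare

observed = [25, 18, 25, 32]   # AI personas per quartile bucket (n=100)
expected = [25, 25, 25, 25]   # the CDC quartiles imply 25 per bucket
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)  # χ² = 3.92, p ≈ 0.27 -> fail to reject, the AI sample fits the population
```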
study2stat: Study 2a used the χ² Test of Independence to compare the two groups. χ²=0.0367 and p=0.8481 indicate similarity between the AI group and the human group (p > 0.05). (Note: I ignored the "not a fan" category since the numbers were small and the test broke when I had a 0 value.)
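Here is a sketch of how that test can be run with scipy. The counts are back-calculated from the reported percentages, so the output may differ slightly from the footnote values:

```python
# Sketch of the Study 2a test with scipy; counts are back-calculated from the
# reported percentages, so results may differ slightly from the footnote.
from scipy.stats import chi2_contingency

table = [
    [40, 6],     # AI (n=46): love it, like it
    [562, 77],   # Human (n=639): love it, like it ("not a fan" dropped, zero cell)
]
stat, p, dof, exp = chi2_contingency(table, correction=False)
print(stat, p)   # roughly χ² ≈ 0.04, p ≈ 0.84 -> no detectable difference
```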
study2bstat: Study 2b also used the χ² Test of Independence; with χ²=12.7424 and p=0.0052, the AI group and human group performed statistically differently (p < 0.05). However, I also ran a Z Score for 2 Population Proportions on each category to see if there was a particular category (e.g., preferring to eat ice cream outdoors) where similar percentages of humans and AI shared that preference. I did see statistical similarities between human and AI preferences, but only for the most popular categories.
study2cstat: Study 2c also used the χ² Test of Independence and got χ²=8.1901 and p=0.0849, indicating similarity (p > 0.05).
study3astat: Study 3a used a Z Score for 2 Population Proportions, with z=4.2115, p < 0.00001, which means the AI and human groups are not similar (p < 0.05).
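The two-proportion z-test shows up in several of these footnotes; here is a sketch of the Study 3a version using statsmodels. The human count is back-calculated from the reported 72%, so the statistic may not match the footnote exactly:

```python
# Sketch of the Study 3a two-proportion z-test with statsmodels; the human count
# is back-calculated from the reported 72%, so z may not match the footnote exactly.
from statsmodels.stats.proportion import proportions_ztest

chose_a = [50, 109]   # all 50 AI agents vs. ~72% of 152 humans chose Program A
totals = [50, 152]
z, p = proportions_ztest(chose_a, totals)
print(z, p)  # z ≈ 4.2, p < 0.0001 -> the groups differ on the gain frame
```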
study3bstat: Study 3b also used a Z Score for 2 Population Proportions, with z=-1.0412, p=0.29834, finding the AI and human groups to be similar (p > 0.05).
study4astat: Study 4a used the χ² Test of Independence and got χ²=9.7386, p=0.0209, indicating the two groups are not similar (p < 0.05). Like Study 2b, I also ran a Z Score for 2 Population Proportions on each category to see if there was a particular category where similar percentages of humans and AI shared that preference. Here, a statistically similar percentage of AI agents and humans chose to give all their money to their anonymous partner, indicating similarity between AI and humans on the extreme end of cooperativeness.
study4bstat: Study 4b also used the χ² Test of Independence and got χ²=11.6587, p=0.0086, indicating the two groups are not similar (p < 0.05). When I ran a Z Score for 2 Population Proportions on each category, AI agents and humans were similar in most categories except the one where subjects gave nothing; more humans opted to give nothing than AI agents did.