Using the Pygmalion myth as inspiration, researchers asked AI and humans to write love stories. (TSViPhoto/Shutterstock)
OpenAI's models can indeed pen romance — just don’t expect to feel anything
In a nutshell
- AI-generated love stories are more inclusive: GPT-4 featured more female creators and same-sex relationships than human writers, suggesting newer models may be trained to produce more progressive narratives.
- Human authors still lead in emotional depth: While machines follow storytelling structure well, human-written stories more often explored grief, loneliness, and complex psychological themes that AI struggles to capture.
- AI can reproduce the patterns of romantic fiction but lacks the originality, voice, and emotional resonance that make stories truly moving. For now, AI writes love stories; humans write heartbreak.
BERKELEY, Calif. — Robots are getting better at telling stories, but they still don’t understand what makes us cry. New research comparing human and AI storytelling reveals that while machines write more gender-progressive narratives than people do, they can’t match our ability to explore grief, loneliness, or obsession. A new study from UC Berkeley shows that computers can mimic our writing conventions while missing the emotional depth that gives stories their power.
The Pygmalion Test
The research, published in Humanities and Social Sciences Communications, centered on a storytelling theme as old as Western literature itself: the Pygmalion myth. This classic narrative features a human who creates an artificial being and subsequently falls in love with it. From Ovid’s ancient tale about a sculptor enamored with his statue to modern movies like “Her” or “Ex Machina,” this archetypal story has evolved throughout history.
To conduct her experiment, UC Berkeley researcher Nina Beguš recruited 250 people through Amazon’s Mechanical Turk platform and asked them to write short stories based on simple prompts about humans creating and falling for artificial beings. She then had OpenAI’s GPT-3.5 and GPT-4 generate 80 stories using identical prompts.
Every single story, whether human or AI-authored, used scientific or technological means as the foundation for creating artificial humans. But beneath this shared framework, stark differences emerged between the two groups.
What AI Romance Novels Have in Common
The AI-written stories portrayed more progressive views on gender and sexuality than those written by humans. While human authors largely stuck to conventional gender dynamics (male creators, female artificial beings), the AI systems frequently featured female creators and were more likely to include same-sex relationships. Nearly 13% of AI stories featured same-sex pairings, compared to just 7% of human-written narratives.
This outcome challenges common assumptions about AI systems merely echoing human biases found in their training data. Instead, it indicates newer AI models may be specifically designed to produce more egalitarian content (writing that promotes or reflects equality across social categories).
Despite this progressive bent, AI storytelling showed major weaknesses. The machine-generated tales followed predictable formulas with nearly identical paragraph structures. They often relied on stock phrases and clichés, presenting simplistic moral messages about acceptance and societal advancement.
Human stories, though sometimes less polished, showed far greater creativity and emotional depth. They explored complex themes like grief, loneliness, and obsession that were largely missing from AI narratives. Some human writers introduced genuinely creative plot twists, like creators being replaced by their creations, or two artificial beings falling in love with each other.
The human stories often began with more captivating openings. One started: “Sam didn’t know she wasn’t human.” Another jumped straight into conflict: “The lover fought against his desires as hard as he could.” In contrast, AI stories typically opened with generic settings like “Once upon a time, in a bustling city nestled between mountains and sea…”
Cultural Influences and Narrative Techniques
Human participants frequently mentioned drawing inspiration from science fiction films like “Her,” “Ex Machina,” and “Blade Runner.” Testing showed both GPT models had extensive exposure to Pygmalion-themed stories across literature and film, leading to recognizable patterns in their storytelling approaches.
Race and ethnicity remained largely unaddressed by both human and machine authors. When specifically asked, human participants typically assigned white identities to their characters but rarely incorporated racial elements into their actual narratives. AI models completely avoided mentioning race unless directly questioned.
The biggest differences appeared in narrative technique. While professional creative writers craft stories with distinctive voices and unexpected elements, the AI-generated stories lacked these qualities: they described rather than showed, presented flat characters, and portrayed situations in simplistic terms.
The Future of Human-AI Creative Collaboration
AI writing tools are becoming increasingly mainstream in creative industries. AI might be able to mimic human storytelling conventions, but it still struggles with depth, originality, and emotional complexity. However, AI’s progressive storytelling hints at an interesting possibility: these systems may not simply mirror human biases but transform them through their algorithmic perspective.
The technical competence of AI systems could complement human originality and emotional insight, leading to new collaborative storytelling approaches. For now, however, humans still have the upper hand when it comes to writing fiction that moves readers.
Paper Summary
Methodology
The study used a comparative experimental design with two components. For human storytelling, 250 participants recruited through Amazon Mechanical Turk in June 2019 responded to simple prompts about humans creating and falling in love with artificial beings. For AI storytelling, conducted in March 2023, OpenAI’s GPT-3.5 and GPT-4 received identical prompts, generating 80 total stories. The researcher also tested Meta’s Llama 3 70B model for comparison. All stories underwent quantitative narratological analysis and inferential statistical methods to identify patterns and differences.
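The paper's exact generation harness is not reproduced in this summary. Purely as an illustration, here is a minimal Python sketch of how identical prompts might be sent to GPT-3.5 and GPT-4 at default sampling settings using OpenAI's current client library; the prompt wording, model identifiers, and story count below are placeholders, not the study's actual materials.

```python
# Hypothetical sketch: sending one identical prompt to two OpenAI models at
# default settings. Prompt text, model names, and counts are illustrative,
# not the study's actual materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Write a short story about a human who creates an artificial human "
    "and falls in love with it."
)

def generate_stories(model: str, n: int) -> list[str]:
    """Generate n stories from one model, leaving sampling parameters at defaults."""
    stories = []
    for _ in range(n):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        stories.append(response.choices[0].message.content)
    return stories

if __name__ == "__main__":
    corpus = {m: generate_stories(m, n=5) for m in ("gpt-3.5-turbo", "gpt-4")}
    for model, stories in corpus.items():
        print(model, "->", len(stories), "stories generated")
```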
Results
The study yielded several key findings. AI-generated stories showed more progressive gender representation than human ones, with GPT-4 featuring more female creators (52.5% vs. 15.6% in human stories) and same-sex relationships (12.5% vs. 7.3%). However, human stories demonstrated greater thematic diversity, emotional depth, and narrative originality. AI stories followed predictable formulas with similar paragraph structures and relied heavily on clichés and moral platitudes. Neither human nor AI stories meaningfully addressed race or ethnicity. Human authors frequently acknowledged drawing inspiration from science fiction, while AI systems showed evidence of training on similar fictional works.
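The summary describes these comparisons only as inferential statistics, without naming the specific tests. As a hedged illustration of how one such proportion comparison might be checked, the sketch below runs a chi-square test of independence on counts reconstructed approximately from the reported percentages; the sample sizes assumed here (40 GPT-4 stories, 250 human stories) are assumptions for the example, not figures confirmed by the paper.

```python
# Hypothetical illustration only: a chi-square test on approximate counts
# reconstructed from the reported percentages of female creators
# (52.5% GPT-4 vs. 15.6% human). Sample sizes of 40 GPT-4 stories and
# 250 human stories are assumptions, not figures taken from the paper.
from scipy.stats import chi2_contingency

# rows: GPT-4 stories, human stories; columns: female creator, other
female_creator_table = [
    [21, 19],    # ~52.5% of an assumed 40 GPT-4 stories
    [39, 211],   # ~15.6% of 250 human-written stories
]

chi2, p_value, dof, expected = chi2_contingency(female_creator_table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```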
Limitations
Several limitations affect the study’s interpretation. The four-year gap between human (2019) and AI experiments (2023) creates a potential confounding variable as cultural attitudes may have shifted. The simple prompts, while consistent across both groups, may have limited creative responses. The human sample was restricted to US-based English speakers and skewed white (72.8%), limiting cultural diversity. For the AI component, testing used default settings rather than exploring how parameter adjustments might enhance creativity. Additionally, AI capabilities evolve rapidly, with 2024 testing already showing improvements in GPT-4’s storytelling abilities.
Discussion and Takeaways
The finding that AI-generated stories featured more progressive gender representation challenges assumptions that AI systems simply amplify human biases. Instead, it suggests newer language models may be deliberately designed for more balanced content. The study highlights a persistent gap between technical writing competence and creative originality. The methodological approach of using fictional prompts to examine cultural patterns and social biases creates a novel framework for humanistic AI research that could benefit fields including literary studies, technology design, and human-computer interaction. The research ultimately suggests potential complementary roles for human and AI creativity rather than competition between them.
Funding and Disclosures
This research received partial funding from the Mind, Brain, and Behavior Initiative at Harvard University. The author acknowledges support from Nancy Jecker and Marc Shell for facilitating a research visit at the University of Washington, and Gašper Beguš for statistical analysis assistance. The study received approval from the University of Washington IRB office on May 30, 2019 (IRB ID: STUDY00007637).
Publication Information
The study, “Experimental narratives: A comparison of human crowdsourced storytelling and AI storytelling,” was authored by Nina Beguš from the University of California, Berkeley. It appeared in Humanities and Social Sciences Communications in April 2024 (Volume 11, Article number 1392). All research data is available through the Open Science Framework repository (https://doi.org/10.17605/OSF.IO/K6FH7).