In a nutshell
- ChatGPT gave surprisingly high ratings (8.5-9.5 out of 10) to all brownie recipes, even those containing worm meal and fish oil, showing it lacks the ability to identify disgusting food combinations.
- The AI consistently used positive words like “trust,” “anticipation,” and “joy” when describing brownies that would likely repulse human tasters, revealing a strong positivity bias in food evaluations.
- While AI can’t replace human taste testers, it could help food companies quickly screen many recipe variations before conducting more expensive human taste tests.
CHAMPAIGN, Ill. — Professional taste testers can breathe a collective sigh of relief—their jobs appear safe from the AI revolution, at least for now. In what might be the most deliciously revealing AI experiment to date, a food scientist at the University of Illinois enlisted ChatGPT to evaluate chocolate brownies, with results that should reassure human sensory panels everywhere. When faced with recipes containing gag-inducing ingredients, the AI enthusiastically gave them nearly perfect scores.
“Despite the application of ChatGPT in various fields, to date, no research has explored the use of this technology as a potential evaluator of food products for sensory screening purposes,” writes Dr. Damir Torrico in his intriguing study published in the journal Foods. His findings suggest that while AI might assist in food development, it won’t be replacing human taste buds anytime soon.
ChatGPT Takes on the Taste-Testing Challenge
Dr. Torrico, an assistant professor in the Department of Food Science and Human Nutrition at the University of Illinois Urbana-Champaign, set out to test whether the chatbot could work as a digital food critic for chocolate brownies. What he discovered was eye-opening: while ChatGPT might help screen food products faster, it has a strangely sunny outlook that doesn’t match how actual humans would react, particularly when asked about brownies containing worm meal and fish oil.
Taste testing usually depends on trained tasters or consumer panels, which costs both time and money. Dr. Torrico wondered if there was a faster way. “This process can be lengthy and expensive,” he writes in his paper. “Therefore, researchers are looking for alternatives to screen the sensory characteristics/notes of a wide range of products without running extensive and costly panel sessions.” A tech shortcut that keeps quality feedback intact would completely change how new foods get developed.
Torrico created fifteen imaginary brownie recipes, divided into three categories: standard formulations, common replacement ingredients, and uncommon replacement ingredients. The standard recipes varied basic brownie components like chocolate (15-30%), flour (15-38%), and sugar (10-20%). Common replacements swapped in ingredients like stevia instead of sugar or olive oil instead of butter. The uncommon category ventured into unusual territory—using fish oil instead of butter or worm meal instead of eggs.
For each recipe, ChatGPT received two simple instructions. First: “Act as an experienced taster” and describe the sensory characteristics of a brownie with these ingredients, without mentioning the ingredients themselves. Second: score the brownie’s quality on a scale from 0 to 10. All responses came from ChatGPT version 3.5 through a Google Sheets extension that automated the process, ensuring consistent testing conditions across all fifteen recipes.
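To make the setup concrete, here is a minimal Python sketch of that two-prompt routine. It is an illustration, not the study’s actual pipeline: Torrico ran ChatGPT through a Google Sheets extension, while this sketch assumes the OpenAI Python SDK and the gpt-3.5-turbo model name. The recipe string mirrors the base formulation (F1) described in the paper.

```python
# Minimal sketch of the two-prompt evaluation loop. Assumes the OpenAI
# Python SDK and an OPENAI_API_KEY in the environment; this is not the
# Google Sheets extension the study actually used.
from openai import OpenAI

client = OpenAI()

# Base formulation (F1) from the paper
recipe = "30% chocolate, 15% flour, 20% sugar, 25% butter, 10% eggs"

def ask(prompt: str) -> str:
    """Send one prompt to the chat model and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Prompt 1: a sensory description that never names the ingredients
description = ask(
    "Act as an experienced taster. Describe the sensory characteristics "
    f"of a chocolate brownie made with {recipe}, without mentioning the "
    "ingredients themselves."
)

# Prompt 2: an overall quality score from 0 to 10
score = ask(
    f"Score the overall quality of a chocolate brownie made with {recipe} "
    "on a scale from 0 to 10."
)

print(description)
print(score)
```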
Happy AI, Horrified Humans: The Surprising Results
The results revealed a weird quirk in how artificial intelligence judges food. ChatGPT scored every brownie between 8.5 and 9.5 out of 10, with just tiny drops in scores for the most bizarre combinations.
Looking deeper at the language with sentiment analysis, Torrico found that words like “trust,” “anticipation,” and “joy” kept popping up in ChatGPT’s evaluations. The wildest part? Even when describing brownies loaded with fish oil and worm meal, ChatGPT kept its reviews cheerful and enthusiastic.
This relentless optimism exposes a big problem: ChatGPT doesn’t get grossed out by weird food combinations. It never evolved that gut-level “eww” reaction we humans have to potentially sketchy ingredients, and it doesn’t share our cultural ideas about what should or shouldn’t go in a dessert. Its cheerful reviews likely come from being trained on mountains of food content that tends to be glowingly positive.
As Dr. Torrico explains, “Food, in general, tends to be biased to favorable terms and emotions in the existing text content that can be found in books, websites, articles, and social media. This can be one of the reasons why ChatGPT tended to have positive emotions and sentiments toward foods that might have the opposite reactions from real consumers.”
The numbers tell the story: ChatGPT spit out way more positive sentiments (12-23 instances per review) than negative ones (just 4-8). Digging deeper with correspondence analysis—a statistical technique that maps relationships between variables—Torrico spotted some patterns. Regular brownie recipes got linked with “trust” and “anticipation,” while the weirdo recipes with worm meal and fish oil mostly triggered “surprise.” That’s apparently as close as ChatGPT gets to saying “yuck” about desserts containing bugs.
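For readers curious about the mechanics, correspondence analysis amounts to a singular value decomposition of a standardized contingency table. The numpy sketch below runs the textbook computation on an invented emotion-count table; the numbers are placeholders, not the study’s data.

```python
import numpy as np

# Hypothetical contingency table of emotion counts (rows: recipe groups,
# columns: emotions). The numbers are invented for illustration only.
emotions = ["trust", "anticipation", "joy", "surprise"]
counts = np.array([
    [23, 20, 18, 4],   # standard formulations
    [19, 17, 15, 6],   # common replacements
    [14, 13, 12, 9],   # uncommon replacements
], dtype=float)

P = counts / counts.sum()        # correspondence matrix
r = P.sum(axis=1)                # row masses
c = P.sum(axis=0)                # column masses

# Standardized residuals: each cell's deviation from independence
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# SVD of the residuals yields a shared low-dimensional map
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = (U * sigma) / np.sqrt(r)[:, None]      # recipe groups
col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]   # emotions

print(row_coords[:, :2])  # recipe groups in the first two dimensions
print(col_coords[:, :2])  # emotions in the same two dimensions
```

Rows and columns that land close together on this map, such as standard recipes and “trust,” co-occur more often than independence would predict.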
When examining the descriptive terms used in ChatGPT’s evaluations, researchers found “chocolate” remained the most frequent word across all formulations. Standard brownie recipes triggered words like “texture” and “slight,” while common replacement recipes got descriptions like “fudgy” and “flavor.” Curiously, the most bizarre formulation (with fish oil, worm meal, citric acid, and corn starch) was mainly described simply as a “brownie,” suggesting the AI may have struggled to imagine how such an unusual combination would actually taste and feel.
The Future of AI Food Critics
Torrico’s experiment shows both the cool possibilities and obvious shortcomings of AI food tasters. Sure, ChatGPT can cook up believable-sounding food descriptions based on ingredient lists, but its stubborn cheerfulness—especially for recipes that would send real people running—proves it’s nowhere near ready to replace human taste testers.
“Further research should focus on validating ChatGPT sensory descriptors with the outcomes of a human sensory panel,” Dr. Torrico suggests, acknowledging the need to compare AI evaluations with real human responses.
Still, AI might save food companies serious cash in the early stages of creating new products. Before spending big bucks on human testing panels, food scientists could use AI evaluations to quickly sort through dozens of potential recipes. They’d still need real humans for the final taste test, but AI could help narrow down the options much faster.
“Using these disruptive technologies can profoundly change the process of developing new products in the future,” notes Dr. Torrico, pointing to the transformative potential of AI in food science.
But for now, when it comes to brownies made with worm meal and fish oil, you might want to trust actual humans—their disgusted reactions are telling you something important that ChatGPT simply can’t understand.
Paper Summary
Methodology
Dr. Torrico’s methodology was straightforward yet comprehensive. He first developed fifteen hypothetical chocolate brownie formulations, starting with a base recipe (designated F1) that contained 30% chocolate, 15% flour, 20% sugar, 25% butter, and 10% eggs. From this foundation, he created variations in three categories. The standard formulations (F1-F5) adjusted the proportions of these basic ingredients. The common replacements category (F6-F10) substituted ingredients like corn flour for regular flour, stevia for sugar, olive oil for butter, and lecithin for eggs. The uncommon replacements category (F11-F15) went further afield, incorporating ingredients such as corn starch, citric acid, fish oil, and worm meal.
For each formulation, ChatGPT was given two prompts. The first asked it to “act as an experienced taster” and describe the sensory characteristics of a brownie with the specified ingredients, without mentioning the ingredients themselves. The second prompt asked ChatGPT to score the overall quality of the brownie on a scale from 0 to 10. All responses were generated using ChatGPT version 3.5 through a Google Sheets extension that automated the evaluation process. This standardized approach ensured consistency across evaluations.
Results
The results revealed several intriguing patterns in ChatGPT’s evaluations. Sentiment analysis showed that terms like “trust,” “anticipation,” and “joy” dominated the AI’s responses, while “disgust,” “fear,” and “sadness” were least frequent. All evaluations had a predominantly positive valence, with positive sentiment counts (12-23) significantly outnumbering negative ones (4-8).
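The paper does not detail the software behind this step, but lexicon-based sentiment analysis generally works by matching words in each response against a predefined emotion dictionary. Here is a minimal sketch with a tiny hand-made lexicon standing in for a full resource such as the NRC Emotion Lexicon; the words, labels, and sample review are all invented for illustration.

```python
import re
from collections import Counter

# Toy emotion lexicon; a real analysis would load a full resource such as
# the NRC Emotion Lexicon. These entries are illustrative only.
LEXICON = {
    "rich": "joy",
    "delightful": "joy",
    "promising": "anticipation",
    "reliable": "trust",
    "unexpected": "surprise",
    "odd": "disgust",
}

def emotion_counts(text: str) -> Counter:
    """Count how often each lexicon emotion appears in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)

sample = "A rich, delightful bite with a promising and reliable chocolate depth."
print(emotion_counts(sample))  # Counter({'joy': 2, 'anticipation': 1, 'trust': 1})
```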
Quality scores were remarkably high across all formulations, ranging from 8.5 to 9.5 out of 10. Even the most unusual formulations with ingredients like fish oil and worm meal received scores only marginally lower than standard recipes. Correspondence analysis showed that standard formulations were associated with sentiments of “trust” and “anticipation,” common replacements with “disgust,” “fear,” and “sadness” (though still with overall positive evaluations), and uncommon replacements with the sentiment “surprise.”
When examining the descriptive terms used by ChatGPT, researchers found that “chocolate” remained the dominant descriptor across all formulations. Standard formulations elicited terms like “texture” and “slight,” while common replacements were associated with “chocolate,” “fudgy,” and “flavor.” Notably, the most unusual formulation (F15, with citric acid, fish oil, and worm meal) was strongly associated with the basic term “brownie,” suggesting a possible limitation in ChatGPT’s ability to imagine the likely unusual sensory characteristics of such a product.
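Finding dominant descriptors like these is, at its core, term-frequency counting over the generated descriptions. A minimal sketch, assuming an invented sample description and a small stop-word list:

```python
import re
from collections import Counter

# Small stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "with", "of", "this", "has", "is"}

def top_terms(description: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword terms in a description."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return Counter(t for t in tokens if t not in STOPWORDS).most_common(n)

# Invented example, not text generated in the study
sample = "This brownie has a fudgy texture and a slight chocolate flavor."
print(top_terms(sample))
# [('brownie', 1), ('fudgy', 1), ('texture', 1), ('slight', 1), ('chocolate', 1)]
```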
Limitations
The study acknowledges several important limitations in using ChatGPT for sensory evaluation. Most significantly, ChatGPT displayed a strong positive bias in its evaluations, even for formulations that would likely trigger negative reactions in human tasters. This suggests that the AI’s responses are influenced by the generally positive language used to describe food in its training data rather than an ability to predict actual sensory experiences.
Another limitation is that the sentiment analysis used to interpret ChatGPT’s responses is itself constrained by predefined lexicons and algorithms. The emotional categories identified may not fully capture the nuanced sensory experience of tasting food products. Additionally, the study used only hypothetical formulations rather than actual brownies that real human panels could taste for comparison.
Most importantly, ChatGPT lacks the physiological mechanisms for taste, smell, and texture perception that are fundamental to human sensory evaluation. It cannot experience disgust, pleasure, or other embodied responses that influence human food preferences and evaluations.
Funding and Disclosures
The research received no external funding, as indicated in the paper’s funding statement. The author declared no conflicts of interest related to the study.
Publication Information
The study, titled “The Potential Use of ChatGPT as a Sensory Evaluator of Chocolate Brownies: A Brief Case Study,” was authored by Damir D. Torrico from the Department of Food Science and Human Nutrition at the University of Illinois Urbana-Champaign. It was published in the journal Foods (Volume 14, Issue 3, Article 464) on February 1, 2025, following peer review. The paper is available through open access under the Creative Commons Attribution license.