Advice about how to speak to AI systems has grown increasingly creative. Some people insist on politeness, while others recommend flattery or even threats. One research team found that instructing a model to imagine itself in a Star Trek scenario improved math performance under certain conditions.
These findings have fueled curiosity about whether wording alone can meaningfully change outcomes.
Research and expert interviews suggest that small changes in phrasing can influence results, yet there is no universal formula that works across systems.
In reality, model size, training data, task type, and reasoning method all influence performance. Experts argue that clear and well-defined goals, along with structured interactions, remain more reliable than superstition.
Why It Matters: AI systems now support a wide range of creative and computational tasks across industries, and small differences in prompt design can alter accuracy and reliability in ways that affect organizational decisions. Misplaced faith in “magic phrases” can distract users from clarifying the task and reviewing outputs. Clear prompting reduces unnecessary iterations and helps users treat AI as a tool that requires careful instruction and evaluation.
- Politeness Does Not Reliably Improve Accuracy: Some experiments found that polite wording increased accuracy on certain benchmarks, while other tests showed no measurable benefit or slight declines. These effects also varied across languages and cultural contexts. While models are updated frequently and results may change over time, current evidence does not support the idea that courtesy consistently improves AI reasoning. Saying “please” may improve a user’s comfort level, yet it does not provide a dependable performance advantage.
- Small Wording Changes Can Produce Large Score Differences On Benchmarks: In a study that evaluated 60 combinations of system-message phrases across multiple open-source models, minor additions such as “You are highly intelligent” or “Take a deep breath and think carefully” sometimes changed math accuracy by wide margins. The results differed depending on model size and whether Chain-of-Thought reasoning was enabled. For one large model without Chain-of-Thought prompting, the highest accuracy came from including no added system message at all. These findings show that prompt sensitivity exists, though it does not follow a single consistent pattern.
- Automated Prompt Optimization Often Outperforms Manual Tweaking: Researchers compared hand-crafted motivational phrases with prompts generated through an automated optimization framework. The automated method produced equal or higher average accuracy in most scenarios, especially for larger models such as 13B and 70B parameter systems. The optimized prompts often appeared unusual and included elaborate narrative framing, including a Star Trek command scenario that improved math benchmark results.
- Role-Playing Affects Creativity and Confidence Differently: Asking a model to assume a persona, such as a professor or industry expert, may increase detail and fluency in open-ended tasks like brainstorming or interview preparation. However, for tasks with one correct answer, assigning expert status can increase the likelihood of confident errors. Researchers caution that giving a model authority within the prompt can amplify hallucinated content, since the model may generate answers with greater certainty even when incorrect.
- Clear Structure and Defined Goals Improve Output Quality: Experts recommend requesting multiple answer options so that users can compare results. Providing writing samples helps the model approximate tone and format more accurately than listing abstract instructions. Inviting the model to ask clarifying questions can improve relevance before it generates a final response, and neutral phrasing can reduce the chance of biased outputs. Detailed constraints, such as audience level and formatting instructions, can also increase the likelihood of receiving usable content.
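The practical upshot of the findings above is to measure prompt variants against a small evaluation set rather than trusting any single “magic phrase.” A minimal sketch of that workflow is below; `call_model` is a hypothetical stand-in that returns fixed answers so the example runs offline, and in practice it would wrap whatever LLM API an organization uses.

```python
# Sketch: A/B-test system-message variants on a tiny eval set instead of
# assuming one phrasing is best. call_model is a deterministic stand-in
# (an assumption for illustration), not a real LLM client.

SYSTEM_VARIANTS = {
    "none": "",
    "polite": "Please solve the following problem carefully.",
    "role": "You are a highly intelligent math tutor.",
}

# Small labeled eval set: (question, expected answer).
EVAL_SET = [
    ("What is 17 + 26?", "43"),
    ("What is 9 * 8?", "72"),
    ("What is 144 / 12?", "12"),
]

def call_model(system: str, question: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned answers so the
    sketch runs without network access. Replace with a real API call."""
    answers = {q: a for q, a in EVAL_SET}
    return answers[question]

def score_variant(system: str) -> float:
    """Fraction of eval questions answered correctly under this system message."""
    correct = sum(call_model(system, q) == a for q, a in EVAL_SET)
    return correct / len(EVAL_SET)

if __name__ == "__main__":
    for name, system in SYSTEM_VARIANTS.items():
        print(f"{name}: {score_variant(system):.0%}")
```

Because scores shift with model size and reasoning mode, the same harness should be rerun whenever the underlying model changes.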
Go Deeper:
- Do you have to be polite to AI? – BBC
- The Unreasonable Effectiveness of Eccentric Automatic Prompts – arXiv
- Effective Prompts for AI: The Essentials – MIT Sloan Teaching & Learning Technologies