From bench to bot: Does AI really make you a more efficient writer?

A more significant benefit may lie in improving quality, refining tone and reducing cognitive burden. But beware of bias.

Bedside bots: AI-written responses to patients through medical portals are rated more empathetic than replies from actual doctors.
Illustration by Rebecca Horne / Adobe Firefly
In the “From bench to bot” series, neuroscientist and science writer Tim Requarth explores the promises and pitfalls of artificial-intelligence tools in writing. Read previous essays here.

One of the great promises of generative artificial-intelligence tools such as ChatGPT was that the bots would do away with writing busywork. “What used to take me around a half-hour to write now takes one minute,” one doctor told The New York Times last year, referring to a prior authorization form. ChatGPT “cut the workload from three days to three hours,” said a chemist, writing in Nature about using AI to draft what he calls the “dead documents” that make up much of the drudgery of grant applications. And indeed, when it comes to automating formulaic writing, such as, say, yet another “Facilities & Other Resources” document for a National Institutes of Health grant, AI will almost certainly speed things up a bit. But when it comes to more complex writing tasks, such as a “Specific Aims” page or a delicate email, will using AI actually save you time? Or will all of the requisite prompt-crafting, fact-checking and manual editing only serve to slow you down?

Nearly two years into the generative AI era, we need not rely on anecdotes: There are actual data about how AI affects the writing process, though the picture they paint is more nuanced than either the AI skeptics or the AI boosters might have you believe. In a study published in Science last year, researchers asked college-educated workers to complete realistic writing tasks specific to their jobs—and those randomly assigned to use ChatGPT completed the tasks 40 percent faster. A subsequent study of several hundred business consultants found similar benefits: Those using AI were 25 percent faster at writing-heavy tasks.

But findings from the medical field complicate matters. Two studies measured—down to the second—how long clinicians spent reading and replying to tens of thousands of patient portal messages, either starting with an AI-generated draft or composing replies on their own. Contrary to expectations, clinicians using AI-generated drafts spent about the same amount of time on patient messaging as physicians without AI. The takeaway is that though efficiency gains are possible, they certainly aren’t guaranteed.

Looking beyond the topline results in these studies, however, reveals some unforeseen benefits to using AI. In one of the studies on patient portal messaging, even though physicians using AI-generated content didn’t save time, they reported experiencing less cognitive load and less perceived burnout. This outcome suggests that editing may be cognitively less burdensome than writing, and that AI assistance can alleviate some of the mental effort that writing tasks demand. To be sure, offloading that effort may also dampen the meaningful and fruitful struggle with ideas that makes writing such a useful thinking and learning tool. But a burnout buffer could be especially important for scientists: A 2021 Nature survey of 3,200 scientists revealed the lowest levels of job satisfaction in a decade, and signs of burnout were common—particularly among female researchers.

AI assistance can also improve quality. The college-educated workers in the studies above produced writing that was rated 18 percent higher in quality (as judged by blinded, experienced professionals working in the same occupations), and the consultants’ output was rated 40 percent higher in quality (as judged by other consultants and MBA students). In both experiments, the workers and consultants who originally scored lower on writing tasks enjoyed the largest gains in perceived quality. Because communication disparities can correlate with social disparities, the findings suggest that AI could be a tool for improving equity—as some scientists pointed out early on, when Science banned AI tools such as ChatGPT shortly after its release. (That ban has since been lifted.)

Of course, “quality” can take a variety of forms. In one of the two medical studies above, physicians had to correct inaccuracies in the AI-generated replies, but they reported being impressed with the level of empathy in those responses. Although not quantified in this particular study, this finding is consistent with other research suggesting that AI replies to patients are rated more empathetic than replies from actual doctors. It turns out that patients really like all those pleasantries busy doctors dispense with when firing off their hundredth message at the end of a long day in the clinic. AI could present a “best-of-both-worlds” scenario: So long as the doctor vets the AI-generated draft for accuracy, patients get a more compassionate and informative response for the same amount of physician effort.

This finding reminds me of one of my go-to uses for AI: as a tone editor for my emails. Take this hypothetical scenario: A principal investigator is angry with the director of an imaging core facility. They fire off a snippy email and regret it almost immediately afterward. This is where I find AI’s bottomless well of diplomacy to be a boon. Even though AI may struggle with nuanced scientific or clinical reasoning, it can help improve quality along other dimensions of writing, such as empathy, that researchers juggling correspondence may overlook. How we feel about automating empathy is, of course, another question altogether.

For the example below, I’ll use Anthropic’s Claude 3.5 Sonnet model, a worthy alternative to ChatGPT, to explore how you (and your emails) can benefit from a bit of automated diplomacy.

Prompt:

Rewrite this email to the director of an imaging core facility with tact and diplomacy. Use an informal but professional tone:

<<<Dr. Johnson, I am writing to express my frustration with the utter mishandling of my lab’s data by your core facility. The files for our most recent study on sleep-dependent memory consolidation are an absolute mess. Some of the files have wrong names while others appear to be corrupted. To think we pay such high prices for your services and this is what we get is beyond me. We need this preliminary data for a grant application due in 2 weeks, and this disaster has cost us valuable time.

I demand an explanation and expect immediate action. If this has not been resolved in 48 hours, I will have no choice but to convey the situation to the institute director.>>>

Claude output:

Dear Dr. Johnson,

I hope this email finds you well. I’m reaching out about some issues we’ve run into with the data from our recent sleep-dependent memory consolidation study processed by your core facility.

We’ve hit a bit of a snag while reviewing the files. Some seem to have incorrect labels, and we’re having trouble opening others. I know your team usually does great work, so I’m hoping we can get to the bottom of this together.

The timing is a bit tight on our end – we’re aiming to use this data for a grant application due in a couple of weeks. I’m a little concerned about the setback, but I’m optimistic we can sort this out quickly.

Would you have some time in the next day or two to hop on a call? It’d be great to walk through the issues we’re seeing and figure out a game plan. I’m happy to loop in any of your team members who might be able to help troubleshoot.

Looking forward to resolving this and getting our analysis back on track. Thanks in advance for your help!

Best,

[Your Name]

Although it might feel good in the moment to righteously lay into the core facility director, who may be fully to blame, the revised email is more likely to resolve the immediate issue. The tone-adjusted version strikes a respectful note, focuses on problem-solving instead of blame, and assumes good faith on the part of the core facility. By providing context for the urgency without making threats, and by suggesting specific next steps, it charts a path toward resolution rather than escalation and defensiveness. It also preserves the professional relationship, which is probably crucial for both the PI and the core facility director. If the mishandled data are part of a larger pattern of incompetence, the PI can raise that later, in a less heated moment.
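If you find yourself making this kind of request often, you can script it rather than pasting each email into a chat window. Below is a minimal sketch using Anthropic’s Python SDK; the model identifier and the tone_edit helper are illustrative assumptions on my part, and any current model (or the plain chat interface) would work just as well.

Python sketch:

# A minimal sketch of automated tone-editing with Anthropic's Python SDK.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set.
# The model string and the tone_edit() helper are illustrative, not prescriptive.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def tone_edit(draft: str, tone: str = "informal but professional") -> str:
    """Ask the model to rewrite a draft email with tact and diplomacy."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative; any current Claude model would do
        max_tokens=1024,
        system=(
            "You are a tactful editor. Rewrite emails diplomatically, "
            "preserving every factual point and request in the original."
        ),
        messages=[
            {
                "role": "user",
                "content": (
                    "Rewrite this email with tact and diplomacy. "
                    f"Use a {tone} tone:\n\n<<<{draft}>>>"
                ),
            }
        ],
    )
    return response.content[0].text


if __name__ == "__main__":
    angry_draft = "Dr. Johnson, I am writing to express my frustration..."
    print(tone_edit(angry_draft))

The same prompt structure (the angle-bracket delimiters around the draft and the explicit tone instruction) works verbatim in the web interface.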

As always, AI assistance has its downsides. Users may come to rely too much on AI input or become unduly biased by it. A third study on patient messaging found that AI tools enhanced the informativeness and educational value of responses compared with physician drafts. However, when physicians started from AI drafts, the content of their edited responses subtly shifted to resemble what the AI had put forward, suggesting that AI introduces anchoring bias. The concern is that physicians aren’t simply using AI to get a jump-start on the mechanics of writing but are adopting the AI’s judgment in place of their own clinical judgment. This very real concern is why I generally recommend using AI for discrete phases of the writing process, such as generating an outline or offering feedback on rough drafts, rather than using it to generate full drafts from a simple prompt.

The following strategies can help you maintain a healthy balance between your expertise and AI assistance:

  1. Generate rough drafts from notes, rather than from a blank page: It’s fine to generate drafts with AI, but do your thinking first, put together some structured notes, and treat AI-generated content as a first draft that requires critical review and substantial editing. This approach can help mitigate the risk of anchoring bias.
  2. Rotate between AI-assisted and non-assisted writing: To develop and maintain your own writing skills, alternate between tasks you draft with AI assistance and tasks you write entirely on your own, rather than relying on AI for every chunk of text. This will also help you maintain your own voice.
  3. Customize AI prompts: Learn to craft specific prompts that spell out the audience, format, length and constraints, so the AI produces more relevant and useful output for your particular needs (see the sketch after this list for one way to make this concrete).
  4. Weigh ethical considerations: Be transparent about AI use, especially in academic writing, and follow any guidelines or policies set by your institution or publication venues.
  5. Fact-check and verify: Always verify facts, citations and specific claims made by AI. These tools have a tendency to generate “hallucinations,” plausible-sounding but inaccurate chunks of information.
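To make the third strategy concrete, here is one way to keep a reusable, customized prompt that turns structured notes (strategy 1) into a rough draft. It is a hedged illustration rather than a recommended standard: the field names, wording and word limit are my own assumptions, and the same template works just as well pasted into a chat window.

Python sketch:

# An illustrative prompt template for turning structured notes into a rough draft.
# The fields and instructions are assumptions for demonstration; adapt them to
# your own documents and your institution's policies on AI use.

DRAFT_PROMPT = """You are helping a neuroscientist draft a {document_type}.
Audience: {audience}
Length: about {word_limit} words
Constraints: {constraints}

Work only from the notes below. Do not add facts, citations or results that
are not in the notes; flag any gaps instead of filling them.

Notes:
{notes}
"""


def build_prompt(document_type: str, audience: str, word_limit: int,
                 constraints: str, notes: str) -> str:
    """Fill in the template; the result can be pasted into any AI chat tool."""
    return DRAFT_PROMPT.format(
        document_type=document_type,
        audience=audience,
        word_limit=word_limit,
        constraints=constraints,
        notes=notes,
    )


print(build_prompt(
    document_type="Specific Aims page",
    audience="NIH study-section reviewers outside my immediate subfield",
    word_limit=500,
    constraints="keep my original aim structure; preserve hedged language",
    notes="- Aim 1: test whether sleep spindles predict memory consolidation\n"
          "- Aim 2: ...",
))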

As neuroscientists navigate this new frontier, it’s important to temper expectations about what AI can and can’t do for you. It may save time on some tasks, but the potential gains in quality and reductions in cognitive load may be more meaningful. The crucial thing is to avoid overreliance, so that AI remains a valuable assistant rather than an unintentional co-author. The hope is that by using AI wisely, neuroscientists can free up more mental energy for the creative and analytical work that drives scientific progress—and buoys scientist well-being.

AI-use statement: Anthropic’s Claude 3.5 (Sonnet) was used to generate the example, which was further edited by the author. Claude was also used during the writing process for editorial feedback.

User beware

When exploring the use of AI, it’s important to recognize that incorporating it into your writing life means navigating a minefield of possible dangers. AI can confidently produce convincing but inaccurate information (often called “hallucinations”), making it untrustworthy for factual queries; build verification checkpoints into your workflow. Even accurate AI-generated content can be biased. It is well documented, for example, that social biases, such as racism and sexism, are embedded in and exacerbated by AI systems. AI may also recapitulate bias in subtler ways, such as by steering users toward established scientific ideas, which are more likely to be represented in its training data.

Data-privacy concerns arise when using standard web interfaces, because user inputs can be used to train future AI models, though certain technical workarounds offer more protection. And at least one major journal (Science) and the U.S. National Institutes of Health have restricted the use of AI for some purposes. Lastly, although generative AI generally does not pose a high risk of detectable plagiarism, that risk may increase for highly specialized content that is poorly represented in the training data (which might not be much of a concern for the typical user but could be a larger concern for the typical scientist). Some AI systems in development may overcome some of these problems, but none will be perfect. We’ll discuss these and other issues at length as they arise.
