Reimagining autism screening: A conversation with Roald Øien

For the past five years, Roald Øien has been stress testing the Modified Checklist for Autism in Toddlers (M-CHAT), a go-to screen for autism introduced in 2001. His results are far from encouraging: The screen misses more than 70 percent of children who are later diagnosed with autism, while mostly flagging those without the condition. It may also confuse autism with other developmental conditions, such as intellectual disability.

The M-CHAT-R, a 2014 revision that reduced the number of questions from 23 to 20 and adjusted the cutoff score, doesn’t do much better. To assess its effectiveness, Øien and his colleagues retrospectively applied the new cutoff algorithm to M-CHAT data on 54,463 children aged 18 months. His team described their results in November in Autism Research.

The switch decreased false positives — non-autistic children incorrectly identified as autistic — by 2.4 percent, but it also increased false negatives — autistic children missed by the screen — by 3.6 percent, a tradeoff Øien says he is not prepared to accept.

Spectrum talked to Øien, professor of special education and developmental psychology at the Arctic University of Norway and adjunct assistant professor at the Yale Child Study Center, about the work and the conversations he hopes it prompts around screening, developmental monitoring and care for autistic children.

Spectrum: What drives your work on the M-CHAT?

Roald Øien: I am motivated to lower the age of diagnosis, which seems to be fairly stable. Even if we do massive screening, as is recommended in the United States, it doesn’t seem to affect the age of diagnosis at the national level. And there have been studies showing that universal screening is really expensive.

We know that parents start worrying at around 15 months of age for a lot of kids with autism. But the average age of diagnosis, as shown in many studies, is 3 to 4 years of age. That’s a long time for a parent to be concerned. If we manage to reduce the age of diagnosis, it could be beneficial for parents, both for coping and also for the child’s outcomes, because we know early identification is associated with better outcomes. I have a daughter with autism. She’s 15. So I understand that early identification and reducing the time of concern is of great importance to parents.

S: What prompted your new analysis?

RØ: I want to raise discussion on screening instruments and how we try to make them better by moving different thresholds or cutoffs, while we’re not really changing the measurement itself. I don’t really know if that is the way to go. Nobody had looked at this particular algorithmic change, and we wanted to see, does this really help identify kids with autism?

S: What was the algorithm change, exactly?

RØ: In the original M-CHAT, a child screened positive if their parent answered ‘no’ to 2 or more of 6 “critical items,” which are the most predictive of autism, or to 3 or more of the 23 questions. Our studies show that most kids with autism fall below both of those cutoffs. In the M-CHAT-R, they changed the algorithm so that anyone scoring over 2 should receive some follow-up. They also removed three items that seem to be bad predictors of autism.
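To make the two scoring rules concrete, here is a minimal sketch in Python of the cutoffs as Øien describes them. It assumes each item has already been reduced to a simple "failed" flag, and it ignores the follow-up interview and the item wording. The critical-item positions in the example are hypothetical placeholders, not the real M-CHAT item numbers. Because the study applied the revised rule to the original 23-item data, the revised function does not check the item count.

```python
from typing import Sequence, Set


def original_mchat_positive(failed: Sequence[bool], critical_items: Set[int]) -> bool:
    """Original rule as described: positive if 2+ critical items or 3+ items overall are failed."""
    critical_fails = sum(1 for i in critical_items if failed[i])
    total_fails = sum(failed)  # each True counts as 1
    return critical_fails >= 2 or total_fails >= 3


def revised_rule_follow_up(failed: Sequence[bool]) -> bool:
    """Revised rule as described: a total score over 2 triggers follow-up; no critical-item rule."""
    return sum(failed) > 2


# Hypothetical critical-item positions (0-based), for illustration only.
CRITICAL = {0, 1, 2, 3, 4, 5}

# Child A: exactly two critical items failed, nothing else.
child_a = [False] * 23
child_a[0] = child_a[1] = True

# Child B: three non-critical items failed.
child_b = [False] * 23
for i in (10, 15, 20):
    child_b[i] = True

print(original_mchat_positive(child_a, CRITICAL), revised_rule_follow_up(child_a))  # True False
print(original_mchat_positive(child_b, CRITICAL), revised_rule_follow_up(child_b))  # True True
```

Under these simplified rules, a child like A, who fails only two critical items, screens positive on the original algorithm but not on the revised one, which is one way the switch can reduce positives overall.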

We applied the new algorithm to the old M-CHAT. We cannot say for sure that the algorithm is the only thing that’s changing the results from the M-CHAT to the M-CHAT-R. But it might be worth hypothesizing that a lot of the changes in the identification rates are mainly based on the change in algorithm and not on the questions themselves.

S: How should researchers balance false positives and false negatives? Is one ‘worse’ than the other?

RØ: Everything in life has a tradeoff, right? So it’s just a matter of deciding if that tradeoff is acceptable. If you’re in a world where you want the M-CHAT to be an autism-specific instrument and you don’t care about any other developmental disabilities, it might be worth missing a lot of kids with potential other diagnoses. On the other hand, we see that when we’re losing the false positives, we’re also losing a portion of kids who have autism. And I don’t know if that’s a tradeoff that I want to personally say that I’m comfortable with.
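The tradeoff Øien describes can be illustrated with a small sketch that counts false positives and false negatives as a cutoff moves. The sample below is invented toy data, not from the study; each entry is a hypothetical screen score paired with whether the child was later diagnosed with autism, and a score at or above the cutoff counts as screening positive.

```python
from typing import List, Tuple


def errors_at_cutoff(children: List[Tuple[int, bool]], cutoff: int) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when score >= cutoff screens positive."""
    fp = sum(1 for score, autistic in children if score >= cutoff and not autistic)
    fn = sum(1 for score, autistic in children if score < cutoff and autistic)
    return fp, fn


# Invented toy sample: (screen score, later autism diagnosis).
sample = [(0, False), (1, False), (2, False), (2, True),
          (3, False), (3, True), (4, True), (5, False)]

for cutoff in (2, 3, 4):
    print(cutoff, errors_at_cutoff(sample, cutoff))
# 2 (3, 0)
# 3 (2, 1)
# 4 (1, 2)
```

Raising the cutoff removes false positives but, in the same step, misses more children who turn out to be autistic, which is the kind of tradeoff at issue here.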

This paper is not revolutionizing screening. It’s a part of the discussion on how to come up with solutions that are more affordable and more precise, and that capture more of the broader phenotype of autism than the M-CHAT alone can do.

S: What suggestions do you have to improve screening?

RØ: We need to revisit universal screening. In Europe, we do developmental surveillance instead. Screening shouldn’t be the only thing at 18 and 24 months. We should be practicing developmental surveillance at both 18 and 24 months, and maybe at 5, 6 and 7 years of age, because we know that a large portion of autistic children are identified later, at school age.

I’m not sure we can improve the M-CHAT as it is now. I don’t think that’s the goal of this, either. It’s still useful in its way to look for the prototypical signs of autism in a certain subgroup of children. But I think we should practice it with caution, and we need to be aware of all the limitations of it. And we need clinicians to know that it’s missing a lot of children. It’s not a yes or no to autism.

And there might be many other ways to go about developmental screening. There are other screening instruments, such as the Ages and Stages Questionnaire and the Parents’ Evaluation of Developmental Status. There are a lot of instruments out there, and I don’t think that one of them will ever rule them all.

S: What are the next steps for you?

RØ: Well, I think the M-CHAT is a closed chapter for me now. We have highlighted all the limitations with it, and not just with the M-CHAT. It’s with all screening instruments. It doesn’t necessarily matter if you change the scores and the algorithms and the cutoffs and the wording and the exemplifications. It’s more about having discussions to drive this further. That’s what I really want.