Peer review post-mortem: how a flawed aging study was published in Nature

How could an article with numerous shortcomings be published in top-tier journal Nature? Hester van Santen reveals how the gate-keepers of science knowingly let flawed research slip through.

On Dutch Demography Day in November, the French demographer Jean-Marie Robine stood behind the lectern in the Utrecht Academy Building. He had flown in from Montpellier for the introductory lecture of the day. It was on an important social topic: the increasing life span of humans.

Over the past hundred years, we have grown ever older, the demographer demonstrated. “In France, there are now about 30 people over the age of 110. In 1960, there was not a single one.” How long will this continue, and what will the consequences be? Robine’s presentation was full of graphs, all from well-known specialists like him.

But after about five minutes, he could not avoid talking about a recent article in Nature by three complete unknowns in the field. It was published on 5 October, under the title Evidence for a limit to human lifespan. Following a demographic analysis, three geneticists from New York concluded that people will never grow older than approximately 115.

“We have colleagues”, began Robine with palpable irony, “who suggest that, contrarily to what we’d expect, we are facing a strong limit [to human lifespan].” Only during the Q&A did a PhD student say it out loud: “Frankly, this is the weakest paper I have read in a top journal.” Robine did not comment. Perhaps it was because his own name was at the bottom of the publication.

That is the first thing demographer Joop de Beer of the Netherlands Interdisciplinary Demographic Institute mentions on the phone when asked about the Nature article. “We are working on a response.” The PhD student who spoke in Utrecht was not alone in his bold assertion – far from it.

“It is frustrating that this is published in a top journal”, says John Wilmoth, head of demographics at the United Nations, on the phone. “The data analysis is inappropriate in several ways”, Professor Shiro Horiuchi cautiously e-mails from New York. And Jim Vaupel, director of Germany’s Max Planck Institut für Demografische Forschung, responds in a way that is familiar to his peers – he doesn’t mince words: “They just shovelled the data into their computer like you’d shovel food into a cow.” At least four research groups have sent critical comments to Nature.

How could this article appear in one of the world’s leading scientific journals? It’s a question which is regularly asked in the academic world, but never answered. Not, for example, in response to the widely ridiculed article about the ‘arsenic bacteria’ in Science (2011), which purportedly use arsenic as a building block. Not after the big scandal about STAP stem cells in Japan (2014), which began with an article in Nature that was later withdrawn. And certainly not with regard to the many uncontroversial publications in top journals.

The answer to that question lies in the gatekeeping mechanisms of the publishing process: the journal’s editors and peer review. Those peers are fellow specialists who are invited to evaluate the manuscript. At the top journals, the gatekeepers are extremely selective and reject the vast majority of the manuscripts submitted. They have a great responsibility. Manuscripts they grant access to, such as Evidence for a limit to human lifespan, will count heavily in the current social and scientific debate – in this case about the human lifespan. It is a weighty role for a publication whose conclusions, according to many specialists, simply do not stack up.

Science journalists like myself write about studies that are printed in the pages of top journals after passing those strict border controls. That gateway is a quality mark for research: it has been peer reviewed. Scientists do the same thing.

There were, however, loud objections to Evidence for a limit to human lifespan immediately after it was published. Those objections became the basis for a critical article in this newspaper. It made me curious. What had the process of editing and peer review been for this Nature article?

The peer review process takes place in strict confidence, so Nature did not want to go into the details. However, I was able to reconstruct the process based on conversations with two of the three peer reviewers and the authors. It turned out to be virgin territory: people who are familiar with peer reviewing say they do not know of any other examples of this kind of peer review post-mortem.

According to Nature, the peer-review process is rigorous and independent, with the claims of each paper being weighed against the many others. But my conversations yielded a very different, disconcerting picture. The authors said their reviewers had a better grasp of the material than they did themselves. The reviewers knew that. They had criticisms, yet they did not scrutinise a large part of the analysis at all. And Nature let it slide.

“It’s not really my specialism, all of this”, says genetics professor Jan Vijg about his publication in Nature. “I’m not a demographer.” Of the three authors, he is the ‘corresponding author’, the author who is ultimately responsible for the article. Vijg is a geneticist, specialising in life span and ageing. He was born in Rotterdam and left for the US early in his career. Since 2008, he has been a professor at the Albert Einstein College of Medicine in New York.

Also working in his lab were the two ‘first authors’ of the study, postdoc Xiao Dong and PhD student Brandon Milholland – they performed the analyses. They specialise in trawling through large volumes of DNA data. But, says Jan Vijg, “at the lab meetings we talk about all kinds of things”. At one such meeting, the conversation turned to the question of whether the oldest people are actually still getting older. Vijg: “I said to my people: look into it. They know how to work with databases.”

Milholland and Dong set to work. They performed some calculations based on life expectancy in the United States. And they did something that captured the imagination: they made a graph of extremely elderly people, those aged 110 or more, aka supercentenarians. The most famous, and the oldest of all, was the Frenchwoman Jeanne Calment, who died in 1997 at the age of 122. Xiao Dong and Brandon Milholland relied on data from the Gerontology Research Group, an international group of enthusiasts who thoroughly examine claims about record-breaking elderly people – the Guinness Book of Records uses the same lists.

Dong, Milholland and Vijg concluded that the very oldest people have not grown any older over the past 20 years. This result was surprising. Vijg: “We were putting more and more energy into it. So I thought: let’s just take a punt. Let’s see whether Nature is interested.”

The manuscript – thinner than the final version – landed on the desk of Nature editor Marie-Therese Heemels. She has been working at Nature for 21 years and ageing is part of her portfolio. She and her colleagues made a crucial decision: the article qualified for peer review.

In April, Heemels sent the article to three reviewers. I failed to establish the identity of one of them, who was clearly very familiar with the material. Even an appeal by an influential demographer on a mailing list yielded nothing. The second was the demographer who had spoken in Utrecht: Jean-Marie Robine, professor at the French government institute INSERM.

The third was also an expert in demographics: Jay Olshansky, professor in epidemiology at the University of Illinois in Chicago. In him, Nature had selected a reviewer who has already been declaring for 26 years what the authors had now also concluded: that there is a limit to human lifespan.

“I knew the data better than the authors. They’re biologists! They are not demographers, you know”, Jay Olshansky says on the phone. “I have been using these same data for the last 26 years. I knew the conclusion.”

With his article, geneticist Jan Vijg had gate-crashed a long-running debate within demographics: how old can we become? Will our children or grandchildren live to be 130 or 140, or ‘just’ 90 or 100 like us? It is a significant question and one of imminent concern. But there are few population data on which to base such forecasts. Any one country – France, say – only has a few dozen people aged over 110.

This has resulted in a fierce battle between different schools of thought. In one corner: demographer Jim Vaupel of Germany’s Max Planck Institute for Demographic Research. In the other: Jay Olshansky. “Propaganda”, was Vaupel’s verdict on the day that Evidence for a limit to human lifespan was published. “It all tells a very compelling story”, Jay Olshansky told the New York Times.

Vaupel foresees an undiminished increase in lifespan. A number of demographers, including Jean-Marie Robine, regard Vaupel’s predictions as too simple. They assert that the increase in life expectancy has recently slowed – a little, and gently.

Jay Olshansky goes further: he predicts the end of increased life expectancy, at least in the United States. He argues that the battle against chronic disease will eventually fail to outweigh the ongoing deterioration of our bodies. Vijg’s article linked in with his ideas. “There is some sort of limit, which shouldn’t be surprising to anyone. It’s not as if anybody could live for ever.”

Olshansky says of his review report that it was “quite extensive”, and that his most important criticism concerned a biological principle. “The language I insisted had to be in the paper was that there is no genetic program for ageing or death. There is no clock in our bodies.” He had no other major comments, he says. “I may have mentioned some technical points that I won’t get into, as a reviewer.”

In any case, he did not focus in detail on the statistics or graphs, he explains. “The focus should be less on the statistics, and more on the general observation that people don’t live that long. That’s the point!”

Until then, I had thought that scrutinising the statistics was a major task for a reviewer. But at Nature, skipping it is apparently not a problem at all. In the eleven primary questions drawn up by the journal for an “ideal review”, statistics and methodology are not explicitly addressed. They feature in the second list of questions, for “if time is available”. Of the eleven questions, five are about the novelty and importance of the manuscript. For example: “Is the paper likely to be one of the five most significant papers published in the discipline this year?”

Jay Olshansky does not remember what advice he gave when reviewing the first version of the manuscript Evidence for a limit to human lifespan. He says he either advised rejecting the manuscript or having it revised.

According to Jan Vijg, all the reviewers sent back an “enormous quantity of criticism”. The unknown reviewer was similarly critical of all kinds of aspects of the methodology. Vijg goes so far as to call him or her “very unpleasant”. Typical Jim Vaupel, thought Vijg. But Vaupel vigorously denies it was him.

Unlike Olshansky, the French demographer Jean-Marie Robine does remember what his judgement was. “I was very negative. I advised Nature to reject it.”

On 14 April, Jean-Marie Robine was sent the first version of Vijg’s manuscript in Montpellier. He saw methodological shortcomings. An analysis about life expectancy based only on the United States? The historic mortality figures of that country are not regarded by demographers as being very reliable. He advised the use of the global demographic Human Mortality Database – with which the biologist Jan Vijg was not familiar.

Part two of the manuscript dealt with Robine’s specialism: the analysis of supercentenarians, or those aged over 110. He had devoted his career to them. It was he who had confirmed that his compatriot Jeanne Calment, the oldest person in the world, really was 122 years old when she died. And in 2002, together with Jim Vaupel among others, he had begun the work of establishing a reliable database of all the supercentenarians per country: the International Database on Longevity (IDL).

Xiao Dong and Brandon Milholland, by contrast, had got their details about the over-110s from the ‘Guinness Book of Records’ database of the Gerontology Research Group (GRG). Jean-Marie Robine advised them to use the IDL data instead. “The older data from the GRG are complicated.” He also explained to the American trio how they could best set up their analysis of the International Database on Longevity. “We made good use of his advice”, says Jan Vijg.

Because of Robine’s useful suggestions, Nature even offered to thank him by name at the bottom of the publication. He agreed. “For us as initiators of the IDL, it is good to see the result of our efforts,” he says. “Outsiders can use these data to write a Nature publication, with the help of good reviewers.”

But initially, it seemed that there would not be any Nature publication. The editors accepted the criticism of Robine and the unknown reviewer: the manuscript by the three biologists was rejected unconditionally. “I got it back within three weeks”, says Jan Vijg.

But then, sometime in May or June, something unexpected happened which decided the fate of the publication: the editors of Nature changed their minds. Geneticist Brandon Milholland, who was awarded his doctorate in September, recalls: “First the editor said: we’re not interested. But we said: why don’t you take another look?”

Milholland’s professor Jan Vijg had been struck by one thing in the critical reports. “The reviewers did not deny that we had a point. Our conclusions still stood.” No one had said out loud that there was no limit to human lifespan. Not Jay Olshansky, obviously. But not Robine either – even though he did not consider the conclusions justified. “I don’t think that the kind of approach the authors took gives evidence for a limit to human lifespan.”

But he felt that saying so out loud as a reviewer was going too far. “You cannot recommend to reject a manuscript because you only disagree with the interpretation of the results.” It is a self-effacing stance, but it doesn’t seem to correspond entirely with Nature’s review guidelines. One of the eleven primary questions for the ideal review is: “Are the claims convincing?”

Either way, Vijg, Milholland and Dong were given a second chance. And they made eager use of the demographics lesson they had received from their reviewers, says Vijg. “They literally told us how we had got the demographic analyses wrong, and how we should be doing it.”

According to Nature, reviewers make an “independent assessment”. Whether that was still true in this case is very much open to question. The assessments by Robine and Olshansky cannot be seen in isolation from their scientific beliefs and careers. Moreover, as a result of their suggestions, they became ever more closely involved with the manuscript.

Jay Olshansky says: “The [Nature] editors came back to me several times. And the authors were very persistent. They kept trying to get it right.” Professor Jan Vijg: “You could say we were pretty good students.” As the corresponding author, the geneticist has final responsibility for the content, but he virtually attributes the role of co-authors to his reviewers. “It doesn’t just belong to us. It belongs to a whole group of people.”

Within three months of receiving the first manuscript from Nature, the three reviewers were sent the second version. Jean-Marie Robine received the document on 11 July. It was different to the first, in particular being far more extensive.

Postdoc Xiao Dong had performed the analysis that Robine had proposed, using the IDL database of over-110s. PhD student Brandon Milholland had, largely on his own, gone to work on the mortality rates from the Human Mortality Database. This was a whole project in itself: an analysis of population figures from 41 countries.

The first manuscript had contained fewer than five graphs. Milholland had by now produced 205, of which 200 were printed in the ‘Extended Data’, the data supplement. You’d expect the reviewers to have pored over those graphs. But they didn’t.

“I don’t remember whether I looked at it”, says Jean-Marie Robine. “There are better experts in this area than me.” And he adds that, when reviewing, he studies the data supplement only in exceptional cases. On paper, each of Milholland’s graphs is no bigger than a €2 coin. He finds all those extra charts “boring, noisy and confusing”.

Robine was not the only one. According to Milholland, none of the reviewers discussed the details of his analysis. Disappointing, perhaps? “When a reviewer says it’s a good paper, nobody is disappointed that they didn’t go into it in more detail. When it gets published, you know, that’s good enough.” This is Milholland’s ninth scientific publication. He generally finds review processes “a little frustrating”. “Sometimes the reviewers are so superficial that you are really like, wow, did you read the paper?”

The second review by Jean-Marie Robine was “very brief”. He wrote that “…it is hard to be strongly opposed to the manuscript”. “The authors followed up on 100 percent of my earlier suggestions.” Only the third reviewer remained critical to the end, recounts Jan Vijg. “That person continued to insist it wasn’t true.”

By now, the French professor has studied Milholland’s minuscule graphs. “Amazing.” The figures shoot up and down from year to year, such as in New Zealand between 1960 and 1980. Even Milholland sounds surprised on the phone. “Let’s have a look… hmm… What could be causing that?” Reviewer Jay Olshansky reacts similarly: “This is not possible.” He explains the fluctuations as a result of the limited data. But, continues the demographer, that is not a problem. “The fact that the sample sizes are so small, is the story! People just don’t live that long.”

Six days after Jean-Marie Robine had sent in his positive judgement, he received word from Nature: Jan Vijg’s article had been accepted. The publication date was 5 October. It was even at the top of Nature’s press release list, and was accompanied by a full-page, positive editorial commentary. The author: Jay Olshansky.

He wasn’t required to point out that he had also evaluated the article before its publication. Everything between the submission of a manuscript and publication (or rejection) takes place in strict confidence at Nature. It’s the same with the other top journals Science, Cell, The Lancet, NEJM. Many scientists spend time on the review process without being paid. On its site, the Nature Publishing Group thanks “the 32,319 individuals” who reviewed articles for the publisher in 2015.

By far the majority of articles, 92 percent, are rejected by Nature’s editors or reviewers. Were Nature’s editors, the demographers Jay Olshansky and Jean-Marie Robine, and the third peer reviewer, strict, independent gatekeepers who weighed the claims of each paper against the many others? Were they right to give Evidence for a limit to human lifespan a place at the top of the scientific tree, at the cost of 11 other manuscripts?

Chris Graf thinks not. “I don’t believe that Nature is happy that they published this manuscript.” Graf is vice chair of the Committee on Publication Ethics. COPE is an association of about 20,000 editors of scientific journals that aims to improve the publication process. “This story shows the gaps in the processes at Nature. I’m also sure that they’re acting to address any such issues from the case you reported.”

So was it an unusual slip-up?

The assessment of two others who are familiar with peer reviewing was as cool as it was discomforting. “I see all kinds of failures in this particular process,” states Richard Smith, “but I don’t think that they are unusual.” Until 2004, Smith was the editor-in-chief of the British Medical Journal and is now a well-known critic of the classical peer review system. Jelte Wicherts, research methodology and peer review specialist at the University of Tilburg, agrees with Smith’s conclusions. He says that the process is “a bit messy, but probably not atypical.”

They read this whole case study and sent back a detailed assessment. Yes, they both say: the independence of the peer reviewers was compromised. And yes, the reviewers were insufficiently critical of the statistics. And that is all fairly commonplace.

There is only one thing that surprises both of them: that Nature changed its opinion of the manuscript so suddenly. “It suggests to me that the editors were attracted by the ‘sexiness’ of the paper,” says Smith. And Wicherts adds, “It seems as if something made them realise that this paper could be newsworthy and influential. And that is in fact their business model.”

“We received an unbelievable number of e-mails”, said Jan Vijg, a month after publication. Many of the responses were critical, he says. But there were also messages of support. “The editor of Nature said that the article was very well received.” One generous individual even wrote a cheque for $60,000. “Because he thought it was such good work.”

Jean-Marie Robine concludes his lecture in the Utrecht Academy Building. He briefly returns to the subject of Evidence for a limit to human lifespan. He explains that it is difficult to make forecasts about the limits of human life. But looking only at record-breaking old people in the way that Jan Vijg did? No, he doesn’t think that is the right approach. “That is only about the details.” What the right way is, he says, is a question that the professional field must answer. “I can only encourage you to say something intelligent about the increase in human lifespan.” The room chuckles.

Sniggers and chuckles. ‘Top scientific articles’ with shaky methods and soft conclusions almost always receive this response. The same thing happened to the arsenic bacterium, and it will probably happen now. Retract the article? If, as in this case, there is no question of scientific malpractice, that would be highly unusual. However, one or two critical responses may be added to Jan Vijg’s article. One such response is currently under peer review.

“I hope Nature accepts one of them,” says Vijg on the phone. Because then he’ll get to write a response. “And we’ll have another article in Nature.”

Before publication of this article, the authors and peer reviewers of ‘Evidence for a limit to human lifespan’ who were consulted for it checked the facts and quotations. Nature was sent an extensive list of questions.