How Wikipedia reading habits can successfully predict the spread of disease

The ability to forecast the spread of an infectious disease weeks in advance can make a world of difference when it comes to public health responses. For decades, scientists have been trying to create models to predict how something like the flu will spread.

People's Internet usage has opened a new door for predictive data. There are already some tools out there, such as Google Trends, which tries to "nowcast," or show what's happening right now with the spread of certain diseases in the world. There have been studies, too, on whether Twitter can accurately predict how a disease is spreading.

But getting access to Google Trends or Twitter data is not always easy -- or cheap. So a team of mathematicians, biologists and computer scientists got together to see if they could use something that's completely open and free: Wikipedia.

As it turns out, they could accurately forecast how influenza and dengue spread based purely on people's reading habits of Wikipedia articles. Last week, they showed how their algorithm could predict flu season in the United States. The full results of their research are published in this week's PLOS Computational Biology.

"Nowcasting is cool, but ideally you want to provide information to public health departments and policymakers so they can plan ahead of time," said Sara Del Valle, a project leader at Los Alamos National Laboratory whose team worked on the study. "Because if you really want to make a difference in how people are treated when they come to clinics and hospitals, it's better for them to be prepared. If they know in advance, we will see people in a couple of weeks, four weeks, they can better prepare."

Researchers looked at seven diseases and 11 countries over a period of three years, starting in 2010, and compared page views on Wikipedia articles about those diseases to official data from health ministries. By looking at readers' habits, they successfully predicted the spread of influenza in the United States, Poland, Thailand and Japan and dengue in Brazil and Thailand at least 28 days in advance.

Official government data -- usually released with a one- or two-week lag time -- lagged four weeks behind Wikipedia reading habits, according to Del Valle; people, she said, are probably reading about the illnesses they have before heading to the doctor.
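To make the idea of a lead time concrete, here is a minimal sketch of one way to estimate how many weeks page views run ahead of official case counts: shift one series against the other and keep the shift with the highest correlation. This is an illustration only, not the researchers' published method. The function names (`pearson`, `best_lead_weeks`), the `max_lead` parameter and the weekly numbers are all invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lead_weeks(page_views, cases, max_lead=8):
    """Return (lead, r): the lead, in weeks, at which page views
    correlate most strongly with later case counts."""
    best = (0, pearson(page_views, cases))
    for lead in range(1, max_lead + 1):
        # Pair page views from week t with case counts from week t + lead.
        r = pearson(page_views[:-lead], cases[lead:])
        if r > best[1]:
            best = (lead, r)
    return best


# Synthetic weekly data, invented for illustration only:
# case counts echo page views four weeks later at half the volume.
page_views = [10, 12, 15, 30, 60, 90, 70, 40, 20, 12, 10, 11, 10, 12, 11, 10]
cases = [5.0, 5.0, 5.0, 5.0] + [v / 2 for v in page_views[:-4]]

lead, r = best_lead_weeks(page_views, cases)
print(f"page views lead cases by {lead} weeks (r = {r:.3f})")
```

With this made-up data the best lead comes out at four weeks, because the case series was built as a delayed copy of the page views. Real surveillance data would be far noisier, and a usable forecast would need more than a single correlation.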

But not all the diseases or countries yielded such results; they couldn't predict slow-progressing diseases like HIV/AIDS, or diseases with very small numbers of victims, such as Ebola (before the current outbreak) in Uganda or the plague in the United States. Seasonal diseases were much easier to forecast using the Wikipedia model.

And the study had other limitations; for instance, researchers used language as a proxy for country (Japanese articles about influenza were used to predict the spread of the disease in Japan). That may work for some languages, but for some more widely spoken ones, like English, it can be trickier.

Even still, researchers were able to accurately predict the spread of influenza in the United States by examining the page views for English Wikipedia articles. They hope they can next get country-specific data from Wikipedia.
