Wikipedia could help predict disease outbreaks

A woman receives a flu vaccination. (AP Photo/Eduardo Verdugo)

Say you're feeling sick, and you visit Wikipedia to see whether your symptoms match those of the flu. Wikipedia logs that visit, and it publicly releases the number of views for every one of its pages.

Now a group of data scientists from Los Alamos National Laboratory thinks this data could be useful for tracking diseases. In a paper published Friday in the journal PLOS Computational Biology, they present an algorithm that uses Wikipedia traffic data to estimate real-world disease rates and forecast imminent outbreaks.

Would this actually work? The idea of monitoring internet activity to model and predict disease rates isn't entirely new. For several years, Google Flu Trends has attempted to use Google search queries as a proxy for flu rates, and other researchers have tried to use tweets for the same purpose. Yet those methods have had real problems predicting outbreaks.

But the researchers behind this new paper say that Wikipedia data might be the best bet and could allow us to track a number of diseases in different countries.

The researchers began by picking 14 different disease-country pairs to look at, such as the flu in the United States, tuberculosis in Thailand, and dengue in Brazil.

Next, they collected publicly available page-view data for every page on Wikipedia in the relevant language. The data typically came in weekly or monthly intervals and spanned a few years.

For each of these 14 disease-country pairs, the researchers also had conventionally collected public health data on disease rates over time. They searched each language's Wikipedia traffic data for the ten pages whose view counts best matched the known disease data.
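As a rough illustration of the page-selection step, the sketch below ranks pages by the Pearson correlation between their view counts and the official disease series, then keeps the top ten. The article does not say which similarity measure the researchers used, so Pearson correlation is an assumption here, as are the function names `pearson` and `top_correlated_pages` and the small made-up data set in the usage lines.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two equal-length series (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def top_correlated_pages(page_views, disease_rates, k=10):
    """Hypothetical helper: return the k page titles whose views correlate most with disease_rates.

    page_views maps a page title to a list of view counts aligned to the same
    time bins (e.g. weeks) as disease_rates. Pages are ranked by correlation,
    highest first, so negatively correlated pages end up last.
    """
    scores = {title: pearson(views, disease_rates) for title, views in page_views.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Made-up example: five time bins of official case counts and page views.
cases = [10, 40, 90, 60, 20]
views = {
    "Dengue fever": [100, 400, 900, 600, 200],
    "Aedes aegypti": [50, 90, 160, 120, 60],
    "Football": [500, 480, 510, 495, 505],
}
print(top_correlated_pages(views, cases, k=2))  # → ['Dengue fever', 'Aedes aegypti']
```

With real data, `page_views` would hold every article in the relevant language edition, so a vectorized correlation over a views matrix would be faster than this per-page loop; the loop is kept here for readability.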

"The general disease page was generally the one that correlated most strongly," says Nicholas Generous, the study's lead author. "Drugs and treatments were also usually in the top ten, and then for the flu, some of the various strains would also be in there."

In 8 of the 14 cases, the combined group of Wikipedia articles matched the actual disease rate extremely closely. For dengue rates in Brazil, for instance, traffic to the set of 10 articles tracked the official data closely throughout a three-year period.
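One plausible way to turn the selected articles into a single disease estimate is a linear model fit by ordinary least squares, with the fitted values then compared against the official series. This is a hedged sketch only: the article does not describe the researchers' exact model, and `fit_combined_model` and the sample numbers below are hypothetical.

```python
import numpy as np


def fit_combined_model(selected_views, disease_rates):
    """Hypothetical sketch: fit disease_rates ≈ intercept + sum_j w_j * views_j by least squares.

    selected_views has shape (n_time_bins, n_pages), one column per selected article.
    Returns (coefficients, fitted values, R^2 of the fit).
    """
    X = np.asarray(selected_views, dtype=float)
    y = np.asarray(disease_rates, dtype=float)
    # Prepend a column of ones so the model includes an intercept term.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    # R^2 measures how much of the variation in the official series the fit explains.
    ss_res = float(((y - fitted) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return coef, fitted, r_squared


# Made-up example: six time bins, two selected pages (columns).
views = np.array([
    [100, 50],
    [400, 90],
    [900, 160],
    [600, 120],
    [200, 60],
    [150, 300],
])
rates = [14.5, 46.5, 100.0, 68.0, 25.0, 32.0]  # constructed as 2 + 0.1*page1 + 0.05*page2
coef, fitted, r2 = fit_combined_model(views, rates)
print(np.round(coef, 3), round(r2, 6))
```

A few years of weekly data against ten pages gives far more time bins than coefficients, so the fit is well determined; with fewer bins than pages, a near-perfect fit would say little about real predictive power.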
