The answers so far have focused on the data itself and its flaws, which makes sense given the site this is on.
But I’m a computational/mathematical epidemiologist by inclination, so I’m going to talk about the model itself for a bit, because it’s relevant to the discussion too.
In my mind, the biggest problem with the paper is not the Google data. Mathematical models in epidemiology handle messy data all the time, and I think the problems with this data could be addressed with a fairly straightforward sensitivity analysis.
The biggest problem, to me, is that the researchers have “doomed themselves to success”, something that should always be avoided in research. They did this through the model they chose to fit to the data: a standard SIR model.
Briefly, a SIR model (S for susceptible, I for infectious, R for recovered) is a system of differential equations that tracks the health states of a population as it experiences an infectious disease. Infected individuals interact with susceptible individuals and infect them, and then, in time, move on to the recovered category.
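One common way to write the basic model, assuming frequency-dependent transmission with transmission rate β, recovery rate γ, and a fixed total population N = S + I + R (these are the usual textbook symbols, not notation taken from the paper):

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
```

The three right-hand sides sum to zero, which is what keeps N constant.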
This produces a characteristic set of curves: S falls, I rises to a single peak and then declines, and R climbs to a plateau.
Beautiful, is it not? And yes, this one is for a zombie epidemic. Long story.
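To see that shape without a plotting library, here is a minimal simulation sketch. It works with fractions s = S/N, i = I/N, r = R/N; the values beta = 0.3 and gamma = 0.1 (so R0 = 3), the starting fraction infected, and the hand-written 4th-order Runge-Kutta integrator are all illustrative assumptions, not fitted to anything:

```python
# Minimal SIR simulation with S, I, R as fractions of a fixed population N.
# beta, gamma, i0, and the step size are illustrative assumptions, not fitted values.


def sir_derivs(s, i, r, beta, gamma):
    """Right-hand side of the basic SIR equations, written for fractions s = S/N etc."""
    infection = beta * s * i
    return (-infection, infection - gamma * i, gamma * i)


def simulate_sir(beta, gamma, i0=1e-3, days=365.0, dt=0.1):
    """Integrate with classic 4th-order Runge-Kutta; return a list of (t, s, i, r) rows."""
    state = (1.0 - i0, i0, 0.0)
    rows = [(0.0, *state)]
    for step in range(1, int(round(days / dt)) + 1):
        k1 = sir_derivs(*state, beta, gamma)
        k2 = sir_derivs(*(x + dt / 2 * k for x, k in zip(state, k1)), beta, gamma)
        k3 = sir_derivs(*(x + dt / 2 * k for x, k in zip(state, k2)), beta, gamma)
        k4 = sir_derivs(*(x + dt * k for x, k in zip(state, k3)), beta, gamma)
        # Weighted average of the four slopes, applied to each compartment.
        state = tuple(
            x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        rows.append((step * dt, *state))
    return rows


trajectory = simulate_sir(beta=0.3, gamma=0.1)
t_peak, _, i_peak, _ = max(trajectory, key=lambda row: row[2])
t_end, s_end, i_end, r_end = trajectory[-1]
print(f"I peaks at {i_peak:.3f} around day {t_peak:.0f}")
print(f"S, I, R on day {t_end:.0f}: {s_end:.3f}, {i_end:.1e}, {r_end:.3f}")
```

With these values I should peak at roughly 0.30 of the population (the textbook peak for a small initial outbreak is 1 − (1 + ln R0)/R0 ≈ 0.30) and then decay toward zero for the rest of the year.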
In this case, the red line, which is the infectious (I) class, is what’s being modeled as “Facebook users”. The problem is this:
In the basic SIR model, the I class will eventually and inevitably approach zero asymptotically.
It must happen. It doesn’t matter whether you’re modeling zombies, measles, Facebook, or Stack Exchange: if you model it with a SIR model, the inevitable conclusion is that the population in the infectious (I) class drops to approximately zero.
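A sketch of why this is unavoidable, using the basic equations with transmission rate β, recovery rate γ, and fixed population N (standard notation, assumed here rather than taken from the paper). The recovered class can only grow and is capped at N, so the total flow γI into it over all time is finite; a nonnegative function that has a finite integral and cannot change arbitrarily fast has to die out:

```latex
\begin{aligned}
&\frac{dR}{dt} = \gamma I \ge 0,\quad R(t) \le N
\;\Longrightarrow\; \int_0^{\infty} \gamma I(t)\,dt = \lim_{t\to\infty} R(t) - R(0) \le N < \infty, \\
&\left|\frac{dI}{dt}\right| \le \beta \frac{S I}{N} + \gamma I \le (\beta + \gamma) N
\;\Longrightarrow\; I \text{ is uniformly continuous}, \\
&I \ge 0 \text{ integrable and uniformly continuous}
\;\Longrightarrow\; \lim_{t\to\infty} I(t) = 0 \quad \text{(Barbalat's lemma).}
\end{aligned}
```

Nothing in this argument depends on the values of β and γ, which is why the conclusion holds for zombies and Facebook alike.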
There are extremely straightforward extensions to the SIR model that make this untrue. Either you can have people in the recovered (R) class return to susceptible (S) (essentially, people who left Facebook changing from “I’m never going back” to “I might go back someday”), or you can have new people enter the population (little Timmy and Claire getting their first computers).
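Both extensions can be sketched by adding two rates to the basic equations: xi for recovered people drifting back to susceptible, and mu for new susceptibles arriving (with everyone leaving at the same rate mu so the population stays fixed). The names xi and mu, the parameter values, and the hand-written Runge-Kutta integrator are illustrative assumptions, not anything from the paper. Under the basic model I collapses, while either extension leaves it at a nonzero level:

```python
# Basic SIR plus two optional extensions, all as fractions of a fixed population.
# Parameter names (xi, mu) and values are illustrative assumptions, not from the paper.


def derivs(state, beta, gamma, xi, mu):
    """SIR right-hand side with optional waning immunity (xi) and turnover (mu).

    xi: rate at which recovered people become susceptible again (R -> S).
    mu: rate at which new susceptibles arrive; everyone also leaves at rate mu,
        which keeps the total fixed. xi = mu = 0 gives the basic SIR model.
    """
    s, i, r = state
    infection = beta * s * i
    return (
        mu - infection - mu * s + xi * r,
        infection - gamma * i - mu * i,
        gamma * i - mu * r - xi * r,
    )


def shifted(state, slope, h):
    """Return state + h * slope, component-wise."""
    return tuple(x + h * k for x, k in zip(state, slope))


def final_infectious(beta=0.3, gamma=0.1, xi=0.0, mu=0.0, i0=1e-3, days=3000.0, dt=0.5):
    """Integrate with 4th-order Runge-Kutta and return the infectious fraction at the end."""
    state = (1.0 - i0, i0, 0.0)
    params = (beta, gamma, xi, mu)
    for _ in range(int(round(days / dt))):
        k1 = derivs(state, *params)
        k2 = derivs(shifted(state, k1, dt / 2), *params)
        k3 = derivs(shifted(state, k2, dt / 2), *params)
        k4 = derivs(shifted(state, k3, dt), *params)
        state = tuple(
            x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state[1]


basic = final_infectious()
waning = final_infectious(xi=0.01)      # people drift back from R to S
newcomers = final_infectious(mu=0.01)   # new susceptibles keep arriving
print(f"basic SIR:        I = {basic:.1e}")
print(f"waning immunity:  I = {waning:.4f}")
print(f"new arrivals:     I = {newcomers:.4f}")
```

For these values, the waning-immunity run should settle near (1 − γ/β)/(1 + γ/ξ) ≈ 0.061 and the new-arrivals run near μ(1 − (γ + μ)/β)/(γ + μ) ≈ 0.058. Those are the endemic equilibria of the two extended models; the basic SIR model has no equilibrium with I above zero.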
Unfortunately, the authors didn’t fit those models. This is, incidentally, a widespread problem in mathematical modeling. A statistical model is an attempt to describe the patterns of variables and their interactions within the data. A mathematical model is an assertion about reality. You can get a SIR model to fit lots of things, but choosing a SIR model is also an assertion about the system: namely, that once it peaks, it’s heading to zero.
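The “once it peaks, it’s heading to zero” assertion falls straight out of the basic equations. Writing R0 = β/γ (standard notation, assumed here), I grows only while S/N is above 1/R0, and in the basic model S can never increase, so once the peak is passed the decline never reverses:

```latex
\begin{aligned}
\frac{dI}{dt} &= \gamma I \left( \mathcal{R}_0 \frac{S}{N} - 1 \right), \qquad \mathcal{R}_0 = \frac{\beta}{\gamma}, \\
\frac{dS}{dt} &= -\beta \frac{S I}{N} < 0 \quad \text{whenever } S, I > 0, \\
t > t_{\text{peak}} &\;\Longrightarrow\; \frac{S(t)}{N} < \frac{S(t_{\text{peak}})}{N} = \frac{1}{\mathcal{R}_0} \;\Longrightarrow\; \frac{dI}{dt} < 0.
\end{aligned}
```

Both extensions above break the second line: returning or newly arriving susceptibles let S climb back above 1/R0 of the population.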
Incidentally, Internet companies do use user-retention models that look a heck of a lot like epidemic models, but they’re also considerably more complex.