So I've used this in response to a few threads now, and after a thread was locked last night I was asked to start a new one explaining how we can track the virus and determine the start date of this "outbreak" that originated in China.
Thanks to the low cost and availability of sequencing today, hundreds if not thousands of SARS-CoV-2 genomes have now been deposited online and shared with the scientific community for analysis. The group leading the way on this is nextstrain.org, which offers real-time tracking of pathogen evolution. They have made several interactive charts showing how the virus is mutating, which lets them track the introduction of an infection into a community and its subsequent spread.
Here is an example of the power of their approach:
Trevor Bedford, a scientist at the Fred Hutchinson Cancer Research Center in Seattle, shows that ~80% of the infections in Washington state most likely originated from a single founder event (one infected person).
So how do they do this?
It's pretty simple. Every virus mutates at a characteristic rate, meaning that single nucleotides (A, G, C, and T, which is U in this RNA virus) within the ~30,000-nucleotide viral genome change at a roughly constant rate. All living things experience this, because the machinery that copies our genetic material is prone to errors. Sometimes those errors are beneficial to the organism, and that is how evolution occurs. Other times they are deleterious, and the mutation is quickly removed because it lowers the organism's fitness. And sometimes they don't matter at all and persist because they do neither harm nor good; they just accumulate over time. For SARS-CoV-2, they show a rate of about 2 mutations per month across the genome.
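A common simplification (my assumption here, not something Nextstrain's charts state) is to treat mutations as arriving randomly at a fixed average rate, i.e. a Poisson process. Under that model, using the ~2 mutations/month figure above, this small sketch gives the expected number of new mutations after t months and the chance that a lineage has picked up exactly k of them:

```python
import math

RATE_PER_MONTH = 2.0  # figure quoted above; treat as approximate


def expected_mutations(months: float, rate: float = RATE_PER_MONTH) -> float:
    """Mean number of new mutations after `months` of spread (rate * time)."""
    return rate * months


def prob_k_mutations(k: int, months: float, rate: float = RATE_PER_MONTH) -> float:
    """Poisson probability of seeing exactly k new mutations after `months`."""
    lam = rate * months
    return math.exp(-lam) * lam ** k / math.factorial(k)


if __name__ == "__main__":
    for t in (0.5, 1, 2, 3):
        print(f"{t} months: expect {expected_mutations(t):.1f} mutations, "
              f"P(no change) = {prob_k_mutations(0, t):.3f}")
```

Under this model, even after a full month about 13.5% of lineages (e^-2) would still be identical to the founder, so finding identical genomes early in an outbreak is expected rather than surprising.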
So if we start with the original viral genome that first caused infection ---
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
After one month, we will have at least one population of infections whose viral genome looks like this:
XXXXXXXXAXXXXXXXXXXXXXXBXXXXXXXXXXXXXXXXXX
We can then track this specific pattern to see where around the world this viral genome originated and how it is spreading.
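To make "tracking a pattern" concrete, here is a toy sketch in the same X-notation as above. The sample IDs and mutation positions are hypothetical, invented for illustration; real tools like Nextstrain work on aligned nucleotide sequences and build a full phylogenetic tree rather than doing this simple grouping:

```python
from collections import defaultdict


def mutations(genome: str, founder: str) -> frozenset:
    """Set of (position, new_base) pairs where `genome` differs from `founder`."""
    if len(genome) != len(founder):
        raise ValueError("genomes must be aligned to the same length")
    return frozenset((i, b) for i, (a, b) in enumerate(zip(founder, genome)) if a != b)


def substitute(genome: str, changes: dict) -> str:
    """Return a copy of `genome` with {position: base} substitutions applied."""
    chars = list(genome)
    for pos, base in changes.items():
        chars[pos] = base
    return "".join(chars)


def group_by_pattern(samples: dict, founder: str) -> dict:
    """Map each mutation pattern to the IDs of samples carrying exactly that pattern."""
    groups = defaultdict(list)
    for sample_id, genome in samples.items():
        groups[mutations(genome, founder)].append(sample_id)
    return dict(groups)


if __name__ == "__main__":
    founder = "X" * 42
    lineage_1 = substitute(founder, {8: "A", 23: "B"})  # illustrative positions
    # Hypothetical samples; IDs are made up, not real sequences.
    samples = {
        "city_A/1": lineage_1,
        "city_B/1": lineage_1,
        "city_B/2": lineage_1,
        "city_A/2": founder,
    }
    for pattern, ids in group_by_pattern(samples, founder).items():
        print(sorted(pattern), ids)
```

Every sample sharing the A+B pattern lands in the same group, which is the basic idea behind following one lineage as it turns up in different places.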
Another population may carry a different pattern, like this:
XXXAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXBXXXXXXX
and we can track this new population as well.
So over time, this version can gain a new mutation
XXXAXXXXXXXXXXXXXXCXXXXXXXXXXXXXXXBXXXXXXX
which we can track as well, while still backtracking through the A and B mutations to its original founder, even though it now carries a third mutation, C.
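That backtracking works because mutations are inherited: a descendant carries all of its ancestor's changes plus any new ones. A minimal sketch of that check, reusing the toy genomes above (positions are illustrative, and it ignores reversions and repeat mutations at the same site, which real tree-building methods have to handle):

```python
def differences(genome: str, founder: str) -> set:
    """(position, base) pairs where `genome` differs from the founder sequence."""
    return {(i, b) for i, (a, b) in enumerate(zip(founder, genome)) if a != b}


def could_descend_from(child: str, ancestor: str, founder: str) -> bool:
    """True if the child carries every mutation the ancestor has (and maybe more)."""
    return differences(ancestor, founder) <= differences(child, founder)


def new_mutations(child: str, ancestor: str, founder: str) -> set:
    """Mutations the child has gained since the ancestor."""
    return differences(child, founder) - differences(ancestor, founder)


if __name__ == "__main__":
    founder = "X" * 42
    # Lineage with A at position 3 and B at position 34, as in the diagram above.
    lineage_2 = founder[:3] + "A" + founder[4:34] + "B" + founder[35:]
    # Same lineage after gaining C at position 18.
    descendant = lineage_2[:18] + "C" + lineage_2[19:]
    print(could_descend_from(descendant, lineage_2, founder))  # True
    print(new_mutations(descendant, lineage_2, founder))       # {(18, 'C')}
```

The subset test is one-directional: the C-carrying genome can descend from the A+B lineage, but not the other way around.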
So how does this relate to finding out when the original outbreak began?
Well, if we know the mutation rate (about two changes per month) and we have enough sequences from around the time of the first outbreak reports (thankfully we do, from China), we can look at what the viral genome looked like in those earliest days and build a "molecular clock" that counts back to the beginning of the outbreak.
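Here is the simplest possible version of that molecular-clock arithmetic. It is a back-of-the-envelope sketch, not what Nextstrain or Rambaut actually run (they estimate rates and dates jointly across a whole tree). It assumes a constant 2 mutations/month and divides the number of mutations separating a genome from the shared ancestral sequence by that rate; the sampling date in the example is hypothetical:

```python
from datetime import date, timedelta

RATE_PER_MONTH = 2.0        # mutations per genome per month, as quoted above
DAYS_PER_MONTH = 365.25 / 12


def months_since_common_ancestor(mutations: int, rate: float = RATE_PER_MONTH) -> float:
    """Crude clock: elapsed time = observed mutations / mutation rate."""
    return mutations / rate


def estimated_start(sample_date: date, mutations: int, rate: float = RATE_PER_MONTH) -> date:
    """Step back from the sampling date by the clock estimate (rounded to whole days)."""
    months = months_since_common_ancestor(mutations, rate)
    return sample_date - timedelta(days=round(months * DAYS_PER_MONTH))


if __name__ == "__main__":
    # 3 mutations, the figure from Bedford's talk in the next paragraph.
    print(months_since_common_ancestor(3))  # 1.5
    # Hypothetical sampling date, used only to show the date arithmetic.
    print(estimated_start(date(2020, 1, 15), 3))
```

This ignores the randomness of mutation (a count of 3 is compatible with a fairly wide range of elapsed times), which is one reason published estimates come with wide confidence intervals.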
This YouTube video is a seminar Trevor Bedford gave at Georgia Tech at the beginning of February, when the outbreak had barely reached the US. He says that of 5 viral genomes from China, three were identical (the same virus, with no mutations between them), while the other two differed from them by only 3 mutations each. This told him that, as of January, sequence diversity was low, so the outbreak most likely started only 1-2 months earlier. I've set the video to start at the point where he discusses this.
Trevor is not the only one who thinks this. Here is another analysis, from Andrew Rambaut at the University of Edinburgh, that uses modeling and phylodynamic analysis of 176 genomes to give a mid-November date (TMRCA = time to the most recent common ancestor):
Data                  Coalescent model     Estimated TMRCA   95% interval
12-Feb, 75 genomes    Exponential growth   29-Nov-2019       28-Oct-2019 to 20-Dec-2019
24-Feb, 86 genomes    Exponential growth   17-Nov-2019       27-Aug-2019 to 19-Dec-2019
Rambaut Analysis
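To read the table in terms of elapsed time, this short sketch turns each row into "days between the estimated TMRCA and the data snapshot" and "width of the 95% interval in days". It assumes the snapshot dates (12-Feb, 24-Feb) are in 2020, which the table does not state explicitly:

```python
from datetime import date

# Rows copied from the table above; the 2020 snapshot year is an assumption.
ROWS = [
    # (snapshot, genomes, TMRCA estimate, 95% lower bound, 95% upper bound)
    (date(2020, 2, 12), 75, date(2019, 11, 29), date(2019, 10, 28), date(2019, 12, 20)),
    (date(2020, 2, 24), 86, date(2019, 11, 17), date(2019, 8, 27), date(2019, 12, 19)),
]


def summarize(row):
    """Return (days from TMRCA to snapshot, width of the 95% interval in days)."""
    snapshot, _genomes, tmrca, lower, upper = row
    return (snapshot - tmrca).days, (upper - lower).days


if __name__ == "__main__":
    for row in ROWS:
        back, width = summarize(row)
        print(f"{row[0]} ({row[1]} genomes): TMRCA {back} days earlier, "
              f"95% interval spans {width} days")
```

This works out to 75 and 99 days before each snapshot (roughly 2.5 to 3 months), with 95% intervals spanning 53 and 114 days respectively.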
So what does this mean?
There are a lot of conspiracy theories out there that SARS-CoV-2 has been circulating in the population for much longer, perhaps since last fall. I love a good conspiracy theory, but unfortunately this one cannot be true BASED ON ALL THE ACCUMULATED SCIENTIFIC DATA. If you had something in December or January that knocked you and your whole family out for a good week, it was NOT SARS-CoV-2. You are not immune to this virus right now.