Image source: Gao Weilun Design
On Christmas Eve last year, the first genetically sequenced strain of the COVID-19 virus was released to the world. It came from a 65-year-old man in Wuhan. So far, research institutions in various countries have uploaded 12,294 virus strains to GISAID, the world's largest viral gene database. Scholars all over the world are seeking answers to two questions: Where did COVID-19 originate? And where is it going next?
Since February of this year, Chinese and British scholars have published research results one after another, but due to insufficient virus samples, they have not yet provided a convincing answer to the "mystery" of the virus.
The research on the origin of the virus has attracted the interest of Ching-Yung Lin, an adjunct professor at Columbia University and CEO of Graphen, an AI startup. Ching-Yung Lin is well known in the artificial intelligence world. He worked at IBM for nearly 17 years and was the Chief Scientist and Founder of IBM Watson's Network and Artificial Intelligence and Super Computing department.
Ching-Yung Lin, Adjunct Professor at Columbia University and CEO of Graphen, an AI startup
No reason to think COVID-19 did not first spread from China
In March, he used artificial intelligence to draw the evolutionary path of 500 COVID-19 viruses from around the world; in April, after further comparing over 12,000 genetically sequenced viruses, he made two major discoveries:
First, these 12,000 virus strains have evolved at a rate of "one additional transmissible mutation every week." Based on this mutation rate, the first case of COVID-19 may have appeared in mid-to-late November 2019.
The eight virus families over time, and their mutation characteristics in different countries.
B: These mutations affect various proteins, but they are no longer spreading among the existing sequences.
C: The S protein is the key to the virus infecting humans, but the mutation site is not in the receptor-binding domain (RBD) that binds the human cell membrane receptor (ACE2), so it may not affect the intensity of infection.
D: NSP3 is responsible for cleaving the viral proteins apart and for affecting the proteins of infected cells.
E: The function of the ORF8 protein is yet to be studied. NSP13 may be used by the virus to unwind RNA.
F: These mutations affect various proteins, but no large-scale transmission has occurred among the existing sequences.
G: The N protein is a shell that protects the viral RNA genome.
H: The function of ORF3 is to pierce the host cell membrane and let the viruses replicated inside pass through.
Second, the viruses can be divided into eight families according to their distribution: the ancestor, A, is the starting point of all virus mutations, and B, C, D, E, F, G and H evolved from it. A and B appeared as early as the end of December 2019, and H was not discovered until February 19, 2020. (To check which virus variants are in your country, see https://www.graphen.ai/covid/types.html)
During these 60 days, the eight virus families spread around the world: the B family swept across China, the C family spread to Europe, the D family took hold in the U.K. and the Netherlands, E took hold on the West Coast of the U.S., F spread to Spain, South Korea, Australia and China, the G family is mainly found in Europe, and H crossed the ocean to the East Coast of the U.S. Taiwan has every type of the coronavirus except E; the highest percentages are Type B, from China, and Type H, which spreads the fastest.
Ching-Yung Lin believes that because the major virus strains in some regions mutated locally before the outbreak there, it is impossible to prove that "the source of infection came directly from China." However, the evolutionary tree shows that each of these more than 12,000 coronaviruses can be "backtracked" to the end of December 2019 and to the two earliest viruses, which were sequenced in China in early January 2020. Therefore, we cannot find any reason to believe that the virus did not spread from China.
Genome sequencing analysis answers "Where does the virus come from?"
Because viruses mutate at a regular rate, as long as scientists collect enough virus samples and apply detailed computational analysis, they can compare the number and direction of mutations, draw a virus family tree, infer the time of the outbreak, and then answer "Where does the virus come from?"
Knowing where the virus came from helps to explore whether another storm is brewing in the "source" gene pool (such as bats); understanding where the virus is going means observing the direction of its evolution and monitoring new developments (such as loss of smell or taste), which can serve as a reference for vaccine development.
These two big questions also trigger more questions. For example, why is the pandemic on the East Coast of the United States more serious than on the West Coast? Where did the virus that causes loss of smell and taste come from? Is Taiwan's success in keeping the pandemic from worsening related to the type of virus, in addition to its prevention policy? Now that the viruses have been classified, geneticists and virologists will analyze them further and provide answers.
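The dating step described above can be illustrated with a rough Python sketch. The sequence fragments, the collection date, and the helper names `count_mutations` and `estimate_origin` are hypothetical and exist only for illustration; the one-mutation-per-week rate is the figure quoted in this article and is treated here as constant. This is not Ching-Yung Lin's actual method: a real analysis aligns whole genomes and fits an evolutionary tree rather than counting differences against a single reference.

```python
from datetime import date, timedelta

# Rate quoted in the article; assumed constant for this sketch.
MUTATIONS_PER_WEEK = 1.0


def count_mutations(reference: str, sample: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(reference, sample) if a != b)


def estimate_origin(collected: date, mutations: int,
                    rate_per_week: float = MUTATIONS_PER_WEEK) -> date:
    """Walk back from the collection date by (mutations / rate) weeks."""
    return collected - timedelta(weeks=mutations / rate_per_week)


# Toy, made-up fragments; these are not real viral sequences.
reference = "ATGGCTAGCTAGGCTA"
sample = "ATGACTAGCTCGGCTA"

n = count_mutations(reference, sample)
origin = estimate_origin(date(2020, 1, 5), n)
print(f"{n} mutations, estimated common ancestor around {origin}")
```

With these toy inputs the two fragments differ at two positions, so the estimate falls two weeks before the hypothetical collection date.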
Family A: Virus closest to bats
Family A has only two genomes, whose functions are not very different, and they have three things in common:
First, they all originate from Wuhan.
One strain (A1) spread locally in Wuhan, and the other strain (A2) may have spread to Hubei, Guangdong, Jiangxi, Shandong, Zhejiang and other provinces before the lockdown of Wuhan. There have also been sporadic cases in Taiwan and Australia.
Second, the other viruses spread around the world outward from them; they are the starting point of almost all virus strains in the evolutionary tree.
A1 is the first virus strain whose sequence was released, and it is also the current global benchmark for coronavirus research; A2 is closer to the coronavirus genes found in bats and pangolins.
Third, their genes are closest to those found in the potential original host of the new coronavirus, the Chinese rufous horseshoe bat (Note)
Family B: Originated and broke out in China
Since February 2020, very few virus strains have been uploaded from China. Of the more than 11,000 virus strains in the database, only around 400 are from China, with B accounting for the vast majority.
More than half of the cases in Wuhan came from the B family, and the rest are distributed in Hubei, Jiangxi and other provinces, as well as Guangzhou, Foshan, Jingzhou, Qingdao and other cities; the family later spread to all parts of Asia.
The proportion of virus distribution in some European countries is higher than that of type C. Graphic design: Chen Fangyu
Family C: Spread through Europe
On January 28, 2020, the first European case was collected in Munich, Germany; it came from A, the most widespread virus in Wuhan. It had many mutated "relatives," which were spotted in European countries a few weeks later: one of them caused a large outbreak in northern Italy after February 20, and all the viruses in Milan came from this branch, which finally spread across the whole of Europe and mutated into C.
Family G: From Europe to South America
The virus covers almost the entire European continent, including Portugal, Switzerland, the Czech Republic, Russia, Ireland, Italy, Belgium, the United Kingdom and the Netherlands; it also reached Brazil, where it became the origin of the outbreaks in South American countries.
The trouble is that the G type carries not only an S protein mutation but also an N protein mutation. The earliest cases of loss of smell and taste were reported in the United Kingdom, which also has many such cases. Ching-Yung Lin suspects that the N protein mutation may be related to the loss of smell and taste.
He explained that this N protein mutation affects a gene in the human body; if this gene is mutated, it causes the muscle-wasting disease commonly known as ALS.
With the spread of the virus, there have been cases of loss of smell in Germany and South Korea. Loss of smell may be caused by the virus attacking the olfactory cells of the nose, or by a deeper attack on the olfactory bulb of the brain (that is, the relay station between the olfactory neurons of the nasal cavity and the brain's olfactory center). In fact, a paper in 2008 pointed out that "SARS will attack experimental rats' sense of smell."
"The loss of smell and taste is because the nerve is attacked, and ALS is also a nerve problem, so it may be related," Ching-Yung Lin boldly hypothesized. But there is no evidence yet, and clinical trials would be needed to confirm it.
Family H viruses spread in the eastern United States. Graphic design: Chen Fangyu
Family H: Spread from France to the U.S. East Coast
Although the United States banned entry from China at the end of January 2020, blocking viruses from China, it did not realize that the coronavirus had already invaded Europe, and it was not until March 11 that it banned entry completely. The viruses quietly crossed the Atlantic and landed on the East Coast of the United States.
Ching-Yung Lin observed that the most prevalent strain on the U.S. East Coast first appeared in northern France on February 21, 2020, and landed in New York 10 days later; it spread and reproduced at high speed, and as many as 86% of the viruses in New York State were its descendants.
"I was shocked; the virus's lineage is so pure!" Ching-Yung Lin explained that when one overwhelmingly dominant virus strain appears in an area, it often means that the virus is highly active; this one quickly swept across the East Coast of the US upon landing. This is possibly why there are many more confirmed cases on the East Coast than on the West Coast.
Therefore, many US media outlets believe that the coronavirus came from Europe and is not the "Chinese Virus" that US President Trump has called it. After comparing the gene sequences, Ching-Yung Lin believes that the fair statement is "the United States was attacked by viruses from both Europe and Asia." The former (Family H) account for 53% of the viruses in the United States, and the latter (Family E) account for 28%.
Family E: Widespread in Canada and on the U.S. West Coast
On January 19, 2020, the first Coronavirus in the US was collected from Seattle, Washington, from a 35-year-old male. He had just returned from a visit to his family in Wuhan, and after 4 days of fever and cough, he wore a mask and entered a small local clinic.
California, another large state on the West Coast, had its first case at the end of January. However, after the first case appeared, the virus seemed to have disappeared for 3 weeks, and the pandemic did not break out until around February 20.
According to statistics from mid-April, Washington State had 155 infections per 100,000 people and California only 80, both far below New York State's 1,248. Does this mean that family E is less infectious?
Ching-Yung Lin has yet to find evidence in the genes; he believes that on the US East Coast, people take the subway and can be easily infected, while on the West Coast, people drive more and maintain social distancing more consistently. "Commuting habits may have a more critical impact."
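To put the per-capita figures quoted above side by side, here is a small Python sketch that expresses each state's rate as a share of New York's. The three numbers are the mid-April figures from this article; the helper name `ratio_to` is an illustrative assumption, not part of any published analysis.

```python
# Infections per 100,000 residents, mid-April figures quoted in the article.
infections_per_100k = {
    "Washington": 155,
    "California": 80,
    "New York": 1248,
}


def ratio_to(baseline: str, rates: dict[str, int]) -> dict[str, float]:
    """Express each state's rate as a multiple of the baseline state's rate."""
    base = rates[baseline]
    return {state: rate / base for state, rate in rates.items()}


ratios = ratio_to("New York", infections_per_100k)
for state, r in ratios.items():
    print(f"{state}: {r:.1%} of New York's per-capita rate")
```

On these figures, Washington's rate is roughly an eighth of New York's and California's roughly a sixteenth, which is the gap the question about family E's infectivity is responding to.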
Family F: Spain, South Korea, Australia, China – across three continents
The F and E families traveled outside Wuhan and are homologous to the A2 virus strain, which is closest to bats and pangolins. But while E went east, F was scattered to South Korea and Australia. 45% of the viruses in Spain came from this family.
Family tree of 8 major Coronaviruses (as of 4/29/2020). Image source: Ching-Yung Lin
"The Three Lost Weeks" caused countries to lose control of the situation
Looking back at Europe and the West Coast of the United States, the spread of the coronavirus was very similar: the first case appeared at the end of January, and after three weeks of silence, another strain of the virus appeared with a mutation, and an outbreak began around February 20.
"Those three weeks were the critical period," Ching-Yung Lin says, looking back. At that time, Europe and the United States were focused on keeping out Chinese tourists but missed what had already crossed their borders, and because there was no testing for three weeks, it is hard to trace when and where the virus mutated.
GISAID also stated that due to insufficient data from the early stage of the pandemic, it is impossible to interpret the history of global transmission in detail. Cases that at first glance appear directly related may turn out to be connected to cases in other countries once more information has accumulated.
"It may be that the virus mutated after it reached the local area, or it may have mutated and then spread to the local area," Ching-Yung Lin said, giving an example: the virus that caused the outbreak in northern Italy did not necessarily come from Munich, because the same virus found in Munich was also detected in Shanghai at the same time. Therefore, "the variation may also have come from unknown travelers who had been to China and then went to Italy." The propagation route is more complicated than expected.
Since March, the European and American countries that had come to realize the severity of their situation finally closed their national borders, shifting the coronavirus from transcontinental spread to spread within domestic communities and producing various regional variations.
For example, in mid-March, Wales in the United Kingdom uploaded hundreds of local viruses, and Iceland uploaded 41. As countries upload more virus strains, the characteristics of the viruses in each region are becoming clearer and can serve as a reference for vaccine research and development.
"Even if the pandemic subsides, it may be passed back from animals"
A hundred days later, the global pandemic has slowed down. But will the new coronavirus completely disappear from human life? The answer is: no.
"It is very difficult to completely end the pandemic. Even if it subsides, it may be transmitted back from animals," Ching-Yung Lin said, giving an example: humans once thought that SARS had been eliminated. Unexpectedly, 17 years later, a mutated virus emerged again from bats, causing an even greater shock. "Who knows if there will be another next time?"
Therefore, after COVID-19 began to spread, scholars immediately started studying which animals in close contact with humans might be affected. From February onwards, some papers identified cats and pigs as being at high risk; in April, tigers at the Bronx Zoo in New York tested positive for the new coronavirus, confirming the earlier findings.
In order to understand the virus's mechanisms, Ching-Yung Lin read more than 60 papers, five or six of which were from the Wuhan Institute of Virology, Chinese Academy of Sciences. "After reading research papers, I often concealed my exasperation," he said. He admired China's extensive research on coronaviruses after SARS, which has made the world more aware of how these viruses work and brought hope for the development of new drugs and vaccines.
However, some of the papers' topics also shocked him.
In 2007, the team of well-known Chinese scholar Shi Zhengli modified a SARS-like bat coronavirus that did not infect humans so that it infected human cells in petri dishes; in 2011, a virus highly homologous to SARS was found in a bat cave in Yunnan, which showed that the virus may infect humans directly from bats without intermediate hosts; in 2014, she and a team at the University of North Carolina at Chapel Hill synthesized a new SARS virus that successfully infected mice, and then experimented with primates. Another, Western team found six new coronaviruses in a bat in a Burmese cave.
"Originally, the viruses in each species walked their own paths. It should be acknowledged that humans have opened Pandora's box," Ching-Yung Lin said, summing up his research over the past hundred days and offering a reminder to all scientists.
Source: Original story in Chinese https://futurecity.cw.com.tw/article/1403