Two Candidates re SARS-2 Origin.
Zhiyan-Le, 2021-04-29.

https://zhiyanleback.blogspot.com/p/two-candidates-re-sars-2-origin.html
https://sites.google.com/site/zhiyanleback/2021-1/z20210429-two-origins

Key Words: Science and research communications, SARS-2, Covid-19.


There are two possible candidates for the SARS-2 origin. One is bat RaTG13, as the WHO-China joint report said. The other is the WIV1 series (Wuhan Institute of Virology, No. 1 virus; in total it has 8 bat samples, plus WIV16). The following are key data suggesting that RaTG13 is too good to be true as the SARS-2 origin and that WIV1 is the real origin.

Samples for this essay:

 

Samples (raw data source: NIH GenBank)

Name     GenBank ID     Description
WH-01    NC_045512.1    Basic sample, Wuhan patient.
         URL: https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.1
WIV1     KF367457.1     Bat SARS-like coronavirus WIV1, lab product by PRC WIV.
         URL: https://www.ncbi.nlm.nih.gov/nucleotide/KF367457.1
RaTG13   MN996532.1     Bat coronavirus RaTG13, said to be collected in PRC Yunnan, 2013.
         URL: https://www.ncbi.nlm.nih.gov/nuccore/MN996532.1
Bat-12   NC_028824.1    Bat coronavirus, collected in PRC Yunnan, 2012.
         URL: https://www.ncbi.nlm.nih.gov/nuccore/971746735
Bat-14   NC_030886.1    Bat coronavirus, collected in PRC Yunnan, 2014.
         URL: https://www.ncbi.nlm.nih.gov/nuccore/NC_030886.1


Note: If RaTG13 were really a natural bat sample, its nCov should behave the same as, or at least very similarly to, the other two natural bat samples.

And here is the alignment on the single-gene level:
 

Global Alignment of Samples (raw data and method: NIH BLAST, as of 2021-02)

                         S-Gene                Complete Sequence
Query x Subject          Identities   Gaps     Identities   Gaps
RaTG13 x NC-045512       98%          0%       96%          0%
RaTG13 x WIV1            88%          2%       78%          3%
RaTG13 x Bat-2012        57%          19%      55%          16%
RaTG13 x Bat-2014        61%          13%      56%          12%
NC-045512 x Bat-2012     57%          19%      55%          16%
NC-045512 x Bat-2014     61%          13%      56%          12%
WIV1 x NC-045512         87%          2%       78%          3%


Taking WH-01 (NC_045512) as the basic sample, RaTG13 has the highest identities (98% on the S-gene, 96% on the complete sequence), far higher than those of the two natural bat samples from 2012 and 2014 (55% to 61%). Question: in the same location (Yunnan Province, PRC) and within just one year, how could the nCov in RaTG13 mutate in nature to be so close to a patient sample that would only appear over 6.5 years later? A lab, however, can do it in 10 days.
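For readers who want to see what the Identities and Gaps columns measure, here is a minimal sketch of a global (Needleman-Wunsch) alignment in Python. It is not the NIH BLAST implementation used for the table above. The scoring values (match=1, mismatch=-1, gap=-2), the function names and the toy sequences are assumptions chosen for illustration only.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scores)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def identity_and_gaps(top, bottom):
    """Return (identical columns, gap columns, alignment length)."""
    identities = sum(x == y and x != "-" for x, y in zip(top, bottom))
    gaps = sum(x == "-" or y == "-" for x, y in zip(top, bottom))
    return identities, gaps, len(top)


# Toy sequences (hypothetical, not GenBank data).
top, bottom = global_align("ACGTACGT", "ACGTCGT")
identities, gaps, length = identity_and_gaps(top, bottom)
print(f"Identities {identities}/{length}, Gaps {gaps}/{length}")
```

Both percentages in a BLAST-style report are taken over the alignment length, which is why a gap lowers the identity percentage even when every aligned base matches.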

And here are similarities on Codon-level:
 

S-Gene: Codon-Level Similarities to NC-045512

                    NC-045512   WIV1     RaTG13   Bat-2012   Bat-2014
Length (codons)     1273        1260     1256     1269       1132
Matching Score                  78       693      66         94
Matching Ratio                  0.0619   0.5518   0.0520     0.0830
ATG                 14          17       14       21         29
TAA                 0           0        0        0          0
TAG                 0           0        0        0          0
TGA                 0           0        0        0          0


Again, the similarity score of RaTG13 (693) is far higher than those of the natural bat samples (66 and 94), and for the codon ATG, RaTG13 has exactly the same count (14) as the Wuhan patient sample (NC_045512). Is this possible through natural mutations? Obviously not.
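The Matching Ratio row in the table above can be reproduced as the matching score divided by the S-gene length in codons. The short Python sketch below does only that arithmetic on the published figures; the variable and function names are illustrative, and how the matching scores themselves were obtained is not shown here.

```python
# S-gene lengths (codons) and matching scores copied from the table above.
lengths = {"WIV1": 1260, "RaTG13": 1256, "Bat-2012": 1269, "Bat-2014": 1132}
scores = {"WIV1": 78, "RaTG13": 693, "Bat-2012": 66, "Bat-2014": 94}


def matching_ratio(score, length):
    """Share of codons counted as matching the NC-045512 S-gene."""
    return score / length


ratios = {name: matching_ratio(scores[name], lengths[name]) for name in scores}
for name, ratio in ratios.items():
    print(f"{name:9s} {ratio:.4f}")
```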


The RaTG13 sample has two versions: the first was issued on 2020-03-24 and the new version on 2020-11-24. Here is their alignment by NIH BLAST:
 

 Alignment of RaTG13: V.1 (2020-03-24) x V.2 (2020-11-24) 
v.1 (2020-03-24): https://www.ncbi.nlm.nih.gov/nuccore/MN996532.1
v.2 (2020-11-24): https://www.ncbi.nlm.nih.gov/nuccore/MN996532.2
Query Length: 29855 (v.1)
Subject Length: 29855 (v.2)
Identities: 29834/29870 (99%)
Query  1                     CTTTCCAGGTAACAAACCAACGAACTCTCGATCTCTTGTAGATCT  45
                             |||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1      ATTAAAGGTTTATACCTTTCCAGGTAACAAACCAACGAACTCTCGATCTCTTGTAGATCT  60
 
Query  2086   GCTAACAAATATCTTTGGCACTGTCTATGAGCAACTCAAACCTGTTCTTGATTGGCTCGA  2145
              |||||||||||||||||||||||| |||||| ||||||||||||||||||||||||||||
Sbjct  2101   GCTAACAAATATCTTTGGCACTGTTTATGAGAAACTCAAACCTGTTCTTGATTGGCTCGA  2160
 
Query  6886   TTCATTTAATTACCTGAAGTCACCTAATTTTTTTACATTGATTAATATTATAATTTGGTT  6945
              ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct  6901   TTCATTTAATTACCTGAAGTCACCTAATTTTTTTAAATTGATTAATATTATAATTTGGTT  6960
 
Query  7066   TACTAATGTCACTACAGCAATCTACTGTACTGGTTCTATACCTTGTGGTGTTTGTCTTAG  7125
              |||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||
Sbjct  7081   TACTAATGTCACTACAGCAATCTACTGTACTGGTTCTATACCTTGTAGTGTTTGTCTTAG  7140
 
Query  29806  GTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAAAAAAAAAAAA  29855
              |||||||||||||||||||||||||||||||||||               
Sbjct  29821  GTGATTTTAATAGCTTCTTAGGAGAATGACAAAAA                 29855


That alignment is on the single-gene level. It shows 99% identity (29834/29870), with a few single-base changes and a 15-bp gap (ATTAAAGGTTTATAC) at the start of v.1. To summarize:
 

RaTG13 v.1 (Ref) and v.2 (Alt): Codon Changes

           Hidden Codons                      Hidden Codon Values
     BC    I    II   III  IV   V    VI    BC  I   II  III IV  V   VI    SUM   BC-Gain   Total Gain
Ref  CTA   GCA  CGT  TGC  ACA  TGG  TAC   29  37  28  58  55  9   50    266   32        -13
Alt  TTA   AGC  TCG  TAG  CAT  CGG  CTA   61  10  55  51  20  27  29    253
Ref  GCA   TGA  GTC  CTG  AGA  CTT  CAG   37  57  46  31  9   32  19    231   -4        24
Alt  GAA   GTC  GTC  AGT  CAG  TCA  GTC   33  46  46  12  19  53  46    255
Ref  TAC   TCG  TGA  AGT  CTC  AGG  ACT   50  55  57  12  30  11  8     223   -1        18
Alt  TAA   TCG  GTC  CTG  AGA  CTT  CAG   49  55  46  31  9   32  19    241
Ref  TGG   CTA  CTA  ATC  GCG  ATG  CTA   59  29  29  14  39  15  29    214   -8        141
Alt  TAG   TCG  TAC  TCA  GAG  TCC  TGA   51  55  50  53  35  54  57    355
Ref  CGC   TAC  ATG  TGC  AGG  ACA  CGT   26  50  15  58  11  5   28    193   5         37
Alt  CTG   GCA  TAC  CAT  GTG  CAA  CGT   31  37  50  20  47  17  28    230


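In the above, BC refers to Basic Codon, each of which has its own 6 hidden codons. The hidden codon value refers to the energy level corresponding to each codon. SUM is the BC value plus the six hidden codon values; BC-Gain and Total Gain are the Alt minus Ref differences of the BC value and of SUM.

It shows that RaTG13 version 2 gained 207 units on the gene-energy level (the sum of the five Total Gain values: -13 + 24 + 18 + 141 + 37). The question is: how could this change and its energy gain happen naturally, given that the WIV leaders said they have only the genomic sequence, with no live sample, of the bat RaTG13 nCov?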
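The 207-unit figure can be checked by simple arithmetic on the table above. The Python sketch below takes the BC value and the six hidden codon values of each Ref/Alt pair as read from that table, and recomputes the BC gain and total gain for each changed codon. The data layout and names are illustrative; the energy values themselves are taken as given, and the scheme that assigns them is not reproduced here.

```python
# (Ref BC, Alt BC, Ref values, Alt values); each value list is
# [BC value, hidden I, II, III, IV, V, VI], as read from the table above.
changes = [
    ("CTA", "TTA", [29, 37, 28, 58, 55, 9, 50], [61, 10, 55, 51, 20, 27, 29]),
    ("GCA", "GAA", [37, 57, 46, 31, 9, 32, 19], [33, 46, 46, 12, 19, 53, 46]),
    ("TAC", "TAA", [50, 55, 57, 12, 30, 11, 8], [49, 55, 46, 31, 9, 32, 19]),
    ("TGG", "TAG", [59, 29, 29, 14, 39, 15, 29], [51, 55, 50, 53, 35, 54, 57]),
    ("CGC", "CTG", [26, 50, 15, 58, 11, 5, 28], [31, 37, 50, 20, 47, 17, 28]),
]

bc_gains = []
total_gains = []
for ref_bc, alt_bc, ref_vals, alt_vals in changes:
    bc_gain = alt_vals[0] - ref_vals[0]          # change in the basic codon value
    total_gain = sum(alt_vals) - sum(ref_vals)   # change in SUM
    bc_gains.append(bc_gain)
    total_gains.append(total_gain)
    print(f"{ref_bc} -> {alt_bc}: BC gain {bc_gain:+d}, total gain {total_gain:+d}")

overall_gain = sum(total_gains)
print("Overall gain:", overall_gain)
```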

Regarding the 15-bp gap (ATTAAAGGTTTATAC), here is a comparison:
 

Accession      Name       Genomic Data
NC_045512.1    WH01-1     CGGTGACGCATACAA AACATTCCCACCATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT
NC_045512.2    WH01-2     ATTAAAGGTTTATAC CTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA
MT511081.1     Poland+    ATTAAAGGTTTATAC CTTTCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA
MN996532.1     RaTG13-1                   CTTTCCAGGTAACAAACCAACGAACTCTCGATCTCTTGTAGATCTGTTCTCTAAA
MN996532.2     RaTG13-2   ATTAAAGGTTTATAC CTTTCCAGGTAACAAACCAACGAACTCTCGATCTCTTGTAGATCTGTTCTCTAAA



The genomic data is the start of each sample's sequence. For Poland+, an NIH BLAST search returned hundreds of 100% identical samples from patients around the world; the first one, from Poland, is taken here.
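The comparison above amounts to asking which sequences begin with the 15-bp segment. A minimal Python sketch of that check follows; the sequence starts are copied from the comparison table (with the separating space removed), and the dictionary and variable names are illustrative.

```python
PREFIX = "ATTAAAGGTTTATAC"  # the 15-bp segment discussed above

# 5' ends as listed in the comparison table.
five_prime = {
    "WH01-2": "ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA",
    "Poland+": "ATTAAAGGTTTATACCTTTCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA",
    "RaTG13-1": "CTTTCCAGGTAACAAACCAACGAACTCTCGATCTCTTGTAGATCTGTTCTCTAAA",
    "RaTG13-2": "ATTAAAGGTTTATACCTTTCCAGGTAACAAACCAACGAACTCTCGATCTCTTGTAGATCTGTTCTCTAAA",
}

has_prefix = {name: seq.startswith(PREFIX) for name, seq in five_prime.items()}
for name, flag in has_prefix.items():
    print(f"{name:9s} starts with the 15-bp segment: {flag}")

# Apart from the added segment, the listed start of RaTG13 v.2 equals that of v.1.
same_after_prefix = five_prime["RaTG13-2"][len(PREFIX):] == five_prime["RaTG13-1"]
print("RaTG13 v.2 minus the segment equals v.1:", same_after_prefix)
```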

Question: where does the 15-bp gap come from? How did exactly the same 15-bp segment come to appear both in RaTG13 (for which there is no live sample) and in patients around the world at almost the same time, around autumn 2020?

There is only one convincing answer to the above questions: RaTG13 nCov is a lab product, made to be as similar or identical as possible to patient samples so as to cover up the real SARS-2 origin, such as WIV1.

Regarding the changes for RaTG13 gene data, here is the relevant info by USRTK:

RATG13: ALTERED DATASETS RAISE MORE QUESTIONS ABOUT RELIABILITY OF KEY STUDIES ON CORONAVIRUS ORIGINS.
Posted on December 29, 2020.
NO PEER REVIEW FOR ADDENDUM TO PROMINENT CORONAVIRUS ORIGINS STUDY?
Posted on December 18, 2020
URL: https://usrtk.org/tag/ratg13/ 

It says that, when updating RaTG13 genomic sequence data, the provider, PRC-WIV, did not give a clear reason, nor did they go through peer-review and other normal processes.


As for WIV1, its identity (87% to the basic sample on the S-gene) should be good enough for it to be the SARS-2 origin. The provider, PRC-WIV, said that it is a bat nCov sample. However, in international meetings, it was reviewed and approved as a lab product. For example:
 

The University of North Carolina, Institutional Biosafety Committee
Meeting Minutes, Jan. 09, 2019, 3:30 PM. Burnett-Womack 9001

60350
Infectious clones of bat SARS-like coronaviruses WIV1-CoV and SHC-014 (including reporter-expressing variants) or expressing WIV1 or SHC014 Spike genes - 2019. Renewal
APPROVED
Summary: The aim of this study is to generate reverse genetic infectious clones of bat SARS- like coronaviruses WIV1-CoV and SHC-014, which are genetically similar to. Additionally, to determine if the Spike proteins from these viruses are sufficient to confer infectivity, the Spike genes from the bat viruses will be introduced into the genome background. Replication of recombinant viruses will be monitored through viral passage in cells and infectious of mice.
Committee Comments: The proposed containment and safety procedures are adequate for the experimental design. Community Comments: None. III-D, BSL-3, plasmids, mice.

60351 Ralph Baric
Transposon mutagenesis of WIV16-CoV to identify genetically flexible regions of CoV genomes
APPROVED
Summary: The aim of this experiment is to generate a transposon mutant library spanning the WIVI6-CoV genome. The virus library will be screened in cell culture for viral fitness via passage in cell lines. Additionally, the screen will be interrogated for genes responsible for interferon antagonism or RNA replication fidelity.
Committee Comments: The proposed containment and safety procedures are adequate for the experimental design. Community Comments: None III-D, BSL-3, plasmids

Link: https://2f7nhsvfj5dyz0312njuuj14-wpengine.netdna-ssl.com/wp-content/uploads/2020/08/For-Production-to-Requestor-IBC-Meeting-Minutes.pdf


In terms of base (BP) frequency and distribution on the codon level, WIV1 has the smallest distance from the patient sample (NC_045512), while RaTG13's distance is nearly four times that of WIV1 (-206 versus -54) and obviously different from that of the natural samples. See below:
 

Global Codon BP Freq. Distribution (raw data: NIH GenBank, as of 2021-02)

                         NC_045512   WIV1       RaTG13     Bat-12     Bat-14
A                        3038.33     2884.67    2975.33    2342.33    2552.33
C                        1868.00     2020.67    1836.67    1462.67    2135.67
G                        1983.67     2098.00    1948.67    1936.00    2419.00
T                        3267.00     3099.67    3190.33    3250.00    2946.00
total:                   10157       10103      9951       8991       10053
avrg:                    2539.25     2525.75    2487.75    2247.75    2513.25
total gap w NC_045512                -54        -206       -1166      -104
avrg gap w NC_045512                 -14        -52        -292       -26


In more detailed terms, WIV1 and RaTG13 behave the same way:
 

SARS-2: Global Codon Frequency & Distribution (raw data: NIH GenBank, as of 2021-02-22)

                     NC_045512           WIV1                RaTG13              Bat-12              Bat-14
                     1st   2nd   3rd     1st   2nd   3rd     1st   2nd   3rd     1st   2nd   3rd     1st   2nd   3rd
A                    3075  2903  3137    3061  2855  2738    2828  3087  3011    2316  2091  2620    2754  2547  2356
C                    2132  1610  1862    2051  2135  1876    1599  1838  2073    1555  1387  1446    2000  2285  2122
G                    1525  1909  2517    2389  1629  2276    1921  2466  1459    1692  1852  2264    2729  2014  2514
T                    3425  3735  2641    2602  3484  3213    3603  2560  3408    3428  3661  2661    2570  3207  3061
total:               10157               10103               9951                8991                10053
avrg:                2539.25             2525.75             2487.75             2247.75             2513.25
align w NC_045512:                       0.99468             0.97972             0.88520             0.98976


In the above, 1st-2nd-3rd refer to the three base positions of a codon. In terms of codon frequency and distribution, the level on which the virus and ACE2 in fact run and interact with each other, WIV1 has the highest score relative to the patient sample (NC_045512), 0.99468, while RaTG13 has a clearly lower ratio (0.97972) and is obviously different from the natural bat samples.
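Counts of this kind can be obtained by splitting a sequence into triplets and tallying the base found at each of the three positions. The Python sketch below shows one way to do it. It assumes the sequence is read in frame from its first base, in non-overlapping triplets, with leftover bases dropped; the frame actually used for the tables is not stated, and the function names and toy sequence are hypothetical. Averaging the three position counts gives the per-base figure of the earlier table, for example (3075 + 2903 + 3137) / 3 = 3038.33 for A in NC_045512.

```python
from collections import Counter


def codon_position_counts(seq):
    """Tally bases at codon positions 1, 2, 3 and count whole codons.

    Assumption: the sequence is read from its first base in
    non-overlapping triplets; trailing bases are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    position_counts = [Counter(), Counter(), Counter()]
    codon_counts = Counter()
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        codon_counts[codon] += 1
        for pos, base in enumerate(codon):
            position_counts[pos][base] += 1
    return position_counts, codon_counts


def average_per_base(position_counts):
    """Mean count of each base over the three codon positions."""
    return {b: sum(c[b] for c in position_counts) / 3 for b in "ACGT"}


# Toy sequence (hypothetical): ATG GCC ATT GTA ATG GGC CGC
positions, codons = codon_position_counts("ATGGCCATTGTAATGGGCCGC")
for label, counts in zip(("1st", "2nd", "3rd"), positions):
    print(label, {b: counts[b] for b in "ACGT"})
print("ATG occurrences:", codons["ATG"])
print("average per base:", average_per_base(positions))
```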

In sum, between the two candidates of SARS-2 origin, WIV1 is the real one; RaTG13 is not the SARS-2 origin, nor is it a natural sample.

In March 2016, PNAS (the journal of the US National Academy of Sciences) published an article about WIV1, with editor's notes. Please see:
This Issue, by PNAS March 15, 2016 113 (11) 2793-2795;
https://www.pnas.org/content/113/11/2793 

Figure1

It warned that WIV1 is a lab nCov product which can jump directly to humans and cause a huge pandemic with global economic losses, and can even change existing lifestyles. Now the warning has become a reality that the US and the whole world are facing.





RaTG13 Is Too Good To Be True.
---- WIV1 pk RaTG13 (2).
Zhiyan-Le, 2021-04-03.
https://sites.google.com/site/zhiyanleback/2021-1/z20210404-wiv1xratg13
https://zhiyanleback.blogspot.com/p/ratg13-is-too-good-to-be-true.html

The WHO China trip and its SARS-2 origin report indicated RaTG13 as the SARS-2 origin from nature. Their conclusion was based on: 1] RaTG13 has 96.2% identity, which was measured on the single-gene alignment level. 2] There is only one Amino-Acid difference, which was measured on the protein alignment level.

Such a conclusion may not hold. Reason: alignment and analysis on the codon level are totally ignored, and they may lead to different conclusions. Let's take an example: two genomic sequences with the same amino acids (Am-Acid) but different codons:
 

First:
Arg Phe Glu Arg Arg Ser Leu Gly Ser Ser Arg Pro Thr Cys Cys
AGG TTC GAG CGC CGG AGT CTC GGC TCA TCC CGA CCG ACT TGC TGT
Second:
Arg Phe Glu Arg Arg Ser Leu Gly Ser Ser Arg Pro Thr Cys Cys
CGT TTT GAA AGG CGT TCC TTA GGA TCC TCA CGG CCC ACC TGT TGC


Their identity on the Am-Acid or protein level is 100%, and their identity on the single-gene alignment level is 56%. See below:
 

Identities  25/45(56%)
Query  1   AGGTTCGAGCGCCGGAGTCTCGGCTCATCCCGACCGACTTGCTGT  45
            | || ||  | ||     | || || || || || || || || 
Sbjct  1   CGTTTTGAAAGGCGTTCCTTAGGATCCTCACGGCCCACCTGTTGC  45


However, their identity on the codon level is 0%: none of the 15 codon pairs is identical. Moreover, their structures are different.



Which is accurate? In my view, the one on the codon level is. The reason is simple: when the virus and ACE2 are running, they work on the codon level. The given samples may have the same Am-Acid; however, their energy level or power can be very different due to their different codons. This can be well explained by the binary-image Codon Table and by the Buckyball system.
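The three identity figures for the example pair (100% on the protein level, 56% on the single-base level, 0% on the codon level) can be checked with a few lines of Python. The sketch below builds the standard genetic code from its usual compact TCAG-ordered form and compares the two 45-base sequences at each level; the helper names are illustrative.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

first = "AGGTTCGAGCGCCGGAGTCTCGGCTCATCCCGACCGACTTGCTGT"
second = "CGTTTTGAAAGGCGTTCCTTAGGATCCTCACGGCCCACCTGTTGC"


def split_codons(seq):
    """Split a sequence into non-overlapping triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a nucleotide sequence to one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def count_same(xs, ys):
    """Number of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(xs, ys))


base_same = count_same(first, second)
codon_same = count_same(split_codons(first), split_codons(second))
amino_same = count_same(translate(first), translate(second))

print(f"bases:       {base_same}/{len(first)}")
print(f"codons:      {codon_same}/{len(split_codons(first))}")
print(f"amino acids: {amino_same}/{len(translate(first))}")
```

Every one of the 15 codon pairs differs in at least one base, yet each pair encodes the same amino acid, which is why the three levels give such different percentages for the same two sequences.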

RaTG13 Reality on the Codon Level.

Now let us use a codon study to see the identities among the relevant nCov samples (with GenBank IDs):

• NC_045512: WH-01, basic sample, collected from a patient in a Wuhan hospital in PRC.
• MN996532: bat RaTG13, said to be the most likely origin of SARS-2.
• KF367457: WIV1, the first of the lab-product series of SARS-like nCov.
• NC_028824: Bat-2012, natural bat, collected in PRC Yunnan, 2012.
• NC_030886: Bat-2014, natural bat, collected in PRC Yunnan, 2014.

If RaTG13 were a natural bat sample from 2013, it should behave the same as, or very similarly to, the two natural bat samples, Bat-2012 and Bat-2014.

Aligned with the basic sample, WH-01 (NC_045512), on the global codon level, RaTG13 has a matching score of 184 and WIV1 has 190. Here is the alignment of their codon base occurrence frequency and distribution:

Table 01: 

Global Codon BP Freq. Distribution (raw data: NIH GenBank, as of 2021-02)

                         NC_045512   WIV1       RaTG13     Bat-12     Bat-14
A                        3038.33     2884.67    2975.33    2342.33    2552.33
C                        1868.00     2020.67    1836.67    1462.67    2135.67
G                        1983.67     2098.00    1948.67    1936.00    2419.00
T                        3267.00     3099.67    3190.33    3250.00    2946.00
total:                   10157       10103      9951       8991       10053
avrg:                    2539.25     2525.75    2487.75    2247.75    2513.25
total gap w NC_045512                -54        -206       -1166      -104
avrg gap w NC_045512                 -14        -52        -292       -26


The aligning result: WIV1 has a total gap of [-54] and an average gap of [-14], and RaTG13 has [-206] and [-52]. Bat-2012 (-1166) and Bat-2014 (-104) both have bigger gaps than WIV1.
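The gap figures can be reproduced from the four per-base values in Table 01. The Python sketch below sums A, C, G and T for each sample, subtracts the NC_045512 total, and divides by four. Dividing by the four bases is an assumption that reproduces the table's average gaps once they are rounded to whole numbers; the ratio of totals is included because it matches the "align w NC_045512" figures (for example 0.99468 for WIV1). All names are illustrative.

```python
# Per-base values copied from Table 01.
freq = {
    "NC_045512": {"A": 3038.33, "C": 1868.00, "G": 1983.67, "T": 3267.00},
    "WIV1":      {"A": 2884.67, "C": 2020.67, "G": 2098.00, "T": 3099.67},
    "RaTG13":    {"A": 2975.33, "C": 1836.67, "G": 1948.67, "T": 3190.33},
    "Bat-12":    {"A": 2342.33, "C": 1462.67, "G": 1936.00, "T": 3250.00},
    "Bat-14":    {"A": 2552.33, "C": 2135.67, "G": 2419.00, "T": 2946.00},
}

# Totals are whole codon counts; rounding removes the 2-decimal truncation error.
totals = {name: round(sum(values.values())) for name, values in freq.items()}
reference = totals["NC_045512"]

total_gap = {}
average_gap = {}
ratio = {}
for name, total in totals.items():
    if name == "NC_045512":
        continue
    total_gap[name] = total - reference
    average_gap[name] = total_gap[name] / 4   # assumed: gap spread over 4 bases
    ratio[name] = total / reference
    print(f"{name:7s} total gap {total_gap[name]:6d}  "
          f"avg gap {average_gap[name]:7.1f}  ratio {ratio[name]:.5f}")
```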

Clearly, WIV1 is the closest to the basic sample, meaning an obvious greater possibility to be the SARS-2 origin than that of RaTG13, which behaves very differently from natural bat samples.


RaTG13: Too Good To Be True As The SARS-2 Origin.

This indicator, the codon and its base frequencies and distribution, is important according to PRC-PLA doctor Chen Wei (a top virology scientist and Covid-vaccine developer in China, who was also a leader in charge of medical treatment during the early stage of the pandemic in Wuhan), because it directly shows similarities or differences in affinity, stability, and mutation status and trend, especially through the bases C and G and their quantity and distribution.

The S-gene is a key factor in how SARS-2 interacts with the human body. Let's do some studies on the codon level, borrowing Dr. Chen Wei's suggestion. Below are the similarities among the chosen samples.

Fig. 01:


Indeed, as Dr. Chen Wei suggested, gene C & G, as well as their quantity and distribution, play an important role, particularly in the Codon 3rd BP genes (where base-gene A & T are 0).

It seems that RaTG13 is in a good position to be the SARS-2 origin: in single-gene alignment closeness to the basic sample NC_045512, WIV1 has a ratio of 0.9866 and RaTG13 has a ratio of 0.9969. However, the natural bats (Bat-2012 and Bat-2014) have ratios of 0.8892 and 1.0134, far enough from the said sample to be ruled out as the SARS-2 origin. That is, if RaTG13 were a natural bat sample it should behave like the natural bats, but it does not.

Further, below is the Codon-leveled aligning result:

Table 02:

Codon-Level Similarities to NC-045512

                             WIV1     RaTG13   Bat-2012   Bat-2014
Total Occurring              1260     1256     1269       1132
Similarities                 78       693      66         94
Ratio of similarity/total    0.0619   0.5518   0.0520     0.0830


The sample RaTG13 has a matching score of 693 aligned similarities, while the others score from 66 to 94. The gap between RaTG13 and the natural bats is too big to believe that it comes from nature. In a lab, however, it is pretty easy to reach or even go beyond a similarity score of 693. Besides, RaTG13's similarity ratio is too close to the basic sample but too far from the natural samples. In fact, when using NIH BLAST to search for all possible similarities to RaTG13, the result contains no natural bats but falls into three categories: synthetic constructs, clones, and vaccines, all of which are lab or man-made work.

In sum, the indicator on the codon level should be on the must-do list when searching for the SARS-2 origin(s). By using it, the picture is very different from what the WHO report said. That is, the sample RaTG13 is too perfect to be a truthful SARS-2 origin from natural bats; rather, it is very likely a lab product. In contrast, and by all indicators, particularly on the codon level, WIV1 has the closest relation to the basic sample NC_045512; that is, WIV1 is the most likely SARS-2 origin.

Reference:
Message from PLA Vaccine Patent. 2021-03-15.
https://sites.google.com/site/zhiyanleback/2021-1/z20210315-patent-message-en


Data Availability

Table 01-02 (supplement)

SARS-2: Global Codon Freq. & Distribution (raw data: NIH GenBank, as of 2021-02-22)

Codon  WH-01  WIV1  RaTG13  Bat-12  Bat-14     Codon  WH-01  WIV1  RaTG13  Bat-12  Bat-14
AAA    284    264   312     132     180        CAA    230    207   243     129     157
AAC    219    186   200     129     210        CAC    192    151   152     98      125
AAG    252    239   116     114     213        CAG    184    165   84      129     183
AAT    188    233   264     150     179        CAT    181    163   165     125     130
ACA    276    283   235     175     219        CCA    101    141   113     79      150
ACC    127    135   145     89      162        CCC    44     43    43      31      75
ACG    62     52    46      60      89         CCG    27     21    30      28      72
ACT    165    246   247     155     239        CCT    93     115   107     69      142
AGA    201    149   266     137     106        CGA    36     27    37      28      33
AGC    109    86    141     92      143        CGC    29     44    39      23      100
AGG    131    108   113     124     112        CGG    24     29    28      18      40
AGT    161    143   201     146     162        CGT    50     79    63      65      108
ATA    180    168   106     191     166        CTA    274    215   119     243     211
ATC    127    131   104     83      81         CTC    120    147   84      79      93
ATG    298    357   122     319     311        CTG    271    248   79      211     203
ATT    295    281   210     220     182        CTT    276    256   213     200     178

Codon  WH-01  WIV1  RaTG13  Bat-12  Bat-14     Codon  WH-01  WIV1  RaTG13  Bat-12  Bat-14
GAA    100    199   182     91      141        TAA    289    100   319     248     76
GAC    76     145   133     57      143        TAC    201    183   252     169     214
GAG    97     178   78      69      161        TAG    211    106   123     142     102
GAT    64     173   185     79      159        TAT    135    163   279     230     174
GCA    99     187   118     107     191        TCA    193    215   183     120     130
GCC    47     78    69      53      131        TCC    79     61    72      66      98
GCG    29     47    19      58      107        TCG    40     63    33      50      55
GCT    89     261   179     114     260        TCT    139    187   199     133     165
GGA    74     95    124     66      98         TGA    258    87    305     251     92
GGC    49     73    89      62      137        TGC    193    150   244     183     193
GGG    51     50    50      43      70         TGG    195    110   253     208     179
GGT    62     195   155     118     226        TGT    286    204   358     288     215
GTA    180    153   129     198     192        TTA    362    248   220     425     214
GTC    70     106   99      91      115        TTC    180    157   207     141     102
GTG    279    212   97      266     315        TTG    366    291   188     425     302
GTT    159    237   215     220     283        TTT    298    277   368     349     259


Table 02-02 (supplement)

S-Gene: Global Codon Matching Score (raw data: NIH GenBank, 2021-02-22)

Codon  WIV1  RaTG13  Bat-12  Bat-14     Codon  WIV1  RaTG13  Bat-12  Bat-14
AAA    0     0       0       0          CAA    0     0       0       0
AAC    3     51      4       9          CAC    0     8       0       0
AAG    2     31      0       0          CAG    4     24      1       2
AAT    0     0       0       0          CAT    0     0       0       0
ACA    0     0       0       0          CCA    0     0       0       0
ACC    7     57      8       14         CCC    2     37      2       5
ACG    0     0       0       0          CCG    0     0       0       0
ACT    0     0       0       0          CCT    0     0       0       0
AGA    0     0       0       0          CGA    0     0       0       0
AGC    11    52      11      8          CGC    0     0       0       0
AGG    2     24      2       1          CGG    0     0       0       0
AGT    0     0       0       0          CGT    0     0       0       0
ATA    0     0       0       0          CTA    0     0       0       0
ATC    5     35      3       3          CTC    0     0       0       0
ATG    1     5       2       2          CTG    7     59      7       14
ATT    0     0       0       0          CTT    0     0       0       0

Codon  WIV1  RaTG13  Bat-12  Bat-14     Codon  WIV1  RaTG13  Bat-12  Bat-14
GAA    0     0       0       0          TAA    0     0       0       0
GAC    2     32      3       1          TAC    2     33      2       4
GAG    1     22      0       1          TAG    0     0       0       0
GAT    0     0       0       0          TAT    0     0       0       0
GCA    0     0       0       0          TCA    0     0       0       0
GCC    7     38      5       9          TCC    0     0       0       0
GCG    0     0       0       0          TCG    0     0       0       0
GCT    0     0       0       0          TCT    0     0       0       0
GGA    0     0       0       0          TGA    0     0       0       0
GGC    4     49      4       5          TGC    5     23      0       3
GGG    0     0       0       0          TGG    0     7       1       0
GGT    0     0       0       0          TGT    0     0       0       0
GTA    0     0       0       0          TTA    0     0       0       0
GTC    0     0       0       0          TTC    5     46      6       2
GTG    8     60      5       11         TTG    0     0       0       0
GTT    0     0       0       0          TTT    0     0       0       0
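As a consistency check, the per-codon matching scores in this supplement should add up to the S-gene similarity totals given in Table 02 (78, 693, 66 and 94). The Python sketch below sums the non-zero rows, with the values as read from the supplement table; codons not listed score 0 for every sample, and the names are illustrative.

```python
# Non-zero rows of the supplement: codon -> (WIV1, RaTG13, Bat-12, Bat-14).
matching = {
    "AAC": (3, 51, 4, 9),   "AAG": (2, 31, 0, 0),   "ACC": (7, 57, 8, 14),
    "AGC": (11, 52, 11, 8), "AGG": (2, 24, 2, 1),   "ATC": (5, 35, 3, 3),
    "ATG": (1, 5, 2, 2),    "CAC": (0, 8, 0, 0),    "CAG": (4, 24, 1, 2),
    "CCC": (2, 37, 2, 5),   "CTG": (7, 59, 7, 14),  "GAC": (2, 32, 3, 1),
    "GAG": (1, 22, 0, 1),   "GCC": (7, 38, 5, 9),   "GGC": (4, 49, 4, 5),
    "GTG": (8, 60, 5, 11),  "TAC": (2, 33, 2, 4),   "TGC": (5, 23, 0, 3),
    "TGG": (0, 7, 1, 0),    "TTC": (5, 46, 6, 2),
}

samples = ("WIV1", "RaTG13", "Bat-12", "Bat-14")
# Sum each sample's column across all listed codons.
column_totals = dict(zip(samples, (sum(col) for col in zip(*matching.values()))))
for name, total in column_totals.items():
    print(f"{name:7s} {total}")
```

All non-zero scores fall on codons ending in C or G, in line with the remark about the third codon position made in the discussion of Fig. 01.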














