Using the coronavirus genome in lessons
Friday 3 July 2020
One of the amazing things about the outbreak of Coronavirus at the end of 2019 was the speed with which the Chinese authorities sequenced and published the whole genome of the virus.
The genome can be seen on the NCBI website and contains 29903 nucleotide bases, organised into ten annotated genes. Four of these genes code for the structural proteins of the virus.
In fact three institutions share information with each other and host the genome data. It is a good example of international collaboration.
- DNA Data Bank of Japan, Mishima, Japan.
- EMBL-EBI, European Nucleotide Archive, Cambridge, UK.
- GenBank, NCBI, Bethesda, MD, USA.
There are several features worth looking at with IB students:
- The length of the genome in base pairs
- The locus and size of each gene in the genome
- The names of the proteins coded for in each of the genes
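For students who like a little coding, the size of each gene can be worked out from its locus. This is only a minimal sketch: the coordinates below were typed in by hand from the feature table of the database entry for five of the genes, so please treat them as illustrative and check them against the live record. The names `GENES`, `gene_size` and `gene_report` are my own inventions, not part of any NCBI tool.

```python
# Gene coordinates (1-based, inclusive), copied by hand from the feature table.
# Treat them as illustrative values and verify them against the live record.
GENOME_LENGTH = 29903

GENES = {
    "orf1ab": (266, 21555),
    "S": (21563, 25384),
    "E": (26245, 26472),
    "M": (26523, 27191),
    "N": (28274, 29533),
}


def gene_size(start, end):
    """Length in bases of a gene spanning start..end, both ends included."""
    return end - start + 1


def gene_report(genes, genome_length):
    """Return (name, start, end, size, percent of genome) for each gene."""
    rows = []
    for name, (start, end) in genes.items():
        size = gene_size(start, end)
        percent = round(100 * size / genome_length, 1)
        rows.append((name, start, end, size, percent))
    return rows


for name, start, end, size, percent in gene_report(GENES, GENOME_LENGTH):
    print(f"{name:7s} {start:6d}..{end:<6d} {size:6d} bases  {percent:5.1f}% of genome")
```

The "+ 1" in `gene_size` is worth discussing with students: both the first and the last base of the locus belong to the gene.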
If you are looking at the NCBI database for the first time it can be confusing to navigate. In this post I hope to introduce a few features which might give you the confidence to use it in lessons.
This is the NCBI database entry for SARS-CoV-2, the virus which causes COVID-19: https://www.ncbi.nlm.nih.gov/nuccore/MN908947#comment_MN908947.3
I have labelled some of the interesting information and there is a short video walk-through below.
Here are the first 1000 bases of the Coronavirus genome.
It's not very interesting presented like this, although it does show us that there are four bases, A, T, C and G.
This raises the first question, "Why is thymine in this sequence? Isn't the genome of the COVID-19 virus single-stranded RNA?"
It is a tradition in genome databases to represent sequences in DNA form. To sequence viral RNA, it is first copied into complementary DNA (cDNA) by reverse transcription, so the sequence really is read as DNA. Reverse transcription also happens naturally in cells infected by retroviruses such as HIV, although it is not part of the coronavirus 'lifecycle'. So there is logic in this tradition.
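Going back from the database form to the RNA form is just a matter of swapping every thymine for a uracil. Here is a minimal sketch in Python; the function name `cdna_to_rna` is my own, and the input is the first 20 bases of the database entry.

```python
def cdna_to_rna(cdna):
    """Return the RNA form of a sequence written in DNA letters (t becomes u)."""
    return cdna.lower().replace("t", "u")


first_bases = "attaaaggtt tataccttcc"  # first 20 bases of the database entry
print(cdna_to_rna(first_bases))  # prints: auuaaagguu uauaccuucc
```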
The first 1000 bases of the Coronavirus genome (as cDNA)
1 attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct
61 gttctctaaa cgaactttaa aatctgtgtg gctgtcactc ggctgcatgc ttagtgcact
121 cacgcagtat aattaataac taattactgt cgttgacagg acacgagtaa ctcgtctatc
181 ttctgcaggc tgcttacggt ttcgtccgtg ttgcagccga tcatcagcac atctaggttt
241 cgtccgggtg tgaccgaaag gtaagatgga gagccttgtc cctggtttca acgagaaaac
301 acacgtccaa ctcagtttgc ctgttttaca ggttcgcgac gtgctcgtac gtggctttgg
361 agactccgtg gaggaggtct tatcagaggc acgtcaacat cttaaagatg gcacttgtgg
421 cttagtagaa gttgaaaaag gcgttttgcc tcaacttgaa cagccctatg tgttcatcaa
481 acgttcggat gctcgaactg cacctcatgg tcatgttatg gttgagctgg tagcagaact
541 cgaaggcatt cagtacggtc gtagtggtga gacacttggt gtccttgtcc ctcatgtggg
601 cgaaatacca gtggcttacc gcaaggttct tcttcgtaag aacggtaata aaggagctgg
661 tggccatagt tacggcgccg atctaaagtc atttgactta ggcgacgagc ttggcactga
721 tccttatgaa gattttcaag aaaactggaa cactaaacat agcagtggtg ttacccgtga
781 actcatgcgt gagcttaacg gaggggcata cactcgctat gtcgataaca acttctgtgg
841 ccctgatggc taccctcttg agtgcattaa agaccttcta gcacgtgctg gtaaagcttc
901 atgcactttg tccgaacaac tggactttat tgacactaag aggggtgtat actgctgccg
961 tgaacatgag catgaaattg cttggtacac ggaacgttct gaaaagagct atgaattgca
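The position numbers and spaces in a block like this get in the way if students want to count the bases. The sketch below strips them out and counts each base, using only the first two rows of the sequence above to keep it short. The function name `parse_sequence_block` is my own and is not part of any NCBI tool.

```python
from collections import Counter


def parse_sequence_block(block):
    """Remove position numbers, spaces and line breaks, keeping only the letters."""
    return "".join(ch for ch in block.lower() if ch.isalpha())


# The first two rows of the sequence, pasted exactly as they appear in the database.
excerpt = """
1 attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct
61 gttctctaaa cgaactttaa aatctgtgtg gctgtcactc ggctgcatgc ttagtgcact
"""

sequence = parse_sequence_block(excerpt)
counts = Counter(sequence)

print("Number of bases:", len(sequence))
for base in "acgt":
    print(base, counts[base])
```

Pasting in the whole genome instead of the excerpt would let students check the genome length quoted in the database entry for themselves.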
Identifying genes in the genome
So we can use the SARS-CoV-2 genome to see what a DNA sequence looks like, which is a nice way to make the concept of a genome more concrete. To really understand a genome, though, it is far better to use the NCBI database to identify the known genes in it. This illustrates the difference between a genome and a gene with a concrete example which is small enough to fit on a computer screen.
Watch this short video, which gives a two-minute guided tour of the NCBI database information for the SARS-CoV-2 genome and its ten genes. It is a nice way to illustrate the meaning of the terms "genome" and "gene".
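One quick way to connect a gene to the raw sequence is to look up where it starts. The sketch below assumes that the first gene, orf1ab, begins at base 266, as I read it from the feature table of the database entry (do check this against the live record). It takes the row of the sequence which starts at base 241 and pulls out bases 266 to 268. The helper `bases_at` is a name I made up for this example.

```python
def bases_at(row, row_start, first, last):
    """Return bases first..last (1-based, inclusive) from a row starting at row_start."""
    return row[first - row_start : last - row_start + 1]


# The row of the genome which begins at base 241, with the spaces removed.
row_241 = (
    "cgtccgggtg tgaccgaaag gtaagatgga gagccttgtc cctggtttca acgagaaaac"
).replace(" ", "")

# Assumption: orf1ab starts at base 266 (taken from the feature table).
print(bases_at(row_241, 241, 266, 268))  # prints: atg
```

Students should recognise atg as the start codon, which is a nice confirmation that the gene annotation and the sequence agree.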
Other ideas for using the NCBI database
With graphics from protein databases, this genome could be used to explain how genes in a genome become proteins in a virus structure.
https://ten.info/wp-content/uploads/COVID-19-Corona-Virus-SARS-CoV2-Coronavirus-Structure-1.png
https://www.raybiotech.com/covid19-proteins/
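The step from gene to protein can also be shown with a few lines of Python. This is only a sketch: the codon table is a small subset of the standard genetic code, just enough for this example, and the DNA is the 33 bases which follow position 266 in the genome, where I am assuming the orf1ab gene starts (check the feature table). The names `CODON_TABLE_SUBSET` and `translate` are my own.

```python
# A subset of the standard genetic code, written in DNA letters.
# It only contains the codons needed for this short example.
CODON_TABLE_SUBSET = {
    "atg": "M", "gag": "E", "agc": "S", "ctt": "L", "gtc": "V",
    "cct": "P", "ggt": "G", "ttc": "F", "aac": "N", "aaa": "K",
}


def translate(dna, table):
    """Read the DNA three bases at a time and look up each amino acid."""
    usable = len(dna) - len(dna) % 3  # ignore any incomplete codon at the end
    amino_acids = []
    for i in range(0, usable, 3):
        codon = dna[i : i + 3]
        amino_acids.append(table.get(codon, "?"))  # "?" marks a codon not in the subset
    return "".join(amino_acids)


# Bases 266 to 298 of the genome (assumed start of the orf1ab gene).
orf1ab_start = "atggagagccttgtccctggtttcaacgagaaa"
print(translate(orf1ab_start, CODON_TABLE_SUBSET))  # prints: MESLVPGFNEK
```

Students could compare the letters printed with the start of the protein sequence shown for this gene in the database entry.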
There is the potential to show how to look up what the proteins look like and how to find their functions.
https://www.ncbi.nlm.nih.gov/Structure/pdb/7BYR
It might be possible to finish with a summary diagram and to revise protein structure, or to consider the nature of different alleles of a gene, using the S gene.
https://www.raybiotech.com/covid-19-spike-protein-variants/
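Note: This could also be a short revision moment about the concept of "one gene, one protein", because there are some genes in this example which don't quite follow that rule. The orf1ab gene, for example, codes for a large polyprotein which is cut into several smaller proteins.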