Using the coronavirus genome in lessons

Friday 3 July 2020

One of the amazing things about the outbreak of Coronavirus at the end of 2019 was the speed with which the Chinese authorities sequenced and published the whole genome of the virus.

The genome can be seen on the NCBI website. It contains 29903 nucleotide bases, which the NCBI record annotates as ten genes (only four of these code for structural proteins).
In fact, three institutions share the sequence data and each hosts a copy of the genome database. It is a good example of international collaboration.

  • DNA Data Bank of Japan, Mishima, Japan.
  • EMBL-EBI, European Nucleotide Archive, Cambridge, UK.
  • GenBank, NCBI, Bethesda, MD, USA.

There are several features worth looking at with IB students:

  • The length of the genome in base pairs
  • The locus and size of each gene in the genome
  • The names of the proteins coded for in each of the genes
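The second of these features, the locus and size of each gene, lends itself to a little arithmetic that students could check by hand. The sketch below is only an illustration: the coordinates are the ones I read from the NCBI record for the four structural protein genes (they are not given in this post, so treat them as an assumption and check them against the live record), and the function names are my own.

```python
# Gene loci (start, end), 1-based and inclusive.
# ASSUMPTION: values as read from GenBank record MN908947.3; verify them
# against the record before using them in a lesson.
GENOME_LENGTH = 29903  # total bases in the genome, from the record

GENES = {
    "S": (21563, 25384),  # surface (spike) glycoprotein
    "E": (26245, 26472),  # envelope protein
    "M": (26523, 27191),  # membrane glycoprotein
    "N": (28274, 29533),  # nucleocapsid phosphoprotein
}


def gene_size(start, end):
    """Number of bases in a gene whose locus is start..end inclusive."""
    return end - start + 1


def describe(genes, genome_length):
    """Return (name, start, end, size, percent of genome) for each gene."""
    rows = []
    for name, (start, end) in genes.items():
        size = gene_size(start, end)
        percent = round(100 * size / genome_length, 1)
        rows.append((name, start, end, size, percent))
    return rows


for name, start, end, size, percent in describe(GENES, GENOME_LENGTH):
    print(f"{name}: bases {start}..{end}, {size} bases, {percent}% of the genome")
```

Each size comes out as a multiple of three, which is a nice prompt for a question about codons.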

If you are looking at the NCBI database for the first time, it can be confusing to navigate. In this post I hope to introduce a few features which might give you the confidence to use the database in lessons.

This is the NCBI database entry for SARS-CoV-2, the virus that causes COVID-19: https://www.ncbi.nlm.nih.gov/nuccore/MN908947#comment_MN908947.3

I have labelled some of the interesting information, and there is a short video walk-through below.

Here are the first 1000 bases of the Coronavirus genome.

It's not very interesting presented like this, although it does show us that there are four bases: A, T, C and G.
This raises the first question: "Why is thymine in this sequence? Isn't the genome of SARS-CoV-2 single-stranded RNA?"

It is a tradition in genome databases to represent sequences in DNA form. Viral RNA is sequenced by first reverse transcribing it into cDNA, and it is the cDNA that is actually read. Reverse transcription is also a process which happens in cells infected by retroviruses such as HIV, as part of the virus 'life cycle', although coronaviruses do not do this themselves. So there is logic in this tradition.
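To show students what the genome looks like as RNA, the database sequence can be converted by swapping every T for a U. This is a minimal sketch using the first 60 bases of the listing in this post; the function name is my own.

```python
def cdna_to_rna(cdna):
    """Return the RNA form of a database (cDNA) sequence.

    Spaces are removed, letters are upper-cased and every T becomes U.
    """
    return cdna.upper().replace(" ", "").replace("T", "U")


# First 60 bases of the genome, copied from the listing in this post
first_60 = "attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct"

rna = cdna_to_rna(first_60)
print(rna)
```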

The first 1000 bases of the Coronavirus genome (as cDNA)

        1 attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct
       61 gttctctaaa cgaactttaa aatctgtgtg gctgtcactc ggctgcatgc ttagtgcact
      121 cacgcagtat aattaataac taattactgt cgttgacagg acacgagtaa ctcgtctatc
      181 ttctgcaggc tgcttacggt ttcgtccgtg ttgcagccga tcatcagcac atctaggttt
      241 cgtccgggtg tgaccgaaag gtaagatgga gagccttgtc cctggtttca acgagaaaac
      301 acacgtccaa ctcagtttgc ctgttttaca ggttcgcgac gtgctcgtac gtggctttgg
      361 agactccgtg gaggaggtct tatcagaggc acgtcaacat cttaaagatg gcacttgtgg
      421 cttagtagaa gttgaaaaag gcgttttgcc tcaacttgaa cagccctatg tgttcatcaa
      481 acgttcggat gctcgaactg cacctcatgg tcatgttatg gttgagctgg tagcagaact
      541 cgaaggcatt cagtacggtc gtagtggtga gacacttggt gtccttgtcc ctcatgtggg
      601 cgaaatacca gtggcttacc gcaaggttct tcttcgtaag aacggtaata aaggagctgg
      661 tggccatagt tacggcgccg atctaaagtc atttgactta ggcgacgagc ttggcactga
      721 tccttatgaa gattttcaag aaaactggaa cactaaacat agcagtggtg ttacccgtga
      781 actcatgcgt gagcttaacg gaggggcata cactcgctat gtcgataaca acttctgtgg
      841 ccctgatggc taccctcttg agtgcattaa agaccttcta gcacgtgctg gtaaagcttc
      901 atgcactttg tccgaacaac tggactttat tgacactaag aggggtgtat actgctgccg
      961 tgaacatgag catgaaattg cttggtacac ggaacgttct gaaaagagct atgaattgca
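If students paste lines like these into a short program, they can count the bases for themselves. The sketch below assumes the layout shown above (a position number followed by blocks of ten bases) and uses only the first two lines of the listing; the function name is my own.

```python
from collections import Counter

# First two lines of the listing above, pasted as they appear
listing = """
        1 attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct
       61 gttctctaaa cgaactttaa aatctgtgtg gctgtcactc ggctgcatgc ttagtgcact
"""


def parse_listing(text):
    """Join the blocks of bases on each line, dropping the position number."""
    blocks = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        blocks.extend(parts[1:])  # parts[0] is the position number
    return "".join(blocks)


sequence = parse_listing(listing)
counts = Counter(sequence)

print("Number of bases:", len(sequence))
for base in "acgt":
    print(base, counts[base])
```

The same function works on the whole listing, or on the full genome copied from the NCBI page, as long as the layout is the same.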

Identifying genes in the genome

So we can use the SARS-CoV-2 genome to see what a DNA sequence looks like. This is a nice way to make the concept of a genome more concrete. To really understand a genome, though, it's far better to use the NCBI database to identify the known genes in the SARS-CoV-2 genome. This illustrates the difference between genome and gene with a concrete example, small enough to fit on a computer screen.

Watch this short video, which gives a two-minute guided tour of the NCBI database information for the SARS-CoV-2 genome and its ten genes. It is a nice way to illustrate the meaning of the terms "genome" and "gene".

Other ideas for using the NCBI database

With graphics from protein databases, this genome could be used to explain how genes in a genome become proteins in a virus structure.
https://ten.info/wp-content/uploads/COVID-19-Corona-Virus-SARS-CoV2-Coronavirus-Structure-1.png
https://www.raybiotech.com/covid19-proteins/
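The step from gene to protein can also be sketched in a few lines of code. This example translates the first nine codons of the first gene using the standard genetic code. It assumes, as the NCBI record annotates, that this gene starts at base 266 (the "atg" in the line numbered 241 of the listing in this post); the variable and function names are my own.

```python
# Standard genetic code, written in the usual T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna):
    """Translate a cDNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon
    or when fewer than three bases remain.
    """
    seq = cdna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Bases 266 to 292 of the genome, copied from the listing in this post
start_of_first_gene = "atggagagccttgtccctggtttcaac"

print(translate(start_of_first_gene))  # MESLVPGFN
```

Students could compare the result with the start of the protein sequence shown in the NCBI record for this gene.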

There is the potential to show how to look up what the proteins look like and how to find their functions.
https://www.ncbi.nlm.nih.gov/Structure/pdb/7BYR

It might be possible to finish with a summary diagram and to revise protein structure, or to consider the nature of different alleles of a gene, using the S gene.
https://www.raybiotech.com/covid-19-spike-protein-variants/

Note: This could also be a short revision moment about the concept of "one gene, one protein", because some genes in this example don't quite follow that rule.