How to understand this piece of data from the COVID-19 Data Portal?
3.9 years ago
2001linana ▴ 40

Hi. I downloaded a sequence data file (2.7 GB) from this link: https://www.covid19dataportal.org/sequences?db=embl-covid19. It is a .txt file, and the lines for the first entry/sequence are as follows:

ID   MW281864; SV 1; linear; genomic RNA; STD; VRL; 29871 BP.
XX
AC   MW281864;
XX
DT   24-NOV-2020 (Rel. 144, Created)
DT   11-DEC-2020 (Rel. 144, Last updated, Version 2)
XX
DE   Severe acute respiratory syndrome coronavirus 2 isolate
DE   SARS-CoV-2/human/West Bank/Jericho_SARS-CoV-2/2020, complete genome.
XX
KW   .
XX
OS   Severe acute respiratory syndrome coronavirus 2
OC   Viruses; Riboviria; Orthornavirae; Pisuviricota; Pisoniviricetes;
OC   Nidovirales; Cornidovirineae; Coronaviridae; Orthocoronavirinae;
OC   Betacoronavirus; Sarbecovirus.
XX
RN   [1]
RP   1-29871
RA   Nasereddin A., Ereqat S., Al-Jawabreh A.;
RT   "Genetic epidemiology of severe acute respiratory syndrome coronavirus 2 in
RT   Palestine";
RL   Unpublished.
XX
RN   [2]
RP   1-29871
RA   Nasereddin A., Ereqat S., Al-Jawabreh A.;
RT   ;
RL   Submitted (21-NOV-2020) to the INSDC.
RL   Al-Quds Nutrition and Health Research Institute, Al-Quds University,
RL   Abudeis, Jerusalem 91220, Palestine
XX
DR   MD5; 32ad2f322f6c67d3d001cad2f292e154.
XX
CC   ##Assembly-Data-START##
CC   Assembly Method       :: GALAXY v. 19.09.rc1
CC   Sequencing Technology :: Illumina
CC   ##Assembly-Data-END##
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..29871
FT                   /organism="Severe acute respiratory syndrome coronavirus 2"
FT                   /host="Homo sapiens"
FT                   /isolate="SARS-CoV-2/human/West
FT                   Bank/Jericho_SARS-CoV-2/2020"
FT                   /mol_type="genomic RNA"
FT                   /country="West Bank:Jericho"
FT                   /isolation_source="nasal swab"
FT                   /collection_date="2020-11-07"
FT                   /db_xref="taxon:2697049"
FT   gene            237..21526
FT                   /gene="ORF1ab"
FT   CDS             join(237..13439,13439..21526)
FT                   /codon_start=1
FT                   /ribosomal_slippage
FT                   /gene="ORF1ab"
FT                   /product="ORF1ab polyprotein"
FT                   /protein_id="QPG02368.1"
FT                   /translation="MESLVPGF......"
FT   CDS             237..13454
FT                   /codon_start=1
FT                   /gene="ORF1ab"
FT                   /product="ORF1a polyprotein"
FT                   /protein_id="QPG02369.1"
FT                   /translation="MESLVPGF......"
FT   mat_peptide     237..776
FT                   /gene="ORF1ab"
FT                   /product="leader protein"
FT   mat_peptide     777..2690
FT                   /gene="ORF1ab"
FT                   /product="nsp2"
FT   mat_peptide     2691..8525
FT                   /gene="ORF1ab"
FT                   /product="nsp3"
FT   mat_peptide     8526..10025
FT                   /gene="ORF1ab"
FT                   /product="nsp4"
FT   mat_peptide     10026..10943
FT                   /gene="ORF1ab"
FT                   /product="3C-like proteinase"
FT   mat_peptide     10944..11813
FT                   /gene="ORF1ab"
FT                   /product="nsp6"
FT   mat_peptide     11814..12062
FT                   /gene="ORF1ab"
FT                   /product="nsp7"
FT   mat_peptide     12063..12656
FT                   /gene="ORF1ab"
FT                   /product="nsp8"
FT   mat_peptide     12657..12995
FT                   /gene="ORF1ab"
FT                   /product="nsp9"
FT   mat_peptide     12996..13412
FT                   /gene="ORF1ab"
FT                   /product="nsp10"
FT   mat_peptide     join(13413..13439,13439..16207)
FT                   /gene="ORF1ab"
FT                   /product="RNA-dependent RNA polymerase"
FT   mat_peptide     13413..13451
FT                   /gene="ORF1ab"
FT                   /product="nsp11"
FT   stem_loop       13447..13474
FT                   /gene="ORF1ab"
FT                   /note="Coronavirus frameshifting stimulation element
FT                   stem-loop 1"
FT   stem_loop       13459..13513
FT                   /gene="ORF1ab"
FT                   /note="Coronavirus frameshifting stimulation element
FT                   stem-loop 2"
FT   mat_peptide     16208..18010
FT                   /gene="ORF1ab"
FT                   /product="helicase"
FT   mat_peptide     18011..19591
FT                   /gene="ORF1ab"
FT                   /product="3'-to-5' exonuclease"
FT   mat_peptide     19592..20629
FT                   /gene="ORF1ab"
FT                   /product="endoRNAse"
FT   mat_peptide     20630..21523
FT                   /gene="ORF1ab"
FT                   /product="2'-O-ribose methyltransferase"
FT   gene            21534..25355
FT                   /gene="S"
FT   CDS             21534..25355
FT                   /codon_start=1
FT                   /gene="S"
FT                   /product="surface glycoprotein"
FT                   /protein_id="QPG02370.1"
FT                   /translation="MFVFLVLLPLVS......"
FT   gene            25364..26191
FT                   /gene="ORF3a"
FT   CDS             25364..26191
FT                   /codon_start=1
FT                   /gene="ORF3a"
FT                   /product="ORF3a protein"
FT                   /protein_id="QPG02371.1"
FT                   /translation="MDLFMRIFTIG......"
FT   gene            26216..26443
FT                   /gene="E"
FT   CDS             26216..26443
FT                   /codon_start=1
FT                   /gene="E"
FT                   /product="envelope protein"
FT                   /protein_id="QPG02372.1"
FT                   /translation="MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCN
FT                   IVNVSLVKPSFYVYSRVKNLNSSRVPDLLV"
FT   gene            26494..27162
FT                   /gene="M"
FT   CDS             26494..27162
FT                   /codon_start=1
FT                   /gene="M"
FT                   /product="membrane glycoprotein"
FT                   /protein_id="QPG02373.1"
FT                   /translation="MADSNGT......"
FT   gene            27173..27358
FT                   /gene="ORF6"
FT   CDS             27173..27358
FT                   /codon_start=1
FT                   /gene="ORF6"
FT                   /product="ORF6 protein"
FT                   /protein_id="QPG02374.1"
FT                   /translation="MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLT
FT                   ENKYSQLDEEQPMEID"
FT   gene            27365..27730
FT                   /gene="ORF7a"
FT   CDS             27365..27730
FT                   /codon_start=1
FT                   /gene="ORF7a"
FT                   /product="ORF7a protein"
FT                   /protein_id="QPG02375.1"
FT                   /translation="MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSP
FT                   FHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIV
FT                   AAIVFITLCFTLKRKTE"
FT   gene            27727..27858
FT                   /gene="ORF7b"
FT   CDS             27727..27858
FT                   /codon_start=1
FT                   /gene="ORF7b"
FT                   /product="ORF7b"
FT                   /protein_id="QPG02376.1"
FT                   /translation="MIELSLIDFYLCFLAFLLFLVLIMLIIFWFSLELQDHNETCHA"
FT   gene            27865..28230
FT                   /gene="ORF8"
FT   CDS             27865..28230
FT                   /codon_start=1
FT                   /gene="ORF8"
FT                   /product="ORF8 protein"
FT                   /protein_id="QPG02377.1"
FT                   /translation="MKFLVFLGIIKTVAAFHQECSLQSCTQHQPYVVDDPCPIHFYSKW
FT                   YIRVGARKSAPLIELCVDEAGSKSPIQYIDIGNYTVSCLPFTINCQEPKLGSLVVRCSF
FT                   YEDFLEYHDVRVVLDFI"
FT   gene            28245..29504
FT                   /gene="N"
FT   CDS             28245..29504
FT                   /codon_start=1
FT                   /gene="N"
FT                   /product="nucleocapsid phosphoprotein"
FT                   /protein_id="QPG02378.1"
FT                   /translation="MSDNGPQN......"
FT   gene            29529..29645
FT                   /gene="ORF10"
FT   CDS             29529..29645
FT                   /codon_start=1
FT                   /gene="ORF10"
FT                   /product="ORF10 protein"
FT                   /protein_id="QPG02379.1"
FT                   /translation="MGYINVFAFPFTIYSLLLCRMNSRNYIAQVDVVNFNLT"
FT   stem_loop       29580..29615
FT                   /gene="ORF10"
FT                   /note="Coronavirus 3' UTR pseudoknot stem-loop 1"
FT   stem_loop       29600..29628
FT                   /gene="ORF10"
FT                   /note="Coronavirus 3' UTR pseudoknot stem-loop 2"
FT   stem_loop       29699..29739
FT                   /note="Coronavirus 3' stem-loop II-like motif (s2m)"
XX
SQ   Sequence 29871 BP; 8945 A; 5480 C; 5849 G; 9597 T; 0 other;
     aaccaaccaa ctttcgatct cttgtagatc tgttctctaa acgaacttta aaatctgtgt        60
     ggctgtcact cggctgcatg cttagtgcac tcacgcagta taattaataa ctaattactg       120
     tcgttgacag gacacgagta actcgtctat cttctgcagg ctgcttacgg tttcgtccgt       180
     gttgcagccg atcatcagca ......
Could anyone please explain how to read this record? Many thanks for your time and attention.

One more thing: when I view the data at the link, https://www.covid19dataportal.org/sequences?db=embl-covid19, it appears in a tabular format, but when I download it from that same link I get a .txt file. Is there another way to obtain the data in a different format, rather than as a .txt file?
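Whatever the portal offers for download, the .txt file can be reshaped into a table locally. The following is a minimal sketch, assuming the file consists of entries terminated by a `//` line and that a few fields of the ID line and the `source` feature are enough for the table. The function names (`iter_entries`, `summarize`, `write_tsv`) and the choice of columns are hypothetical, and the sample is a shortened excerpt of the entry quoted above.

```python
import csv
import io

COLUMNS = ["accession", "length", "collection_date", "country"]

def iter_entries(handle):
    """Yield one entry at a time as a list of lines; entries end with '//'."""
    entry = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:  # tolerate a final entry that lacks the '//' terminator
        yield entry

def summarize(entry):
    """Extract a few fields from one entry (single-line qualifiers only)."""
    row = dict.fromkeys(COLUMNS, "")
    for line in entry:
        code, text = line[:2], line[5:].strip()
        if code == "ID":
            parts = [p.strip() for p in text.split(";")]
            row["accession"] = parts[0]
            row["length"] = parts[-1].replace("BP.", "").strip()
        elif code == "FT":
            for key in ("collection_date", "country"):
                prefix = f'/{key}="'
                if text.startswith(prefix):
                    row[key] = text[len(prefix):].rstrip('"')
    return row

def write_tsv(handle_in, handle_out):
    """Stream entries from handle_in and write one tab-separated row per entry."""
    writer = csv.DictWriter(handle_out, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for entry in iter_entries(handle_in):
        writer.writerow(summarize(entry))

# Shortened excerpt of the entry quoted in the question (sequence omitted).
SAMPLE = """\
ID   MW281864; SV 1; linear; genomic RNA; STD; VRL; 29871 BP.
XX
FT   source          1..29871
FT                   /country="West Bank:Jericho"
FT                   /collection_date="2020-11-07"
XX
//
"""

out = io.StringIO()
write_tsv(io.StringIO(SAMPLE), out)
tsv_text = out.getvalue()
print(tsv_text)
```

Because entries are read one at a time, the same loop should cope with a multi-gigabyte file when `handle_in` is an opened file rather than a string. Qualifiers that wrap over several FT lines (such as `/isolate` above) would need extra handling that this sketch leaves out.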

3.9 years ago
GenoMax 147k

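This is the EMBL (ENA) flat-file format. UniProt (Swiss-Prot) text entries use a closely related layout: every line starts with a two-letter code that says what the rest of the line contains.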
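Every line in the quoted record begins with a two-letter code (the convention used by EMBL/ENA flat files, which UniProt text entries also follow closely), `XX` lines are spacers, sequence lines start with blanks, and `//` ends the entry. The following is a minimal sketch of reading one entry on that basis, using only the standard library. The `parse_entry` helper is hypothetical, and the sample is an excerpt of the record in the question with only the first two sequence lines.

```python
from collections import defaultdict

# Meaning of the two-letter line codes that appear in the quoted record.
LINE_CODES = {
    "ID": "identification: accession, version, topology, molecule type, data class, division, length",
    "AC": "accession number(s)",
    "DT": "dates (created, last updated)",
    "DE": "description",
    "KW": "keywords",
    "OS": "organism species",
    "OC": "organism classification (lineage)",
    "RN": "reference number",
    "RP": "reference positions",
    "RA": "reference authors",
    "RT": "reference title",
    "RL": "reference location",
    "DR": "database cross-reference",
    "CC": "comments",
    "FH": "feature table header",
    "FT": "feature table (source, gene, CDS, mat_peptide, ...)",
    "SQ": "sequence header with length and base counts",
}

def parse_entry(lines):
    """Group one entry's lines by line code and collect the sequence."""
    fields = defaultdict(list)
    bases = []
    for line in lines:
        code = line[:2]
        if code == "//":          # end of entry
            break
        if code == "  ":          # sequence data: keep letters, drop spaces and counters
            bases.append("".join(ch for ch in line if ch.isalpha()))
        elif code and code != "XX":
            fields[code].append(line[5:].strip())
    return dict(fields), "".join(bases)

# Excerpt of the record in the question; only the first two sequence lines are kept.
SAMPLE = """\
ID   MW281864; SV 1; linear; genomic RNA; STD; VRL; 29871 BP.
XX
AC   MW281864;
XX
DE   Severe acute respiratory syndrome coronavirus 2 isolate
DE   SARS-CoV-2/human/West Bank/Jericho_SARS-CoV-2/2020, complete genome.
XX
SQ   Sequence 29871 BP; 8945 A; 5480 C; 5849 G; 9597 T; 0 other;
     aaccaaccaa ctttcgatct cttgtagatc tgttctctaa acgaacttta aaatctgtgt        60
     ggctgtcact cggctgcatg cttagtgcac tcacgcagta taattaataa ctaattactg       120
//
"""

fields, sequence = parse_entry(SAMPLE.splitlines())
for code, values in fields.items():
    print(f"{code} ({LINE_CODES.get(code, 'unknown')}): {' '.join(values)}")
print("bases read:", len(sequence))
```

For real work on the full file, an existing parser is probably the safer route. Biopython, for example, reads this format with `SeqIO.parse(handle, "embl")`.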
