3.9 years ago
2001linana
Hi. I downloaded a sequence data file (2.7 GB) from this link: https://www.covid19dataportal.org/sequences?db=embl-covid19. It is a .txt file, and the lines for the first record (sequence) are as follows:
ID MW281864; SV 1; linear; genomic RNA; STD; VRL; 29871 BP.
XX
AC MW281864;
XX
DT 24-NOV-2020 (Rel. 144, Created)
DT 11-DEC-2020 (Rel. 144, Last updated, Version 2)
XX
DE Severe acute respiratory syndrome coronavirus 2 isolate
DE SARS-CoV-2/human/West Bank/Jericho_SARS-CoV-2/2020, complete genome.
XX
KW .
XX
OS Severe acute respiratory syndrome coronavirus 2
OC Viruses; Riboviria; Orthornavirae; Pisuviricota; Pisoniviricetes;
OC Nidovirales; Cornidovirineae; Coronaviridae; Orthocoronavirinae;
OC Betacoronavirus; Sarbecovirus.
XX
RN [1]
RP 1-29871
RA Nasereddin A., Ereqat S., Al-Jawabreh A.;
RT "Genetic epidemiology of severe acute respiratory syndrome coronavirus 2 in
RT Palestine";
RL Unpublished.
XX
RN [2]
RP 1-29871
RA Nasereddin A., Ereqat S., Al-Jawabreh A.;
RT ;
RL Submitted (21-NOV-2020) to the INSDC.
RL Al-Quds Nutrition and Health Research Institute, Al-Quds University,
RL Abudeis, Jerusalem 91220, Palestine
XX
DR MD5; 32ad2f322f6c67d3d001cad2f292e154.
XX
CC ##Assembly-Data-START##
CC Assembly Method :: GALAXY v. 19.09.rc1
CC Sequencing Technology :: Illumina
CC ##Assembly-Data-END##
XX
FH Key Location/Qualifiers
FH
FT source 1..29871
FT /organism="Severe acute respiratory syndrome coronavirus 2"
FT /host="Homo sapiens"
FT /isolate="SARS-CoV-2/human/West
FT Bank/Jericho_SARS-CoV-2/2020"
FT /mol_type="genomic RNA"
FT /country="West Bank:Jericho"
FT /isolation_source="nasal swab"
FT /collection_date="2020-11-07"
FT /db_xref="taxon:2697049"
FT gene 237..21526
FT /gene="ORF1ab"
FT CDS join(237..13439,13439..21526)
FT /codon_start=1
FT /ribosomal_slippage
FT /gene="ORF1ab"
FT /product="ORF1ab polyprotein"
FT /protein_id="QPG02368.1"
FT /translation="MESLVPGF......"
FT CDS 237..13454
FT /codon_start=1
FT /gene="ORF1ab"
FT /product="ORF1a polyprotein"
FT /protein_id="QPG02369.1"
FT /translation="MESLVPGF......"
FT mat_peptide 237..776
FT /gene="ORF1ab"
FT /product="leader protein"
FT mat_peptide 777..2690
FT /gene="ORF1ab"
FT /product="nsp2"
FT mat_peptide 2691..8525
FT /gene="ORF1ab"
FT /product="nsp3"
FT mat_peptide 8526..10025
FT /gene="ORF1ab"
FT /product="nsp4"
FT mat_peptide 10026..10943
FT /gene="ORF1ab"
FT /product="3C-like proteinase"
FT mat_peptide 10944..11813
FT /gene="ORF1ab"
FT /product="nsp6"
FT mat_peptide 11814..12062
FT /gene="ORF1ab"
FT /product="nsp7"
FT mat_peptide 12063..12656
FT /gene="ORF1ab"
FT /product="nsp8"
FT mat_peptide 12657..12995
FT /gene="ORF1ab"
FT /product="nsp9"
FT mat_peptide 12996..13412
FT /gene="ORF1ab"
FT /product="nsp10"
FT mat_peptide join(13413..13439,13439..16207)
FT /gene="ORF1ab"
FT /product="RNA-dependent RNA polymerase"
FT mat_peptide 13413..13451
FT /gene="ORF1ab"
FT /product="nsp11"
FT stem_loop 13447..13474
FT /gene="ORF1ab"
FT /note="Coronavirus frameshifting stimulation element
FT stem-loop 1"
FT stem_loop 13459..13513
FT /gene="ORF1ab"
FT /note="Coronavirus frameshifting stimulation element
FT stem-loop 2"
FT mat_peptide 16208..18010
FT /gene="ORF1ab"
FT /product="helicase"
FT mat_peptide 18011..19591
FT /gene="ORF1ab"
FT /product="3'-to-5' exonuclease"
FT mat_peptide 19592..20629
FT /gene="ORF1ab"
FT /product="endoRNAse"
FT mat_peptide 20630..21523
FT /gene="ORF1ab"
FT /product="2'-O-ribose methyltransferase"
FT gene 21534..25355
FT /gene="S"
FT CDS 21534..25355
FT /codon_start=1
FT /gene="S"
FT /product="surface glycoprotein"
FT /protein_id="QPG02370.1"
FT /translation="MFVFLVLLPLVS......"
FT gene 25364..26191
FT /gene="ORF3a"
FT CDS 25364..26191
FT /codon_start=1
FT /gene="ORF3a"
FT /product="ORF3a protein"
FT /protein_id="QPG02371.1"
FT /translation="MDLFMRIFTIG......"
FT gene 26216..26443
FT /gene="E"
FT CDS 26216..26443
FT /codon_start=1
FT /gene="E"
FT /product="envelope protein"
FT /protein_id="QPG02372.1"
FT /translation="MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCN
FT IVNVSLVKPSFYVYSRVKNLNSSRVPDLLV"
FT gene 26494..27162
FT /gene="M"
FT CDS 26494..27162
FT /codon_start=1
FT /gene="M"
FT /product="membrane glycoprotein"
FT /protein_id="QPG02373.1"
FT /translation="MADSNGT......"
FT gene 27173..27358
FT /gene="ORF6"
FT CDS 27173..27358
FT /codon_start=1
FT /gene="ORF6"
FT /product="ORF6 protein"
FT /protein_id="QPG02374.1"
FT /translation="MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLT
FT ENKYSQLDEEQPMEID"
FT gene 27365..27730
FT /gene="ORF7a"
FT CDS 27365..27730
FT /codon_start=1
FT /gene="ORF7a"
FT /product="ORF7a protein"
FT /protein_id="QPG02375.1"
FT /translation="MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSP
FT FHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIV
FT AAIVFITLCFTLKRKTE"
FT gene 27727..27858
FT /gene="ORF7b"
FT CDS 27727..27858
FT /codon_start=1
FT /gene="ORF7b"
FT /product="ORF7b"
FT /protein_id="QPG02376.1"
FT /translation="MIELSLIDFYLCFLAFLLFLVLIMLIIFWFSLELQDHNETCHA"
FT gene 27865..28230
FT /gene="ORF8"
FT CDS 27865..28230
FT /codon_start=1
FT /gene="ORF8"
FT /product="ORF8 protein"
FT /protein_id="QPG02377.1"
FT /translation="MKFLVFLGIIKTVAAFHQECSLQSCTQHQPYVVDDPCPIHFYSKW
FT YIRVGARKSAPLIELCVDEAGSKSPIQYIDIGNYTVSCLPFTINCQEPKLGSLVVRCSF
FT YEDFLEYHDVRVVLDFI"
FT gene 28245..29504
FT /gene="N"
FT CDS 28245..29504
FT /codon_start=1
FT /gene="N"
FT /product="nucleocapsid phosphoprotein"
FT /protein_id="QPG02378.1"
FT /translation="MSDNGPQN......"
FT gene 29529..29645
FT /gene="ORF10"
FT CDS 29529..29645
FT /codon_start=1
FT /gene="ORF10"
FT /product="ORF10 protein"
FT /protein_id="QPG02379.1"
FT /translation="MGYINVFAFPFTIYSLLLCRMNSRNYIAQVDVVNFNLT"
FT stem_loop 29580..29615
FT /gene="ORF10"
FT /note="Coronavirus 3' UTR pseudoknot stem-loop 1"
FT stem_loop 29600..29628
FT /gene="ORF10"
FT /note="Coronavirus 3' UTR pseudoknot stem-loop 2"
FT stem_loop 29699..29739
FT /note="Coronavirus 3' stem-loop II-like motif (s2m)"
XX
SQ Sequence 29871 BP; 8945 A; 5480 C; 5849 G; 9597 T; 0 other;
aaccaaccaa ctttcgatct cttgtagatc tgttctctaa acgaacttta aaatctgtgt 60
ggctgtcact cggctgcatg cttagtgcac tcacgcagta taattaataa ctaattactg 120
tcgttgacag gacacgagta actcgtctat cttctgcagg ctgcttacgg tttcgtccgt 180
gttgcagccg atcatcagca ......
Could anyone please explain what the different parts of this record mean? Many thanks for your time and attention.
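From what I can tell, the record above is in the EMBL flat-file format, where each line starts with a two-letter code that says what kind of information the line holds. Below is a minimal Python sketch of how I understand those codes. The descriptions in `LINE_CODES` are my reading of the format, and the function names and the `SAMPLE_LINES` excerpt are only for illustration.

```python
from collections import Counter

# Two-letter line codes seen in the record above, with short descriptions.
# These descriptions are my understanding of the EMBL flat-file format.
LINE_CODES = {
    "ID": "identification: accession, version, topology, molecule type, class, division, length",
    "AC": "accession number(s)",
    "DT": "dates (created / last updated)",
    "DE": "description",
    "KW": "keywords",
    "OS": "organism species",
    "OC": "organism classification (taxonomy lineage)",
    "RN": "reference number",
    "RP": "reference positions (bases covered by the reference)",
    "RA": "reference authors",
    "RT": "reference title",
    "RL": "reference location (journal or submission details)",
    "DR": "database cross-reference",
    "CC": "comments",
    "FH": "feature table header",
    "FT": "feature table (genes, CDS, qualifiers such as /country)",
    "SQ": "sequence header (length and base counts)",
    "XX": "spacer line",
}


def line_code(line):
    """Return the two-letter code of a line, '//' for a record terminator,
    or 'seq' for a line of sequence data (which carries no code)."""
    if line.startswith("//"):
        return "//"
    code = line[:2]
    # A code must be followed by a space or be the whole line (e.g. "XX").
    if code in LINE_CODES and (len(line) == 2 or line[2] == " "):
        return code
    return "seq"


def summarize(lines):
    """Count how many non-blank lines carry each code."""
    return Counter(line_code(line) for line in lines if line.strip())


# A few lines copied from the record above.
SAMPLE_LINES = [
    "ID MW281864; SV 1; linear; genomic RNA; STD; VRL; 29871 BP.",
    "XX",
    "AC MW281864;",
    "XX",
    "DE Severe acute respiratory syndrome coronavirus 2 isolate",
    "FT source 1..29871",
    "SQ Sequence 29871 BP; 8945 A; 5480 C; 5849 G; 9597 T; 0 other;",
    "aaccaaccaa ctttcgatct cttgtagatc tgttctctaa acgaacttta aaatctgtgt 60",
]

if __name__ == "__main__":
    for code, count in summarize(SAMPLE_LINES).items():
        print(code, count, LINE_CODES.get(code, "sequence data"))
```

The lines after `SQ` carry no code: they are the nucleotide sequence itself, in blocks of ten bases with a running position at the end of each line.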
One more thing: when I view the data at the link, https://www.covid19dataportal.org/sequences?db=embl-covid19, it appears in a tabular format, but when I download it from the same link, I get a .txt file. Is there any other way to obtain the data in a different format, rather than as a .txt file?
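To show the kind of table I have in mind, here is a rough Python sketch that reads the flat file one record at a time and writes one tab-separated row per sequence. It assumes that each record ends with a `//` line (as I believe EMBL flat files do) and that the qualifiers I want fit on a single `FT` line. The function names, the chosen columns, and the file names in the final comment are my own hypothetical choices.

```python
import csv
import io
import re

# Matches single-line feature qualifiers such as: FT /country="West Bank:Jericho"
QUALIFIER = re.compile(r'^FT\s+/(\w+)="([^"]*)"')
COLUMNS = ["accession", "length_bp", "collection_date", "country"]


def iter_records(handle):
    """Yield each record as a list of lines, without loading the whole file.
    Assumption: records are terminated by a line starting with '//'."""
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if record:
                yield record
            record = []
        else:
            record.append(line)
    if record:  # last record if the file lacks a final terminator
        yield record


def record_to_row(record):
    """Extract a few fields from one record into a dict (one table row)."""
    row = dict.fromkeys(COLUMNS, "")
    for line in record:
        if line.startswith("ID "):
            # The accession is the first ';'-separated field after the code.
            row["accession"] = line[2:].split(";")[0].strip()
            match = re.search(r"(\d+) BP", line)
            if match:
                row["length_bp"] = match.group(1)
            continue
        match = QUALIFIER.match(line)
        if match and match.group(1) in row and not row[match.group(1)]:
            row[match.group(1)] = match.group(2)
    return row


def write_table(handle_in, handle_out):
    """Write one tab-separated row per record; return the number of rows."""
    writer = csv.DictWriter(handle_out, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    count = 0
    for record in iter_records(handle_in):
        writer.writerow(record_to_row(record))
        count += 1
    return count


# Small excerpt of the record above, with a '//' terminator added.
SAMPLE = """\
ID MW281864; SV 1; linear; genomic RNA; STD; VRL; 29871 BP.
XX
FT /country="West Bank:Jericho"
FT /collection_date="2020-11-07"
SQ Sequence 29871 BP; 8945 A; 5480 C; 5849 G; 9597 T; 0 other;
aaccaaccaa ctttcgatct 20
//
"""

if __name__ == "__main__":
    out = io.StringIO()
    write_table(io.StringIO(SAMPLE), out)
    print(out.getvalue())
    # For the real download (hypothetical file names):
    # with open("sequences.txt") as fin, open("sequences.tsv", "w", newline="") as fout:
    #     write_table(fin, fout)
```

Because the file is read one line at a time, this should not need to hold all 2.7 GB in memory. Qualifiers that wrap across several `FT` lines (such as `/isolate` in the record above) are not handled by this sketch.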