Why is it that when downloading a FASTA file for a specific SARS-CoV-2 genome from the NCBI website, the sequence is written as DNA and not RNA? Is there a convention for that?
I don't have a link that explains the rationale,
but most RNA sequences in databases are represented as DNA.
It makes life a lot easier when comparing and matching RNA, DNA and protein sequences.
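To illustrate why a single alphabet helps, here is a minimal sketch of comparing an RNA-alphabet sequence with a DNA-alphabet one. The function names and the two short sequences are made up for illustration; nothing here is an NCBI tool or API.

```python
def to_dna_alphabet(seq: str) -> str:
    """Upper-case the sequence and write every U as T (hypothetical helper)."""
    return seq.upper().replace("U", "T")


def same_sequence(a: str, b: str) -> bool:
    """Compare two sequences after putting both into the DNA alphabet."""
    return to_dna_alphabet(a) == to_dna_alphabet(b)


# Illustrative strings only, not taken from any database record.
rna_read = "auuaaagguuuauaccuucc"
dna_record = "ATTAAAGGTTTATACCTTCC"

print(same_sequence(rna_read, dna_record))
```

If everything in the database is already stored with T, this normalization step is not needed on the database side, which is roughly the convenience I mean.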
Why does the FASTA sequence for coronavirus look like DNA, not RNA?
The reason is simple: we usually do not sequence the RNA directly, because RNA is unstable and easily degraded by RNases (direct RNA sequencing exists but is far less common). Instead the genome is reverse transcribed, either by targeted reverse transcription or random amplification, and thus converted to cDNA. cDNA is stable and is essentially a DNA copy of the RNA.
The cDNA is either sequenced directly or further amplified by PCR and then sequenced. Hence the sequence we observe is that of the cDNA rather than the RNA, so we see thymine rather than uracil, and that is how it is reported.
Still, I'd say that is not quite a fully satisfying explanation.
At the end of sequencing we could just as well transcribe it back to RNA, since we do know that the original product was RNA; the rest is just protocol. We are not measuring bases either, but fluorescence etc., yet we are not calling the bases "red".
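For what it's worth, converting back in silico is trivial. This is a rough sketch, assuming a plain FASTA text where header lines start with ">"; the function name and the example record are hypothetical.

```python
def fasta_dna_to_rna(fasta_text: str) -> str:
    """Rewrite T as U on sequence lines, leaving FASTA header lines untouched."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep as-is so record names containing T are not altered.
            out.append(line)
        else:
            out.append(line.replace("T", "U").replace("t", "u"))
    return "\n".join(out)


# Hypothetical record, for illustration only.
example = ">hypothetical_record Test\nATTAAAGG\nTTTATACC\n"
print(fasta_dna_to_rna(example))
```

So the T in the downloaded file is a matter of representation, not something that could not be changed afterwards.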
I am not 100% certain, but I believe GenBank contains only DNA sequences. So even for genomes that are RNA, the sequence is represented as its DNA counterpart, e.g. Chikungunya virus.