NCBI Virus SARS-CoV-2 Data Alignment


3.0 years ago

Alex ▴ 10

Hello,

I'm currently doing a small bioinformatics project where I'm downloading multiple FASTA files from NCBI Virus and want to locate the spike glycoprotein-encoding locus on each of the samples.

I filtered by nucleotide completeness, but I noticed that the genome lengths vary between samples and also differ from the reference genome for this taxon.

Because of this, I'm not sure whether simply taking the start and end coordinates of the spike glycoprotein-encoding locus from the GFF file will give me a sequence in the correct reading frame, or whether it will even correspond to the target gene.
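To make the concern concrete, here is a toy sketch of what I think could go wrong. The sequences and coordinates are made up for illustration (they are not real SARS-CoV-2 data), and the helper name `gff_to_slice` is hypothetical. It assumes the usual GFF convention of 1-based, inclusive coordinates.

```python
# Toy illustration only: sequences and coordinates are invented, not real data.

def gff_to_slice(start, end):
    """Convert 1-based inclusive GFF coordinates to a 0-based Python slice."""
    return start - 1, end

# Made-up "reference": 6 bases upstream, a 12-base gene, 4 bases downstream.
reference = "CCCCCC" + "ATGGCTTCTTAA" + "GGGG"
gff_start, gff_end = 7, 18  # hypothetical GFF entry for the toy gene

# Made-up "sample": same gene, but with a 2-base deletion upstream of it.
sample = "CCCC" + "ATGGCTTCTTAA" + "GGGG"

lo, hi = gff_to_slice(gff_start, gff_end)
ref_gene = reference[lo:hi]   # the gene as annotated on the reference
naive = sample[lo:hi]         # same coordinates applied blindly to the sample

print("reference gene:", ref_gene)
print("naive slice   :", naive)
print("gene really starts at sample index:", sample.find(ref_gene))
```

In this toy case the naive slice starts two bases into the gene and runs past its end, so it no longer begins with the start codon. An insertion or deletion upstream whose length is not a multiple of three would also break the reading frame.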

Will this work, and if not, will I have to do some alignment? And if I do have to align my samples to the reference genome, is there a less computationally intensive way I can do it, such as through Google Colab, or would I need to do this on a desktop?

Thank you!

Virus GFF SARS-CoV-2 NCBI FASTA Alignment • 822 views

updated 3.0 years ago by manaswwm ▴ 570 • written 3.0 years ago by Alex ▴ 10


Have you tried BLASTing the protein sequence directly against the Betacoronavirus database in NCBI BLAST? That would be a tblastn search if you want to query with the amino acid sequence.
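As a rough sketch of the idea behind a translated search (this is not BLAST itself): translate the genome in each forward reading frame and look for the protein, then convert the hit back to nucleotide coordinates. The function names `translate` and `find_protein` and the toy sequences are made up for illustration. Unlike tblastn, this only finds exact matches and ignores the reverse strand, so treat it as a way to see the principle rather than a replacement.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(nt):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))

def find_protein(genome, protein):
    """Return (start, end) of the first exact hit in the three forward frames.

    Coordinates are 0-based, end-exclusive nucleotide positions.
    Returns None if the protein is not found.
    """
    for frame in range(3):
        hit = translate(genome[frame:]).find(protein)
        if hit != -1:
            start = frame + 3 * hit
            return start, start + 3 * len(protein)
    return None

# Toy data (invented): the same short "protein" in two genomes of different length.
toy_protein = "MAS"
toy_reference = "CCCCCCATGGCTTCTTAAGGGG"
toy_sample = "CCCCATGGCTTCTTAAGGGG"  # 2-base deletion upstream

print(find_protein(toy_reference, toy_protein))
print(find_protein(toy_sample, toy_protein))
```

Because the search works on the sequence content rather than fixed coordinates, it finds the locus in each sample even when upstream length differences shift its position.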

3.0 years ago by manaswwm ▴ 570
