NCBI Virus SARS-CoV-2 Data Alignment
2.5 years ago
Alex

Hello,

I'm currently working on a small bioinformatics project in which I download multiple FASTA files from NCBI Virus, and I want to locate the locus encoding the spike glycoprotein (the S gene) in each sample.

I filtered for complete nucleotide sequences, but I noticed that the genome lengths vary from one another and from the reference genome for this taxon.

Because of this, I'm not sure whether simply reusing the start and end coordinates of the spike locus from the reference GFF file will give a region that is in the correct reading frame in each sample, or whether that region will even correspond to the target gene if it is in frame.
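One quick way to test whether reused coordinates still land on an intact coding sequence is to slice each sample genome with them and check the reading frame. This is a minimal sketch, not a library API: `extract_gff_region` and `looks_like_complete_cds` are hypothetical helper names. It assumes the gene is on the + strand, that the coordinates follow the GFF convention (1-based, inclusive), and that the standard genetic code applies (ATG start, TAA/TAG/TGA stops). A region that fails the check suggests the coordinates have drifted, for example because of indels upstream of the gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_gff_region(genome: str, start: int, end: int) -> str:
    """Slice a sequence using GFF-style coordinates (1-based, inclusive)."""
    return genome[start - 1:end]


def looks_like_complete_cds(seq: str) -> bool:
    """Heuristic: does seq read as one complete, in-frame ORF on the + strand?

    Checks that the length is a multiple of 3, the first codon is ATG, the last
    codon is a stop, and there is no earlier in-frame stop. It does not translate.
    """
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # A premature in-frame stop usually means the coordinates are off
    # (or, less often, that the sample carries a real nonsense mutation).
    return not any(codon in STOP_CODONS for codon in codons[:-1])
```

Sequences containing ambiguous bases such as `N` are not rejected by this check unless they happen to create a stop codon, so it is a sanity filter rather than proof that the region is the spike gene.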

Will this work? If not, will I have to do an alignment? And if I do need to align my samples to the reference genome, is there a less computationally intensive way to do it, for example in Google Colab, or would I need to run it on a desktop?
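If the coordinates do not carry over, one common approach is to align each sample to the reference and translate ("lift over") the reference GFF coordinates through that alignment. The alignment step itself is not shown: this sketch assumes you already have the two gapped rows of a pairwise alignment (reference and sample, with `-` marking gaps) from whichever aligner you use. The function names are hypothetical. A boundary that falls inside a deletion in the sample returns `None`, so that case can be inspected by hand rather than guessed.

```python
from __future__ import annotations


def ref_to_query_positions(aligned_ref: str, aligned_query: str) -> dict[int, int | None]:
    """Map each 1-based reference position to the aligned 1-based query position.

    Both arguments are rows of the same pairwise alignment, with '-' for gaps.
    Reference bases aligned to a gap in the query map to None.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("alignment rows must have the same length")
    mapping: dict[int, int | None] = {}
    ref_pos = query_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if q != "-":
            query_pos += 1
        if r != "-":
            ref_pos += 1
            mapping[ref_pos] = query_pos if q != "-" else None
    return mapping


def lift_interval(aligned_ref: str, aligned_query: str,
                  start: int, end: int) -> tuple[int, int] | None:
    """Translate a reference interval (e.g. from the reference GFF) onto the query.

    Returns None if either boundary falls in a deletion in the query or lies
    outside the reference, so the case can be checked by hand.
    """
    mapping = ref_to_query_positions(aligned_ref, aligned_query)
    q_start, q_end = mapping.get(start), mapping.get(end)
    if q_start is None or q_end is None:
        return None
    return q_start, q_end
```

Because the interval is mapped through the alignment instead of being reused verbatim, insertions or deletions upstream of the gene no longer shift it.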

Thank you!

Tags: Virus, GFF, SARS-CoV-2, NCBI, FASTA, Alignment

Have you tried BLASTing the spike protein sequence directly against the Betacoronavirus database in NCBI BLAST? That would be a tblastn search, since you would be querying with the amino acid sequence against nucleotide sequences.
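If you go the tblastn route, the hit coordinates on each genome can be read from BLAST+ tabular output. This sketch assumes the search was run with `-outfmt "6 qseqid sseqid pident length sstart send evalue bitscore"`, so the columns appear in exactly that order; the parser `best_hit_per_genome` is a hypothetical helper, not part of BLAST+. It keeps the highest-bitscore hit per genome and reports the nucleotide interval and strand (tblastn reports `sstart > send` for hits on the minus strand).

```python
from __future__ import annotations

# Assumed column order; it matches a tblastn run with
#   -outfmt "6 qseqid sseqid pident length sstart send evalue bitscore"
FIELDS = ["qseqid", "sseqid", "pident", "length", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_genome(tabular: str) -> dict[str, tuple[int, int, str]]:
    """Return {genome_id: (start, end, strand)} for the top-bitscore hit per genome.

    start <= end are 1-based nucleotide coordinates on the genome; strand is
    '-' when tblastn reported sstart > send.
    """
    best: dict[str, tuple[float, int, int, str]] = {}
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        sstart, send = int(row["sstart"]), int(row["send"])
        bitscore = float(row["bitscore"])
        strand = "+" if sstart <= send else "-"
        genome = row["sseqid"]
        # Keep only the highest-scoring hit seen so far for this genome.
        if genome not in best or bitscore > best[genome][0]:
            best[genome] = (bitscore, min(sstart, send), max(sstart, send), strand)
    return {g: (lo, hi, strand) for g, (_, lo, hi, strand) in best.items()}
```

Keeping only the single best hit is a simplification: if a sample's spike match is split into several HSPs (for instance around a frameshift or a long run of `N`s), they would need to be merged or examined separately. tblastn alignments are also local, so the reported interval may stop short of the exact start and stop codons.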
