Strange BUSCO selected sequences
12 months ago
Pit • 0

Hi good people,

I was using BUSCO (v5.3.2, https://busco.ezlab.org/) to extract protein-coding sequences from some whole genomes. The lineage dataset I used was eutheria_odb10. The run itself finished without errors. While performing the sequence alignment, however, I discovered that some sequences were rejected by the alignment program. They look quite strange. Here's one example:

ATACACCAAAATGAAGACTGCCACCAACATCTATATTTTCAACCTTGCTCTGGCAGATGCCCTAGCAACCAGTACCCTGCCCTTCCAGAGTGTCAATTACCTAATGGGAACATGGCCCTTTGGAACCATCCTCTGCAAGATTGTGATCTCCATAGATTACTATAATATGTTCACCAGCATATTCACCCTCTGCACCATGAGCATTGATCGCTACATCGCAGTCTGCCATCCCGTCAAGGCCCTGGATTTCCGCACTCCCCGCAATGCCAAGATCGTCAACATCTGCAACTGGATCCTCTCTTCAGCCATTGGTCTGCCTGTGATGTTCATGGCGACAACAAAGTACCGGCAAGGTTCCATAGATTGTACTCTAACATTTTCTCACCCAACCTGGTACTGGGAAAACCTGCTGAAGATCTGTGTTTTCATCTTTGCTTTCATCATGCCCGTCCTCGTCATTACGGTGTGTTACGGACTGATGATCTTACGCCTCAAGAGCGTCCGCGTGCTCTCTGGCTCCAAAGAAAAGGATCGGAACCTGCGAAGAATCACCAGGATGGTGCTGGTGGTTGTGGCTGTGTTCATTGTCTGCTGGACCCCCATTCACATTTACGTCATCGTCAAAGCCTTGATCACAATCCCAGAAACTACTTTCCAGACTGTTTCATGGCACTTCTGCATTGCTCTCGGTTACACAAACAGCTGCCTGAACCCAGTCCTTTATGCGTTTCTGGATGAAAACTTCAAACGATGCTTCAGAGAGTTCTGCATCCCAACGTCCTCCACCATTGAGCAGCAAAACTCCACTAGAATGCGTCAGAACACCAGAGACCTCCCCTCCACGGCCAACACAGTGGATAGGACTAACCATCAGAAATTCAGTGGAACAAATAACCTTTCAAATGGCTACACTGCAAGTAAATATCAACATCTAAATCCCAATAATGCGATTGGATTTATCAAGAAGATGAAAAATATTCACAGTTCTTAG

I am confused as to why BUSCO considered it a protein-coding sequence at all, because it doesn't look like one. So I don't know whether I should try to salvage those sequences or not. Does anyone have any ideas, or know a way to correct this behaviour?

Thanks.

BUSCO DNA • 592 views
12 months ago
shelkmike ★ 1.4k

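This sequence is an ORF, just not in the first reading frame: it reads open starting from the second base (frame 2), whereas frames 1 and 3 both run into stop codons near the start. You can check it at https://web.expasy.org/translate/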
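If you have many sequences to check, the same test can be scripted instead of pasting each one into the web tool. Below is a minimal sketch in plain Python, assuming the standard genetic code (NCBI table 1) and only the three forward frames. The helper names `translate`, `internal_stop_counts` and `best_frame` are made up for this illustration and are not part of BUSCO or ExPASy.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, offset=0):
    """Translate seq starting at a 0-based offset; unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def internal_stop_counts(seq):
    """Map each forward frame (1, 2, 3) to its number of internal stop codons.

    The last codon is ignored, since a terminal stop is expected in a complete CDS.
    """
    return {
        offset + 1: translate(seq, offset)[:-1].count("*")
        for offset in range(3)
    }


def best_frame(seq):
    """Return the forward frame (1-based) with the fewest internal stops."""
    counts = internal_stop_counts(seq)
    return min(counts, key=counts.get)
```

Calling `best_frame(sequence)` on each BUSCO sequence should point out the ones that are not in frame 1. A frame with zero internal stops is only a hint that it is the coding frame, so it is worth looking at the translation as well.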

My guess is that this sequence is at an edge of a contig. This is why BUSCO wasn't able to determine the start of this gene.

There are many possible solutions for this problem. For example, you can drop all sequences located at contig edges. Alternatively, you can align the sequences using TranslatorX (http://translatorx.co.uk/), allowing it to determine the frame automatically.
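If your alignment program insists on in-frame input, one more option is to trim each sequence into its most plausible frame before aligning. The sketch below is only an illustration of that idea and is not how TranslatorX works internally. It assumes the standard stop codons and forward frames only, and the function names `internal_stops` and `trim_to_frame` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(seq, offset):
    """Count stop codons in the given frame, ignoring the final codon."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return sum(codon in STOP_CODONS for codon in codons[:-1])


def trim_to_frame(seq):
    """Return (trimmed_sequence, offset) for the frame with the fewest internal stops.

    Leading bases before the chosen frame and trailing bases that do not fill
    a whole codon are removed. Ties go to the lowest offset.
    """
    seq = seq.upper()
    offset = min(range(3), key=lambda o: internal_stops(seq, o))
    trimmed = seq[offset:]
    return trimmed[:len(trimmed) - len(trimmed) % 3], offset
```

A trimmed sequence that does not begin with ATG is probably truncated at its 5' end, which would fit the contig-edge explanation, so you may still want to flag or drop those.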

I tried aligning with TranslatorX. The MAFFT method did produce an alignment, although the start codon appears to be missing. I'll talk to my boss about what to do with these sequences. Thanks.
