Question

Get BUSCO gene descriptions

2

9.2 years ago

pbigbig ▴ 250

Hi everyone,

I am planning to design primers (for Sanger sequencing) to assess a de novo genome assembly. The primers could be chosen arbitrarily, but I would prefer the sequenced regions to be meaningful, so I ran BUSCO with the eukaryote set (~400 single-copy orthologs) on the assembly. The run reported ~60% complete single-copy BUSCOs, but how can I find the names and descriptions of those orthologs in the eukaryote set? The results contain only alignments and numeric codes for the matches. I would really appreciate any help.

Thank you very much in advance!

BUSCO de novo assembly • 7.5k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by pbigbig ▴ 250

0

Also very interested in this, have you found an answer?

ADD REPLY • link 9.0 years ago by twooldridge • 0

0

Sadly not yet. However, I could still take the orthologs' FASTA sequences from the BUSCO results and BLAST them against the RefSeq database to get the best-hit accession IDs, then look up the descriptive titles for that list of IDs (I used Batch Entrez: http://www.ncbi.nlm.nih.gov/sites/batchentrez).
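For anyone following this BLAST route, here is a minimal Python sketch of the "best hit per ortholog" step. It assumes the BLAST results were saved in the standard 12-column tabular layout (`-outfmt 6`), where the last column is the bit score. The function names `best_hits` and `accession_list` are made up for illustration, and how the subject IDs are written depends on your database, so treat this as a starting point rather than a finished tool.

```python
import csv
import io


def best_hits(blast_tabular_text):
    """Return {query_id: (subject_id, bitscore)}, keeping the top-scoring hit per query.

    Assumes the 12-column tabular layout written by `-outfmt 6`:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    reader = csv.reader(io.StringIO(blast_tabular_text), delimiter="\t")
    for row in reader:
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def accession_list(hits):
    """One subject ID per line, de-duplicated, in order of first appearance."""
    seen = []
    for subject, _ in hits.values():
        if subject not in seen:
            seen.append(subject)
    return "\n".join(seen)
```

The text returned by `accession_list` can be saved to a file and uploaded to Batch Entrez to retrieve the descriptive titles.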

ADD REPLY • link 9.0 years ago by pbigbig ▴ 250

score 3 · Answer 1 · 2019-08-07

5.8 years ago

thackl ★ 3.0k

Just came across the same issue and came up with a solution. Most BUSCO data sets are generated from OrthoDB, so you can query OrthoDB via its API to map BUSCO IDs and pull the information. I've posted a short R snippet that automates this and produces a nice table: https://thackl.github.io/BUSCO-gene-descriptions

ADD COMMENT • link 5.8 years ago by thackl ★ 3.0k

1

Oh great! Thank you very much! Although the post is from a long time ago, I think it is still very useful for other de novo genome projects.

ADD REPLY • link 5.8 years ago by pbigbig ▴ 250

1

Yeah, I was hoping you had moved on by now ;)

ADD REPLY • link 5.8 years ago by thackl ★ 3.0k

score 0 · Answer 2 · 2018-11-16

6.6 years ago

william.imart • 0

If you load the FASTA sequence into IGV you can look at the entire genome alongside the genes they code for. From there you can search the name and function of each of these genes.

ADD COMMENT • link 6.6 years ago by william.imart • 0

Ram · Answer 3 · 2022-08-15

For others: you can find functional information about complete BUSCOs in the output directory generated by BUSCO, in the file full_table.tsv. To find the function of any BUSCO, navigate to the lineage directory you're using and grep for the BUSCO ID, as in the example below:

grep -A 1 <BUSCO id> ancestral

BLAST the resulting sequence. Maybe there's an easier way to find the functions of missing BUSCOs, but this is the only one I'm aware of.
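For the full_table.tsv route, a small Python sketch like the one below can pull out the complete BUSCOs with their descriptions. The column layout is an assumption (ID in column 1, status in column 2, description in column 10, as in recent BUSCO releases run in genome mode); older releases write fewer columns and no description, so check the commented header line of your own file first. The function name `complete_busco_descriptions` is hypothetical.

```python
def complete_busco_descriptions(full_table_text):
    """Map each Complete BUSCO id to its description ("" if the column is absent).

    Assumed layout (check the '#' header line of your own full_table.tsv):
    Busco id, Status, Sequence, Gene Start, Gene End, Strand, Score, Length,
    OrthoDB url, Description
    """
    descriptions = {}
    for line in full_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split("\t")
        if len(fields) < 2 or fields[1] != "Complete":
            continue  # keep only complete single-copy BUSCOs
        descriptions[fields[0]] = fields[9] if len(fields) > 9 else ""
    return descriptions
```

To use it, read the file with `open("full_table.tsv").read()` and pass the text in. BUSCOs with other statuses (for example Missing) are skipped, so those still need the grep-and-BLAST approach.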