How can I transfer gene models to a new assembly?
4.7 years ago
O.rka ▴ 740

Here's my data:

sample_A: Canonical assembly with gene models (sample_A.fasta, sample_A.gff3)

sample_B: Mutant and de-novo assembly. No gene models (sample_B.fasta)

I want to transfer the gene models from sample_A to sample_B.

I thought this would be straightforward but it's definitely not. There are some instances where exon_2 comes before exon_1 or where a particular exon maps multiple times on the de-novo assembly.

Is there a tool that will do this? Ideally, I would like a tool that does the following:

program --ref_assembly sample_A.fasta --ref_annotations sample_A.gff3 --query_assembly sample_B.fasta --percent_identity 0.98 > sample_B.gff3

Here is an example of an edge case I hit when I mapped the exons from transcript FUN_000463-T1 (from sample_A.gff3 and sample_A.fasta) to the new assembly (sample_B.fasta). Notice the exon ordering:

Here's the left side zoomed in:

Here's the right side zoomed in:

Notice the exon ordering.
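
A rough way to flag both problems (out-of-order exons and exons that map more than once) before picking a tool. This is only a sketch: `exon_hits.tsv` is a hypothetical table with one line per exon alignment (transcript, exon number, contig, start, end), the coordinates below are made up for illustration, and the order check assumes a plus-strand transcript.

```shell
# Toy input, tab-separated: transcript, exon_number, contig, start, end.
# All values are invented for illustration.
printf '%s\t%s\t%s\t%s\t%s\n' \
  FUN_000463-T1 1 contig_7 5200 5400 \
  FUN_000463-T1 2 contig_7 1100 1300 \
  FUN_000463-T1 3 contig_7 6000 6200 \
  FUN_000463-T1 3 contig_9 300 500 \
  > exon_hits.tsv

# Exons with more than one hit: (transcript, exon) pairs that occur twice or more.
cut -f1,2 exon_hits.tsv | sort | uniq -d > multi_mapped.txt

# Exons that start before the preceding exon of the same transcript on the
# same contig (expected on the minus strand, suspicious on the plus strand).
sort -k1,1 -k2,2n exon_hits.tsv | awk -F'\t' '
  $1 == tx && $3 == contig && $4 < prev_start {
    print $1 "\texon_" $2 "\tstarts before exon_" prev_exon
  }
  { tx = $1; contig = $3; prev_start = $4; prev_exon = $2 }
' > out_of_order.txt

cat multi_mapped.txt out_of_order.txt
```

With the toy table, exon 3 is reported as multi-mapped and exon 2 as starting before exon 1.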

gene Assembly • 2.2k views

You can try RATT. Success will depend on the quality of your assemblies.


Thank you. I'm looking at it right now and it's pretty confusing to run: https://vcru.wisc.edu/simonlab/bioinformatics/programs/ratt/Documentation.html I installed it with conda, but a lot of the files appear to be missing. I also found this tutorial: http://avrilomics.blogspot.com/2013/02/using-ratt-to-transfer-gene-predictions.html

Do you know of any other tools for this? I've heard of liftOver, but there is little documentation on using it with a new organism.


I've updated my question a bit to be more specific.

4.7 years ago
Juke34 8.9k

There is a list of tools in Table 5 of this publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6450745/

If you need the transcripts, you can extract them from your GFF, e.g. with AGAT:
agat_sp_extract_sequences.pl -g infile.gff -f infile.fasta --cdna

MAKER is not listed in the publication, but you can also use it. See Basic Protocol 4, "Mapping Annotations to a New Assembly", in "Genome Annotation and Curation Using MAKER and MAKER-P".


Thank you for the suggestions! I will continue to look through these. It looks like "CESAR" is the most modern out of all of the tools (2016). I've had issues running older tools that haven't been maintained in a while. I'm looking at "transMap" right now but it's a bit confusing. So is transMap a part of https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit ? I haven't seen any tutorials describing how to do this exactly. I'm a bit new to these suites as I'm more familiar with funannotate.

16 months ago
vkkodali_ncbi ★ 3.8k

I suggest giving Liftoff a try. I have had good experience with it in copying annotation from one genome to another. The companion tool LiftoffTools is useful in assessing the output too.
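
For the files in the question, a Liftoff run could look roughly like the sketch below. The file names come from the question; the output names, the 0.90 coverage cut-off and the thread count are illustrative assumptions, and the guard only keeps the snippet from failing when Liftoff or the inputs are missing.

```shell
# Inputs as named in the question; cut-offs are illustrative, not recommendations.
REF_FASTA=sample_A.fasta
REF_GFF=sample_A.gff3
TARGET_FASTA=sample_B.fasta
MIN_IDENTITY=0.98   # -s: minimum exon/CDS sequence identity (Liftoff default: 0.5)
MIN_COVERAGE=0.90   # -a: minimum aligned fraction of exon/CDS (Liftoff default: 0.5)

# Positional arguments are the target assembly first, then the reference.
if command -v liftoff >/dev/null 2>&1 && [ -s "$REF_GFF" ]; then
  liftoff -g "$REF_GFF" -o sample_B.gff3 -u sample_B.unmapped.txt \
    -s "$MIN_IDENTITY" -a "$MIN_COVERAGE" -p 4 \
    "$TARGET_FASTA" "$REF_FASTA"
else
  echo "liftoff or input files not found; nothing was run" >&2
fi
```

Features that fail the cut-offs are written to the file given with `-u`, so that list is a good place to start when checking what did not transfer. Liftoff also has a `-copies` option for looking for additional gene copies in the target, which may be relevant when exons map more than once.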

16 months ago

I've used Liftoff, which vkkodali_ncbi mentioned, with great success. Another alternative is TOGA: https://github.com/hillerlab/TOGA

I guess TOGA is better when you have several assemblies. The paper is interesting: https://www.science.org/stoken/author-tokens/ST-1161/full

16 months ago
O.rka ▴ 740

What I've been doing since I posted this is using the original gene models as a database and then running MetaEuk for the gene calls on the new genome against that database. Quick and dirty. Probably not the most robust method, but it gets a decent answer.
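
A sketch of that workflow, assuming the database is built from the sample_A proteins extracted with AGAT and that MetaEuk (https://github.com/soedinglab/metaeuk) is run through its `easy-predict` workflow. The protein file name, the output prefix and the temporary directory are hypothetical names chosen for illustration.

```shell
# Inputs as named in the question; the other names are hypothetical.
REF_FASTA=sample_A.fasta
REF_GFF=sample_A.gff3
TARGET_FASTA=sample_B.fasta
PROTEINS=sample_A.proteins.faa
OUT_PREFIX=sample_B_metaeuk

if command -v agat_sp_extract_sequences.pl >/dev/null 2>&1 \
   && command -v metaeuk >/dev/null 2>&1 && [ -s "$REF_GFF" ]; then
  # 1. Translate the reference CDS features into protein sequences.
  agat_sp_extract_sequences.pl -g "$REF_GFF" -f "$REF_FASTA" -t cds -p -o "$PROTEINS"
  # 2. Call genes on the new assembly, using those proteins as the reference set.
  metaeuk easy-predict "$TARGET_FASTA" "$PROTEINS" "$OUT_PREFIX" tmp_metaeuk
else
  echo "agat, metaeuk or input files not found; nothing was run" >&2
fi
```

Note that this predicts genes by homology rather than transferring the original models, so gene IDs and UTRs from sample_A.gff3 are not carried over.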


Please provide a link for MetaEuk. There can be packages with similar names and including a link here will make your answer more useful.
