Question

aligning scaffolds to the reference genome

7.9 years ago

arta ▴ 670

Hi all,

I assembled chloroplast reads with ABySS and got a chloro-scaffolding.fa file, which consists of many scaffolds. I also have a reference chloroplast genome. I would like to align the scaffolds to the reference genome and annotate the genes. Can you recommend tools or a workflow for this? Currently I am working with exonerate, but it seems I cannot annotate genes with it. The exonerate output is also unclear to me, and I do not know how to use it for downstream analysis.

Assembly scaffoldings alignment annotation • 2.5k views

updated 7.9 years ago by apa@stowers ▴ 610 • written 7.9 years ago by arta ▴ 670

score 1 · Answer 1 · 2017-01-10

It sounds like you are assembling mRNA-seq reads into transcripts and trying to align these to produce gene models?

If so, I typically would run exonerate like: "exonerate -m est2genome --revcomp --bestn 1 --showcigar --showtargetgff -t chloroplast.fa -q scaffolds.fa > scaffolds.out 2> scaffolds.err", then extract the GFF lines from scaffolds.out. However, this will not generate CDS features in the GFF.
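For the "extract the GFF lines" step, here is a minimal Python sketch. It assumes that exonerate, when run with --showtargetgff, wraps each GFF block between "# --- START OF GFF DUMP ---" and "# --- END OF GFF DUMP ---" comment lines; check that against your own scaffolds.out before relying on it. The function name extract_gff is made up for illustration.

```python
GFF_START = "# --- START OF GFF DUMP ---"
GFF_END = "# --- END OF GFF DUMP ---"


def extract_gff(lines):
    """Yield GFF feature lines found inside exonerate GFF dump blocks.

    Assumes the START/END marker comments above delimit each block.
    Comment lines (starting with '#') inside a block are skipped.
    """
    in_dump = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(GFF_START):
            in_dump = True
        elif line.startswith(GFF_END):
            in_dump = False
        elif in_dump and line and not line.startswith("#"):
            yield line


# Usage (file names taken from the command above):
# with open("scaffolds.out") as src, open("scaffolds.gff", "w") as dst:
#     for feature in extract_gff(src):
#         dst.write(feature + "\n")
```

The cigar lines and the alignment text outside the dump blocks are dropped, so the result should be loadable as a plain GFF file.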

If CDS annotations are important, first run ORF prediction on your sequences and produce a 4-column, space-delimited file ("CDS.txt"), one row per scaffold, containing these 4 values: scaffold ID, strand of the ORF (+ or -), scaffold ORF start (1-based), and scaffold ORF length (the exonerate man page gives the fields as <id> <strand> <cds_start> <cds_length>, so the last column is a length rather than an end coordinate). With this file, you can run exonerate like this: "exonerate -m cdna2genome --annotation CDS.txt --revcomp --bestn 1 --showcigar --showtargetgff -t chloroplast.fa -q scaffolds.fa > scaffolds.out 2> scaffolds.err". The GFF will now include CDS lines.
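If you do not have an ORF predictor at hand, the CDS.txt file can be sketched in Python roughly as follows. This is a naive illustration, not a real gene finder: it takes the single longest ATG-to-stop ORF per scaffold over all six frames and ignores alternative start codons. Two assumptions to verify: the fourth column is written as the ORF length, since the exonerate man page describes the fields as <id> <strand> <cds_start> <cds_length> (change it to an end coordinate if your setup expects one), and the start of a minus-strand ORF is reported in forward-strand coordinates. All function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_one_strand(seq):
    """Return 0-based half-open (start, end) of the longest ATG..stop ORF, or None."""
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def longest_orf(seq):
    """Return (strand, start_1based, length) in forward-strand coordinates, or None."""
    seq = seq.upper()
    n = len(seq)
    candidates = []
    fwd = longest_orf_one_strand(seq)
    if fwd:
        candidates.append(("+", fwd[0] + 1, fwd[1] - fwd[0]))
    rev = longest_orf_one_strand(revcomp(seq))
    if rev:
        # Interval [s, e) on the reverse strand is [n - e, n - s) on the forward strand.
        candidates.append(("-", n - rev[1] + 1, rev[1] - rev[0]))
    return max(candidates, key=lambda c: c[2]) if candidates else None


def read_fasta(lines):
    """Yield (id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def annotation_lines(records):
    """Yield one 'id strand start length' line per scaffold that has an ORF."""
    for name, seq in records:
        orf = longest_orf(seq)
        if orf:
            yield f"{name} {orf[0]} {orf[1]} {orf[2]}"


# Usage:
# with open("scaffolds.fa") as fa, open("CDS.txt", "w") as out:
#     for row in annotation_lines(read_fasta(fa)):
#         out.write(row + "\n")
```

Scaffolds with no complete ORF are simply left out of the file. Since a chloroplast scaffold usually carries several genes, one ORF per scaffold is a strong simplification.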

Depending on the results, you may want to change default values for --refine, --minintron, --maxintron. Exonerate is a parameter jungle so you may find other useful ones, but these are what I typically use.
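To try different values for these parameters without retyping the whole command, a small wrapper like the sketch below can help. The fixed flags are copied from the commands above. The function name and the example values are made up for illustration and are not recommendations; I am assuming --refine takes a strategy name such as "region" or "full", so check the man page for your version.

```python
def build_exonerate_cmd(query, target, model="est2genome", annotation=None,
                        refine=None, minintron=None, maxintron=None):
    """Build the exonerate argument list; optional flags are added only when set."""
    cmd = [
        "exonerate", "-m", model,
        "--revcomp", "--bestn", "1",
        "--showcigar", "--showtargetgff",
        "-t", target, "-q", query,
    ]
    if annotation is not None:
        cmd += ["--annotation", annotation]
    if refine is not None:
        cmd += ["--refine", refine]
    if minintron is not None:
        cmd += ["--minintron", str(minintron)]
    if maxintron is not None:
        cmd += ["--maxintron", str(maxintron)]
    return cmd


# Usage (illustrative values only):
# import subprocess
# cmd = build_exonerate_cmd("scaffolds.fa", "chloroplast.fa",
#                           refine="region", maxintron=3000)
# with open("scaffolds.out", "w") as out, open("scaffolds.err", "w") as err:
#     subprocess.run(cmd, stdout=out, stderr=err, check=True)
```

Passing the arguments as a list avoids shell quoting problems and makes it easy to loop over several parameter settings and compare the resulting GFF files.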