Hi,
I am trying to build gene models for a new species using AUGUSTUS 3.1.
I have some RNA-seq data, so I used it to generate intron hints.
The AUGUSTUS tutorial says the filterBam program can be used for more accurate gene prediction (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=IncorporatingRNAseq.Tophat), so I decided to use it.
However, every time I run filterBam, it ends with a message like this:
processed line 74100terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Despite this message, a filtered BAM file was generated, but I do not know whether this BAM file is reliable.
Where did I go wrong?
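One quick sanity check on a BAM file written by a program that crashed is to look for the BGZF end-of-file marker, the 28-byte empty block that the SAM/BAM specification puts at the end of every complete BAM file. The sketch below is only an illustration: the function name `has_bgzf_eof` is made up here, and a present marker only shows that the file was closed properly, not that every read was processed. A missing marker is a strong sign that the file is truncated.

```python
import os

# 28-byte empty BGZF block defined by the SAM/BAM specification as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof` on the filtered BAM file (whatever it is named on your system) returns `False` if the writer stopped before finishing the file.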
Hi, Mehmet,
I run AUGUSTUS on my Mac (3.5 GHz quad-core Intel Core i5, 16 GB 1,600 MHz DDR3 SDRAM).
OK. As far as I understand from your e-mail, you want to predict genes for your new species and you want to use RNA-seq hints (intron, exon, and intron+exon hints, separately). For this:
Produce intron hints:
You don't need to filter the BAM file. Instead, you should do the following:
Produce exon hints:
Exon+intron hints:
If you need more help, please ask.
Mehmet,
Thank you for answering my question!
I am trying the steps you suggested, but I have a quick question.
The AUGUSTUS manual says it is better to do the Bowtie/TopHat mapping with untrimmed reads. When I created the intron hints through the Bowtie/TopHat-AUGUSTUS pipeline, I used untrimmed RNA-seq reads and obtained the "accepted_hits.bam" file.
Can this file be used to generate the Cufflinks transcript.gtf file?
Or should I quality-trim the reads first?
Hi,
Yes, it is usable. You should use the accepted_hits.bam file to create intron and exon parts. It is okay. I always use the accepted_hits.bam file to get hints.
Thank you so much for your kind advice.
I am now merging the transcript.gtf files from different samples with cuffmerge.
Where can I get cufflinkGTF2augustusExonParthints.pl? Is the gtf2gff.pl in the /scripts dir not compatible with Cufflinks output *.gtf files?
The script that I sent you was written by my friend. You just need to convert GTF to GFF. You can use it with the --printExon option.
Hi, can I ask one more question?
(I have been busy masking my species' genome with RepeatMasker and RepeatModeler, and it took me almost two weeks!)
Anyway, I just merged the hints (intron hints, exon hints, repeat-sequence hints) and ran AUGUSTUS.
Then I got this message:
I think this message means my exon hints were not used for gene prediction.
If you don't mind, please help me solve this problem.
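Before suspecting AUGUSTUS itself, it can help to confirm that the merged hints file really contains the exon hints and which `src=` value they carry, since each source has to be listed in the extrinsic configuration file. The following is a rough sketch, assuming the usual tab-separated GFF hint layout with `src=` in the ninth column; `tally_hints` is a hypothetical helper name, not part of AUGUSTUS.

```python
from collections import Counter


def tally_hints(lines):
    """Count hints per (feature, source) pair in GFF-formatted hint lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF record
        feature = fields[2]
        source = "?"  # placeholder when no src= attribute is present
        for item in fields[8].split(";"):
            key, _, value = item.strip().partition("=")
            if key == "src":
                source = value
        counts[(feature, source)] += 1
    return counts
```

Passing an open file handle for the merged hints file to `tally_hints` gives one count per feature and source. If no exon or exonpart feature shows up, the exon hints were lost during merging rather than ignored during prediction.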
Hi,
Why did you use a masked sequence? You don't need to use it.
Can you post the AUGUSTUS command that you used for gene prediction with RNA-seq hints?
This is the command that I used:
For RepeatMasker and RepeatModeler:
Were you able to run these two programs? If not, I can show you how to use them.
Thank you,
This is the command I used:
To be exact, I ran AUGUSTUS on the unmasked genome ("genome.fa") but supplied the repeat information as nonexonpart hints.
I generated the repeat-sequence hints from the output file of RepeatMasker (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.IncorporateRepeats) and merged them with the other hint files (exon hints, intron hints) when I ran AUGUSTUS.
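For reference, the merging step described here amounts to concatenating the GFF hint records from each file. The sketch below is one way to do that while also catching malformed lines, assuming every hint is a nine-column tab-separated record; `merge_hints` is a hypothetical helper, and the sorting by sequence name and coordinates is only there to make the merged file easier to inspect.

```python
def merge_hints(*hint_texts):
    """Merge GFF hint texts into one list of lines sorted by sequence and position."""
    records = []
    for text in hint_texts:
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.split("\t")
            if len(fields) != 9:
                raise ValueError(f"expected 9 tab-separated columns: {line!r}")
            records.append(fields)
    # Sort by sequence name, then start, then end coordinate.
    records.sort(key=lambda f: (f[0], int(f[3]), int(f[4])))
    return ["\t".join(fields) for fields in records]
```

Each argument is the full text of one hints file, so the intron, exon, and repeat hints can be read separately and passed in together.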
You can use the masked genome file as the genome file, but you don't need to supply the repeat sequences as a hints file. Instead, you can remove the repeats that were identified by RepeatMasker using the HaploMerger tool. HaploMerger removes repeats from your genome.
1. RepeatModeler finds repeats
With the new genome file, without repeats, you can run a new gene prediction.
Hi Mehmet,
I'm looking for this script as well, as I'm following the steps you posted on your blog. I was wondering if you could share the script with me as well?
Thanks in advance, Stefany