Hi!
The goal of my subproject is to find transcripts that are upregulated upon treatment in the larvae of Spodoptera littoralis. At the current step I have an assembly and the tblastx results (the assembly against the huge database from NCBI). To continue with annotation, I converted the BLAST results into a GTF file (I wrote a Python script that applies a cutoff and then writes the results in GTF format). The GTF file looks like this:
Slitt_C1 tblastx exon 4225 5697 667 + . gene_id "gi|827554818|ref|XM_004929801.2|"; transcript_id "gi|827554818|ref|XM_004929801.2|";
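A minimal sketch of such a conversion step, not the script actually used here. It assumes tab-separated tblastx output (-outfmt 6, default 12 columns), takes the query coordinates as positions on the assembly contig, and uses the bit score as the GTF score. The function name and the e-value cutoff of 1e-5 are hypothetical.

```python
def blast_hit_to_gtf(line, max_evalue=1e-5, source="tblastx"):
    """Turn one BLAST tabular (outfmt 6) line into a GTF exon line.

    Returns None when the line is malformed or fails the e-value cutoff.
    The cutoff value is an illustrative assumption, not the one used above.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return None
    qseqid, sseqid = fields[0], fields[1]
    qstart, qend = int(fields[6]), int(fields[7])
    evalue, bitscore = float(fields[10]), float(fields[11])
    if evalue > max_evalue:
        return None
    # For translated searches a reverse-strand hit is reported with
    # qstart > qend; GTF requires start <= end plus an explicit strand.
    strand = "+" if qstart <= qend else "-"
    start, end = min(qstart, qend), max(qstart, qend)
    attrs = f'gene_id "{sseqid}"; transcript_id "{sseqid}";'
    # GTF columns are tab-separated: seqname, source, feature, start, end,
    # score, strand, frame, attributes.
    return "\t".join([qseqid, source, "exon", str(start), str(end),
                      f"{bitscore:g}", strand, ".", attrs])


example = ("Slitt_C1\tgi|827554818|ref|XM_004929801.2|\t80.0\t491\t10\t0"
           "\t4225\t5697\t1\t1473\t1e-50\t667")
gtf_line = blast_hit_to_gtf(example)
```

Note that this gives every hit of the same subject the same transcript_id, even when the hits lie on different contigs or strands. Whether that matters to downstream tools is worth checking.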
After that I aligned the reads to the assembly with bowtie2 and tried to run tophat2:
~/bin/tophat2 -G bowtie_result/Slitt.gtf -o tophat_with_annotation/ -p 16 bowtie_result/Slitt ../02_trim/A_R1_P.fq ../02_trim/A_R2_P.fq
And I got the error:
[2016-07-28 11:05:14] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-07-28 11:05:14] Checking for Bowtie
Bowtie version: 2.2.8.0
[2016-07-28 11:05:15] Checking for Bowtie index files (genome)..
[2016-07-28 11:05:15] Checking for reference FASTA file
[2016-07-28 11:05:15] Generating SAM header for bowtie_result/Slitt
[2016-07-28 11:05:15] Reading known junctions from GTF file
[2016-07-28 11:05:20] Preparing reads
left reads: min. length=59, max. length=99, 4805446 kept reads (1341 discarded)
right reads: min. length=59, max. length=99, 4795101 kept reads (11686 discarded)
[2016-07-28 11:08:38] Building transcriptome data files transcriptome_index/Slitt
[2016-07-28 11:08:42] Building Bowtie index from Slitt.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1
According to the log file, the last thing TopHat tried to do was run bowtie2-build on the Slitt.fa in the temp folder.
I already checked the sequence names in the assembly and the annotation; they are the same, so the error must come from something else. I would appreciate any tips on how to get from the BLAST results to an expression-level file (of course I could run a script that assigns regions of the scaffolds to specific annotations, but that would take a lot of time).
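For reference, the name check can be sketched as below. This is only an illustration: the function names are hypothetical, and it assumes the FASTA sequence name is the first whitespace-delimited token of each header line and that the GTF is tab-separated.

```python
def fasta_names(lines):
    """Collect sequence names (first token after '>') from FASTA lines."""
    return {ln[1:].split()[0]
            for ln in lines
            if ln.startswith(">") and len(ln.strip()) > 1}


def gtf_seqnames(lines):
    """Collect the first column (seqname) from GTF lines, skipping comments."""
    names = set()
    for ln in lines:
        if ln.startswith("#") or not ln.strip():
            continue
        names.add(ln.split("\t", 1)[0])
    return names


def missing_from_fasta(fasta_lines, gtf_lines):
    """Return GTF seqnames that have no matching FASTA record, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))


# Small in-memory demonstration; with real files, pass open file handles
# instead of lists (the file locations are up to the reader).
demo_fasta = [">Slitt_C1 len=6000\n", "ACGT\n", ">Slitt_C2\n", "ACGT\n"]
demo_gtf = [
    'Slitt_C1\ttblastx\texon\t1\t10\t5\t+\t.\tgene_id "a"; transcript_id "a";\n',
    'Slitt_C9\ttblastx\texon\t1\t10\t5\t+\t.\tgene_id "b"; transcript_id "b";\n',
]
missing = missing_from_fasta(demo_fasta, demo_gtf)
```

An empty result only shows that the names agree. It does not rule out other problems, such as coordinates that run past the end of a contig.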
Thank you!
Even though it won't fix this problem, I am thinking of using STAR.
Would be a good idea, although it might satisfy you to solve the problem ;-)
Is the Slitt.fa file in the same directory as your bowtie index (bowtie_result)? If not, try putting a copy in there.
Yep, I guess I tried all the standard solutions there are on the internet.
You may want to align against the genome (rather than the transcriptome) so as to avoid forcing the aligner to misalign reads, especially in this case, where you don't have a well-defined transcriptome.
In a perfect world I would. However, there is no genome for this species or even a closely related one.
TopHat is deprecated to some extent, so using STAR or HISAT2 (if you want to stay within the same family) may be a better option, as you have already indicated.
I would recommend that you try BBMap. If you do use it, remember to add the flag sam=1.3, since the default SAM output is v1.4, which is not understood by featureCounts/HTSeq-count.