I have used TopHat several times before, but I recently changed my script so that it can handle data from the XSQ format. The data look exactly the same, and so does the command, but now TopHat fails! Does anyone know what the error "Too few quality values for read: 28 I are you sure this is a FASTQ-int file?" means and how I can solve this problem?
My input files contain reads like:
testset_0_F3.csfasta
>1_104_494_F3
T303333303213331130333123.32....3..................
testset_0_F3.qual
>1_104_494_F3
10 10 8 15 18 4 4 5 4 4 4 4 19 10 4 5 8 17 4 6 14 5 4 5 -1 4 4 -1 -1 -1 -1 14 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
testset_0_F5.csfasta
>1_104_494_F5
G3333333223303333031330.3323........
testset_0_F5.qual
>1_104_494_F5
7 6 12 12 17 13 4 5 5 13 7 10 5 7 12 9 5 4 4 5 4 4 -1 7 6 4 4 -1 -1 -1 -1 -1 -1 -1 -1
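For files like these, a quick sanity check is to compare the number of colors in a read (sequence length minus the leading primer base) with the number of quality values for the same read. The following is only a minimal sketch: it assumes one record per file with the sequence on a single line, and the temporary file names are made up for illustration.

```shell
# Sample F5 record copied from the post, written to throwaway files.
tmp=$(mktemp -d)
cat > "$tmp/reads.csfasta" <<'EOF'
>1_104_494_F5
G3333333223303333031330.3323........
EOF
cat > "$tmp/reads.qual" <<'EOF'
>1_104_494_F5
7 6 12 12 17 13 4 5 5 13 7 10 5 7 12 9 5 4 4 5 4 4 -1 7 6 4 4 -1 -1 -1 -1 -1 -1 -1 -1
EOF

# Colors per read: line length minus the leading primer base (skip header/comment lines).
n_colors=$(awk '!/^[>#]/ { print length($0) - 1 }' "$tmp/reads.csfasta")
# Quality values per read: number of whitespace-separated fields.
n_quals=$(awk '!/^[>#]/ { print NF }' "$tmp/reads.qual")

echo "colors=$n_colors quals=$n_quals"
rm -r "$tmp"
```

If the two counts agree, the files themselves are probably fine and the problem is more likely in how they are passed to the aligner.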
My tophat command is:
tophat2 --color -p 8 --bowtie1 -a 5 -m 2 -N 10 --library-type fr-secondstrand -x 1 --read-edit-dist 12 -o testset_0 GRCh37_gatk_colorspace testset_0_F3.csfasta testset_0_F3.qual testset_0_F5.csfasta testset_0_F5.qual
My tophat output is:
[2013-03-26 14:40:47] Beginning TopHat run (v2.0.6)
-----------------------------------------------
[2013-03-26 14:40:47] Checking for Bowtie
Bowtie version: 0.12.8.0
[2013-03-26 14:40:47] Checking for Samtools
Samtools version: 0.1.18.0
[2013-03-26 14:40:47] Checking for Bowtie index files
[2013-03-26 14:40:47] Checking for reference FASTA file
[2013-03-26 14:40:47] Generating SAM header for /data/GENOMES/human_GRCh37_gatk/GRCh37_gatk_colorspace
format: fasta
[2013-03-26 14:41:05] Preparing reads
left reads: min. length=50, max. length=50, 63858 kept reads (36142 discarded)
right reads: min. length=99, max. length=149, 67525 kept reads (32475 discarded)
[2013-03-26 14:41:07] Mapping left_kept_reads to genome GRCh37_gatk_colorspace with Bowtie
[2013-03-26 14:41:33] Mapping left_kept_reads_seg1 to genome GRCh37_gatk_colorspace with Bowtie (1/2)
[2013-03-26 14:42:00] Mapping left_kept_reads_seg2 to genome GRCh37_gatk_colorspace with Bowtie (2/2)
[2013-03-26 14:42:25] Mapping right_kept_reads to genome GRCh37_gatk_colorspace with Bowtie
[FAILED]
Error running bowtie:
Too few quality values for read: 28 I
are you sure this is a FASTQ-int file?
terminate called after throwing an instance of 'int'
Maybe you are missing this option: "-Q/--quals Separate quality value files - colorspace read files (CSFASTA) come with separate qual files"? Also, the order of your parameters seems off to me when reading the TopHat manual: "tophat --color --quals [other options]* <colorspace_index_base> <reads1_1[,...,readsn_1]> [reads1_2,...readsn_2] <quals1_1[,...,qualsn_1]> [quals1_2,...qualsn_2]"
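Applied to the command from the question, that would mean adding --quals and listing both read files before both quality files. The log is consistent with this diagnosis: the "right reads" are reported with lengths of 99 to 149, which is what a line of 50 space-separated quality values looks like when a .qual file is read as sequences. The sketch below only builds and prints the reordered command; all other options are copied unchanged from the question, and whether they suit your data is not checked here.

```shell
# Index and file names are taken from the question.
index=GRCh37_gatk_colorspace
# Read files first (F3, then F5) ...
reads="testset_0_F3.csfasta testset_0_F5.csfasta"
# ... followed by the quality files in the same order.
quals="testset_0_F3.qual testset_0_F5.qual"

cmd="tophat2 --color --quals -p 8 --bowtie1 -a 5 -m 2 -N 10 --library-type fr-secondstrand -x 1 --read-edit-dist 12 -o testset_0 $index $reads $quals"

# Print the command for inspection; run it yourself once it looks right.
echo "$cmd"
```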
Thanks, that was the problem! I knew it was something simple like that, but could not find it by myself!
Why delete the post? I've made it open so that other people can benefit!!
Please add this as an answer!!