Hi,
I am unable to run TopHat2 because I get an error. Here is the command I ran:
tophat2 -p 5 -r 62 –library-type fr-firststrand -G /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/gene.gtf –o /home/jmotwani/RNASeq/Alignment_Tophat2 --BOWTIE2_INDEXES /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/ C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq C95VLANXX-2046D-01-01-01_L003_R2_Trimmed.fastq
I get the following error after I run the above command:
[2016-05-22 22:20:05] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-05-22 22:20:05] Checking for Bowtie
Bowtie version: 2.2.9.0
[2016-05-22 22:20:05] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (–library-type.*.bt2l)
The indexed genome was downloaded from Illumina. Do I have to build it after downloading it? I downloaded the genome, gtf file, and indexed files and gave the path of those files in the command above.
Could anyone please comment or advise on this?
Thanks for your time.
Regards,
J
Thanks Goutham and WouterDeCoster. I tried, but I still get the same error.
I am not sure whether the Bowtie2Index files provided by iGenomes are incompatible with the latest version of TopHat.
The files I have in the Bowtie2Index are:
genome.1.bt2 genome.2.bt2 genome.3.bt2 genome.4.bt2
genome.fa genome.rev.1.bt2
genome.rev.2.bt2 tophat_out
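A quick way to confirm that a given basename points at a complete index is sketched below. It assumes the usual Bowtie 2 naming, where a small index consists of six files ending in .1.bt2 to .4.bt2, .rev.1.bt2 and .rev.2.bt2 (large indexes use .bt2l instead). The function name `missing_index_files` is made up for this illustration and is not part of TopHat or Bowtie 2.

```python
from pathlib import Path

# Suffixes Bowtie 2 writes for a small index; large indexes end in ".bt2l".
SMALL_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_index_files(basename):
    """Return the expected small-index files that are absent for this basename.

    Returns an empty list when either a complete small (.bt2) or a complete
    large (.bt2l) set of index files is present.
    """
    small = [Path(basename + suffix) for suffix in SMALL_SUFFIXES]
    large = [Path(str(path) + "l") for path in small]
    missing_small = [str(path) for path in small if not path.is_file()]
    missing_large = [str(path) for path in large if not path.is_file()]
    if not missing_small or not missing_large:
        return []
    return missing_small
```

With the files listed above, calling it with the path ending in Bowtie2Index/genome should return an empty list, whereas passing only the directory (ending in Bowtie2Index/) would report all six files as missing, because the basename "genome" is part of every file name.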
I guess the problem is a typo: -library-type instead of --library-type. The error says –library-type.*.bt2l
Thanks, but I am still getting an error. I realized that I had missed a '-' for library-type: I had written -library-type in place of --library-type. I changed it and ran the command again, but now I get a new error:
tophat: option -? not recognized for detailed help see http://ccb.jhu.edu/software/tophat/manual.shtml
Could you share the command that you ran? That looks like a different problem than before. Where are your indexes? Did you point the environment variable to their location?
Note, from the manual:
Here is the command:
tophat2 -p 5 -r 62 --library-type fr-firststrand -G /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/gene.gtf –o /home/jmotwani/RNASeq/Alignment_Tophat2 /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome read1.fastq read2.fastq
The indexes are in the location specified in the command. I have not created an environment variable; instead, I have given the whole path in the command.
Have you read what I posted from the manual?
So this means that you have to specify only the basename of the indexes, that tophat will search in
I'm not sure if it's a copy-paste issue, but I can see in your post here that you have two different kinds of dash: "-" and "–". This can happen when copy-pasting commands from, e.g., Microsoft Word.
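If you want to check a command for this before running it, a minimal sketch along these lines can list every non-ASCII character and its Unicode name. The function name `find_suspicious_chars` is made up for this example, and the sample string is only a shortened copy of the first command in this thread.

```python
import unicodedata


def find_suspicious_chars(command):
    """Return (position, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 127
    ]


# Shortened version of the command posted above, with the pasted dash intact.
cmd = "tophat2 -p 5 -r 62 –library-type fr-firststrand"
for pos, ch, name in find_suspicious_chars(cmd):
    print(f"position {pos}: {ch!r} is {name}")
```

Any character reported here (an en dash, for instance) should be retyped as a plain hyphen in the terminal.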
I think I have managed to get rid of the previous error, but now it throws a new error:
tophat2 --num-threads 5 --mate-inner-dist 62 --library-type fr-firststrand --GTF /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf --output-dir /home/jmotwani/RNASeq/Alignment_Tophat2 /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome read1.fastq read2.fastq
Output:
[2016-05-23 22:07:53] Beginning TopHat run (v2.1.1)
[2016-05-23 22:07:53] Checking for Bowtie
Bowtie version: 2.2.9.0
[2016-05-23 22:07:53] Checking for Bowtie index files (genome)..
[2016-05-23 22:07:53] Checking for reference FASTA file
[2016-05-23 22:07:53] Generating SAM header for /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome
[2016-05-23 22:07:55] Reading known junctions from GTF file
[2016-05-23 22:08:28] Preparing reads
[FAILED]
Error running 'prep_reads'
Error: qual length (111) differs from seq length (106) for fastq record !
Has anyone come across this problem?
This tells you that your fastq file is corrupt since the quality string is longer than the sequence. What did you use for trimming the reads? How have you modified/processed your fastq data?
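To find out which records are affected, something like the following sketch can scan a FASTQ file and report every record whose sequence and quality strings differ in length. It assumes plain-text (not gzipped) files with exactly four lines per record; the function name `find_length_mismatches` is made up for this example.

```python
def find_length_mismatches(fastq_path):
    """Yield (record_number, header, seq_len, qual_len) for inconsistent records.

    Assumes uncompressed FASTQ with four lines per record (no wrapped sequences).
    """
    with open(fastq_path) as handle:
        record = 0
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # the '+' separator line
            qual = handle.readline().rstrip("\r\n")
            record += 1
            if len(seq) != len(qual):
                yield record, header.strip(), len(seq), len(qual)
```

Running this on both the raw and the trimmed files should show whether the mismatch is already present in the raw data or is introduced by the trimming step.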
I used an in-house script (cleanadaptors) to trim the raw fastq files. I ran the trimming command as follows:
cleanadaptors -I /home/jmotwani/RNASeq/contam.fa -q 20 -x 25 -F C95VLANXX-2046D-01-01-01_L003_R1.fastq -o C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq -G C95VLANXX-2046D-01-01-01_L003_R2.fastq -O C95VLANXX-2046D-01-01-01_L003_R2_trimmed.fastq
-q is the quality threshold and -x is the minimum read length.
I can't help you with an in-house script, obviously.
Please use the comment options correctly to enable threading of questions and replies.