Hello!
I'm just starting to use TopHat and I have a small problem that I am not able to solve. I want to align several human transcriptomes, so I have downloaded the reference human genome (ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/NCBI/GRCh38/Homo_sapiens_NCBI_GRCh38.tar.gz) and now I want to use the Tuxedo protocol.
I executed the following commands:
1st. Uncompress the genome:
tar xvfz Homo_sapiens_NCBI_GRCh38.tar.gz
2nd. Make a working directory:
mkdir Alignments
3rd. Create symbolic links to annotation files and bowtie index (inside the working directory):
ln -s /path_to/Hsa38/Annotation/Archives/archive-2015-08-11-09-31-31/Genes/genes.gtf
ln -s /path_to/Hsa38/Sequence/Bowtie2Index/genome.*.
4th. Try to run tophat (inside the working directory):
tophat -p 8 -G genes.gtf -o sample_output --library-type=fr-firststrand genome sample.fq
The output message was the following:
[2017-11-07 00:29:47] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-11-07 00:29:47] Checking for Bowtie
Bowtie version: 2.2.9.0
[2017-11-07 00:29:47] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (genome.*.bt2l)
After that, I tried to find any file with the extension .bt2l:
find path_to/Hsa38/ -iname *bt2l
but I had no success. Does anyone know where the index is, or how I can solve this problem?
Thanks in advance.
You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated (in low maintenance) and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and BBMap, or pseudo-alignment using Salmon.
Thanks so much! I've just got the publication =D
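As a rough illustration of the replacement workflow mentioned above, here is a minimal sketch for one single-end sample. The index prefix, file names and output paths are placeholders I made up, and it assumes a HISAT2 index has already been built or downloaded separately (a Bowtie 2 index cannot be reused). By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
set -eo pipefail

DRY_RUN=${DRY_RUN:-1}
INDEX=${INDEX:-grch38/genome}   # hypothetical HISAT2 index prefix (assumption)
READS=${READS:-sample.fq}       # single-end reads, as in the question
GTF=${GTF:-genes.gtf}           # annotation file, as in the question
THREADS=${THREADS:-8}

run() {
    # Echo the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Spliced alignment. For single-end reads, --rna-strandness R corresponds to
# TopHat's fr-firststrand; --dta tailors the output for transcript assemblers.
run hisat2 -p "$THREADS" --dta --rna-strandness R -x "$INDEX" -U "$READS" -S sample.sam

# Coordinate-sort the alignments, which StringTie expects.
run samtools sort -@ "$THREADS" -o sample.sorted.bam sample.sam

# Quantify the annotated transcripts; -e -B also writes the Ballgown input tables.
run stringtie -p "$THREADS" -G "$GTF" -e -B -o sample_stringtie/sample.gtf sample.sorted.bam
```

Check the strandedness setting against your own library prep before running it for real; the value here is only carried over from the TopHat command in the question.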
Just look in the actual folder and see what the files are named so you can adjust the symlinks as necessary. Should genome.*. be genome.*? It's not going to recognize the wildcard as is.
I have just tried it but it does not work:
Create the symbolic link to the directory, not the file prefix:
Then, execute tophat with:
Also, as my colleague Wouter has stated, tophat/tophat2 is 'retired' and HISAT/HISAT2 is the upgraded version.
Kevin
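For anyone landing here later, the wildcard issue and the "link the directory" suggestion can be sketched as follows. This is only an illustration: it builds a throwaway directory tree with empty placeholder files named like a Bowtie 2 index, and the link name Bowtie2Index and the final tophat line are my assumptions, not the exact commands from the posts above.

```shell
#!/usr/bin/env bash
# Sketch only: empty dummy files stand in for the real Bowtie 2 index.
set -eo pipefail

demo=$(mktemp -d)                          # hypothetical scratch area
idx="$demo/Hsa38/Sequence/Bowtie2Index"
mkdir -p "$idx" "$demo/Alignments"
for n in 1 2 3 4 rev.1 rev.2; do
    : > "$idx/genome.$n.bt2"               # placeholders for genome.*.bt2
done

cd "$demo/Alignments"

# The pattern from the question ends in a dot, so it matches no index file.
# (Without nullglob, bash would pass the unmatched pattern through literally
# and ln -s would create a dangling link.)
shopt -s nullglob
bad=( "$idx"/genome.*. )
good=( "$idx"/genome.* )
shopt -u nullglob
echo "genome.*. matches ${#bad[@]} files; genome.* matches ${#good[@]} files"

# Link the directory once instead of linking each index file.
ln -s "$idx" Bowtie2Index
linked=( Bowtie2Index/genome.*.bt2 )
echo "index files visible through the link: ${#linked[@]}"

# TopHat takes the index prefix (directory plus base name), so the call
# would then look something like:
#   tophat -p 8 -G genes.gtf -o sample_output --library-type=fr-firststrand Bowtie2Index/genome sample.fq
```

If the real Bowtie2Index folder turns out to contain no genome.*.bt2 or genome.*.bt2l files at all, the index would have to be built with bowtie2-build first; linking alone will not help in that case.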