Hello,
I want to use TopHat to align reads for RNA-seq analysis. I downloaded the sequences SRR390728_1.fastq and SRR390728_2.fastq and quality-trimmed them with
java -jar /usr/bin/trimmomatic PE SRR390728_1.fastq SRR390728_2.fastq paired1.fq unpaired_1.fq paired2.fq unpaired_2.fq SLIDINGWINDOW:4:20 MINLEN:20 ILLUMINACLIP:/usr/local/lib/Trimmomatic/adapters/TruSeq2-PE.fa:2:30:10:1:true
and then downloaded the reference human sequences Homo_sapiens.GRCh38.dna.toplevel.fa and Homo_sapiens.GRCh38.90.gtf, which I renamed Hsapiens_GRCh38.fa and Hsapiens_GRCh38.gtf respectively. I then indexed them with
bowtie2-build -f Hsapiens_GRCh38.fa Hsapiens_GRCh38
tophat2 -G Hsapiens_GRCh38.gtf --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38
and then copied everything into the same folder, so that the folder in use contains:
$ ls
align.sam Hsapiens_GRCh38.rev.1.bt2l SRR390728_2.fastq
Hsapiens_GRCh38.1.bt2l Hsapiens_GRCh38.rev.2.bt2l
Hsapiens_GRCh38.2.bt2l Hsapiens_GRCh38.tr
Hsapiens_GRCh38.3.bt2l paired1.fq unpaired_1.fq
Hsapiens_GRCh38.4.bt2l paired2.fq unpaired_2.fq
Hsapiens_GRCh38.fa Hsapiens_GRCh38.gtf SRR390728_1.fastq
with Hsapiens_GRCh38.tr being a folder.
Then I ran the alignment with:
$ tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq
[2017-10-12 11:26:19] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 11:26:19] Checking for Bowtie
Bowtie version: 2.2.9.0
[2017-10-12 11:26:19] Checking for Bowtie index files (transcriptome)..
Error: Could not find Bowtie 2 index files Hsapiens_GRCh38.tr.*.bt2l)
Then I gave the index as a path by adding './' and got:
$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq
[2017-10-12 11:27:36] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 11:27:36] Checking for Bowtie
Bowtie version: 2.2.9.0
[2017-10-12 11:27:37] Checking for Bowtie index files (genome)..
[2017-10-12 11:27:37] Checking for reference FASTA file
[2017-10-12 11:27:37] Generating SAM header for Hsapiens_GRCh38
Error: Opening file ./Hsapiens_GRCh38.tr.gff
The content of Hsapiens_GRCh38.tr is:
./Hsapiens_GRCh38.tr$ ls
Hsapiens_GRCh38.1.bt2 Hsapiens_GRCh38.fa Hsapiens_GRCh38.rev.2.bt2
Hsapiens_GRCh38.2.bt2 Hsapiens_GRCh38.fa.tlst Hsapiens_GRCh38.ver
Hsapiens_GRCh38.3.bt2 Hsapiens_GRCh38.gff
Hsapiens_GRCh38.4.bt2 Hsapiens_GRCh38.rev.1.bt2
My questions are:
Have I made an error in the syntax? (Where does the name Hsapiens_GRCh38.tr.gff come from?)
Is it possible to give a path to the .tr folder and the other indices, so that I can use a single folder for all the alignments?
Thank you
The only reason to continue using TopHat at this time is nostalgia.
There are much better and more accurate tools that you should switch to. Even the authors of TopHat have suggested using HISAT2 (their new tool). STAR and BBMap (or any other splice-aware aligner) are excellent alternatives.
I know, it is nostalgia indeed. I am already switching to STAR and HISAT, but I wanted to get to know TopHat for completeness. The index was built one folder above Hsapiens_GRCh38.tr (the latter was actually created by TopHat).
Was the transcriptome index built inside the Hsapiens_GRCh38.tr folder? Are the missing files in there?
The indexing appeared to finish OK; there were no error messages. I provided the list of files in use, and it does not look to me like anything is missing.
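Judging from the two error messages above, TopHat appears to treat the value of --transcriptome-index as a file prefix rather than as a folder: it looked for Hsapiens_GRCh38.tr.*.bt2l and then for ./Hsapiens_GRCh38.tr.gff, while the files listed in the question sit at Hsapiens_GRCh38.tr/Hsapiens_GRCh38.*. If that reading is right, passing the folder plus the basename may resolve the error. The sketch below is only a sanity check under that assumption; the function name check_tr_prefix is made up for illustration, and the set of files it looks for (<prefix>.gff plus <prefix>.1.bt2 or <prefix>.1.bt2l) is inferred from the errors and the directory listing, not from TopHat itself.

```shell
#!/usr/bin/env bash
# check_tr_prefix is a hypothetical helper, not part of TopHat.
# Assumption: a usable transcriptome index PREFIX has <prefix>.gff and
# a Bowtie 2 file <prefix>.1.bt2 (or <prefix>.1.bt2l for a large index).
check_tr_prefix() {
    local prefix="$1"
    if [ ! -f "${prefix}.gff" ]; then
        echo "missing ${prefix}.gff"
        return 1
    fi
    if [ -f "${prefix}.1.bt2" ] || [ -f "${prefix}.1.bt2l" ]; then
        echo "ok ${prefix}"
        return 0
    fi
    echo "missing ${prefix}.1.bt2 or ${prefix}.1.bt2l"
    return 1
}

# Possible usage with the files from this thread (untested here):
#   check_tr_prefix Hsapiens_GRCh38.tr                   # folder name used as prefix
#   check_tr_prefix Hsapiens_GRCh38.tr/Hsapiens_GRCh38   # folder plus basename
# If the second form reports "ok", the alignment could be retried as:
#   tophat2 -o SRR390728_aln \
#       --transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38 \
#       Hsapiens_GRCh38 paired1.fq paired2.fq
```

On the second question: if the option really is a prefix, then an absolute path such as /some/shared/dir/Hsapiens_GRCh38.tr/Hsapiens_GRCh38 (a placeholder path), together with an absolute path to the genome index basename, should in principle let one index folder serve all alignments.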