Question

TopHat2 syntax for the reference index

1

7.8 years ago

marongiu.luigi ▴ 750

Hello,

I want to use TopHat to align reads for RNA-seq analysis. I downloaded the sequences SRR390728_1.fastq and SRR390728_2.fastq and quality-trimmed them with

java -jar /usr/bin/trimmomatic PE SRR390728_1.fastq SRR390728_2.fastq paired1.fq unpaired_1.fq paired2.fq unpaired_2.fq SLIDINGWINDOW:4:20 MINLEN:20 ILLUMINACLIP:/usr/local/lib/Trimmomatic/adapters/TruSeq2-PE.fa:2:30:10:1:true

and then downloaded the reference human sequences Homo_sapiens.GRCh38.dna.toplevel.fa and Homo_sapiens.GRCh38.90.gtf, which I renamed Hsapiens_GRCh38.fa and Hsapiens_GRCh38.gtf respectively. I then indexed them with

bowtie2-build -f  Hsapiens_GRCh38.fa Hsapiens_GRCh38
tophat2 -G Hsapiens_GRCh38.gtf --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38

and then copied everything into the same folder, so that the content of the working folder is:

$ ls
align.sam               Hsapiens_GRCh38.rev.1.bt2l  SRR390728_2.fastq
Hsapiens_GRCh38.1.bt2l  Hsapiens_GRCh38.rev.2.bt2l 
Hsapiens_GRCh38.2.bt2l  Hsapiens_GRCh38.tr          
Hsapiens_GRCh38.3.bt2l  paired1.fq                  unpaired_1.fq
Hsapiens_GRCh38.4.bt2l  paired2.fq                  unpaired_2.fq
Hsapiens_GRCh38.fa      Hsapiens_GRCh38.gtf     SRR390728_1.fastq

with Hsapiens_GRCh38.tr being a folder.

Then I ran the alignment with:

$ tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 11:26:19] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 11:26:19] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 11:26:19] Checking for Bowtie index files (transcriptome)..
Error: Could not find Bowtie 2 index files Hsapiens_GRCh38.tr.*.bt2l)

Then I gave the index as a path adding './' and I got:

$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq
   [2017-10-12 11:27:36] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 11:27:36] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 11:27:37] Checking for Bowtie index files (genome)..
[2017-10-12 11:27:37] Checking for reference FASTA file
[2017-10-12 11:27:37] Generating SAM header for Hsapiens_GRCh38
Error: Opening file ./Hsapiens_GRCh38.tr.gff

The content of Hsapiens_GRCh38.tr is:

./Hsapiens_GRCh38.tr$ ls
Hsapiens_GRCh38.1.bt2  Hsapiens_GRCh38.fa         Hsapiens_GRCh38.rev.2.bt2
Hsapiens_GRCh38.2.bt2  Hsapiens_GRCh38.fa.tlst    Hsapiens_GRCh38.ver
Hsapiens_GRCh38.3.bt2  Hsapiens_GRCh38.gff
Hsapiens_GRCh38.4.bt2  Hsapiens_GRCh38.rev.1.bt2

My questions are:

Have I made an error in the syntax? (Where does the name Hsapiens_GRCh38.tr.gff come from?)
Is it possible to give a path to the .tr folder and the other indices, so that I can use a single folder for all the alignments?

Thank you

RNA-Seq • 3.8k views

ADD COMMENT • link updated 7.8 years ago by e.rempel ★ 1.1k • written 7.8 years ago by marongiu.luigi ▴ 750

1

I want to use TopHat to align reads for RNA-seq analysis.

The only reason to keep using TopHat at this point is nostalgia.

There are much better and more accurate tools that you should switch to. Even the authors of TopHat suggest using HISAT2, their newer tool. STAR and BBMap (or any other splice-aware aligner) are also excellent choices.

ADD REPLY • link 7.8 years ago by GenoMax 152k

0

I know, it is nostalgia indeed. I am already switching to STAR and HISAT, but I wanted to get to know TopHat for completeness. The index was built one folder above Hsapiens_GRCh38.tr (the latter was actually created by TopHat).

ADD REPLY • link 7.8 years ago by marongiu.luigi ▴ 750

0

Was the transcriptome index built inside the Hsapiens_GRCh38.tr folder? Are the missing files in there?

ADD REPLY • link 7.8 years ago by GenoMax 152k

0

The indexing appeared to finish without problems; there were no error messages. I have listed the files in use, and it does not look to me as if anything is missing.

ADD REPLY • link 7.8 years ago by marongiu.luigi ▴ 750

score 1 · Answer 1 · 2017-10-12

1

7.8 years ago

e.rempel ★ 1.1k

OK, let's start from the beginning. First, I would update Bowtie2 to the current version (2.3.3.1). You must also check that the .gtf and .fasta files are compatible (same chromosome names, etc.).
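One rough way to compare the names is to collect the sequence names from the FASTA headers and report any name in the first column of the GTF that is not among them. The sketch below assumes an uncompressed FASTA and a tab-separated GTF whose comment lines start with '#'; the function name is made up for illustration.

```shell
# Print each sequence name used in the GTF (column 1) that has no
# matching header in the FASTA file. Usage: <fasta> <gtf>
gtf_names_missing_from_fasta() {
    awk '
        # First file (FASTA): remember the header name up to the first blank.
        NR == FNR { if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }; next }
        # Second file (GTF): skip comment lines.
        /^#/ { next }
        # Report every unknown name once.
        !($1 in seen) && !($1 in reported) { reported[$1] = 1; print $1 }
    ' "$1" "$2"
}
```

Called as gtf_names_missing_from_fasta Hsapiens_GRCh38.fa Hsapiens_GRCh38.gtf, empty output would mean that every name in the GTF also occurs in the FASTA.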

Build the Bowtie index (this creates the hg38.* files):

bowtie2-build -f Hsapiens_GRCh38.fa hg38

The command for generating the transcriptome index must be (this creates the Hsapiens_GRCh38.* files):

tophat2 -G Hsapiens_GRCh38.gtf --transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38 hg38

TopHat2 run:

tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38 hg38 paired1.fq paired2.fq
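Before launching a run, it can help to confirm that both prefixes really resolve to index files. The sketch below relies only on the usual Bowtie 2 naming, six files per prefix ending in .bt2 or, for large indices, .bt2l; check_bt2_prefix is a hypothetical helper, not part of TopHat or Bowtie.

```shell
# Return 0 if all six Bowtie 2 index files exist for the given prefix,
# with either the small (.bt2) or the large (.bt2l) extension.
check_bt2_prefix() {
    prefix=$1
    for ext in bt2 bt2l; do
        missing=0
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -f "$prefix.$part.$ext" ] || missing=1
        done
        if [ "$missing" -eq 0 ]; then
            return 0
        fi
    done
    echo "no complete Bowtie 2 index found for prefix: $prefix" >&2
    return 1
}
```

For the commands above, the two prefixes to test would be hg38 and Hsapiens_GRCh38.tr/Hsapiens_GRCh38. A bare folder name such as Hsapiens_GRCh38.tr fails this check, because no file is called Hsapiens_GRCh38.tr.1.bt2.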

ADD COMMENT • link 7.8 years ago by e.rempel ★ 1.1k

0

Thank you. I have upgraded Bowtie to 2.3.3.1 using the precompiled version for 64-bit Linux; the version 2.2.9.0 I was using before had been built from source with make. However, while the older version took overnight to generate the index, with the newer one I had to stop after 48 h because the process was still running. Is that normal? Could the slowdown be due to the precompiled version?

ADD REPLY • link 7.7 years ago by marongiu.luigi ▴ 750

0

It worked. I built bowtie2 2.3.3.1 from source and this time everything went smoothly. I could even use a fixed folder where I am keeping the reference indices (/home/RefSeq/Human):

tophat2 -o SRR390728_aln --transcriptome-index=/home/RefSeq/Human/Hsapiens_GRCh38.tr/Hsapiens_GRCh38 /home/RefSeq/Human/hg38 paired1.fq paired2.fq
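For repeated runs against the same reference, both index locations can be derived from a single folder variable. This is only a sketch: build_tophat_cmd is a made-up helper that prints the command instead of running it, and the folder layout (hg38.* next to Hsapiens_GRCh38.tr/) is assumed from the command above.

```shell
# Print (do not run) a tophat2 command that takes both indices from one
# shared reference folder. Assumed layout inside <ref_dir>:
#   hg38.*                                 genome index files
#   Hsapiens_GRCh38.tr/Hsapiens_GRCh38.*   transcriptome index files
# Usage: <ref_dir> <out_dir> <reads_1> <reads_2>
build_tophat_cmd() {
    ref_dir=$1
    out_dir=$2
    reads1=$3
    reads2=$4
    printf 'tophat2 -o %s --transcriptome-index=%s %s %s %s\n' \
        "$out_dir" \
        "$ref_dir/Hsapiens_GRCh38.tr/Hsapiens_GRCh38" \
        "$ref_dir/hg38" \
        "$reads1" "$reads2"
}
```

With /home/RefSeq/Human as the first argument, the printed line is the same command as above, so only the output folder and the read files need to change between samples.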

ADD REPLY • link 7.7 years ago by marongiu.luigi ▴ 750

score 0 · Answer 2 · 2017-10-12

0

7.8 years ago

e.rempel ★ 1.1k

You should include the prefix Hsapiens_GRCh38 in the path to the index:

--transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38

ADD COMMENT • link 7.8 years ago by e.rempel ★ 1.1k

0

I got the following:

$ tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:05:10] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:05:10] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:05:10] Checking for Bowtie index files (transcriptome)..
[2017-10-12 16:05:10] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (paired1.fq.*.bt2l)

$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38.tr/Hsapiens_GRCh38 paired1.fq paired2.fq


[2017-10-12 16:05:32] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:05:32] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:05:32] Checking for Bowtie index files (transcriptome)..
[2017-10-12 16:05:32] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (paired1.fq.*.bt2l)

I reckon the base name 'Hsapiens_GRCh38' should really be on its own.
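Both errors name paired1.fq as the index, which suggests that the genome index base was left out of these two commands. TopHat expects roughly tophat2 [options] <genome_index_base> <reads1> [reads2], so the first word that is not an option is read as the genome index. The sketch below is a simplified, hypothetical model of that reading; it is not TopHat's actual parser and only knows that -o and -G take a separate value.

```shell
# Print the first positional (non-option) argument of a tophat2-style
# command line, i.e. the word that would be read as the genome index base.
first_positional() {
    while [ $# -gt 0 ]; do
        case $1 in
            -o|-G)
                # Option followed by a separate value: skip both words.
                shift
                if [ $# -gt 0 ]; then shift; fi
                ;;
            -*)
                # Flag or --name=value form: skip one word.
                shift
                ;;
            *)
                echo "$1"
                return 0
                ;;
        esac
    done
    return 1
}
```

Fed the arguments of the first command above, this prints paired1.fq; with Hsapiens_GRCh38 placed before the read files it prints Hsapiens_GRCh38 instead.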

ADD REPLY • link 7.8 years ago by marongiu.luigi ▴ 750

0

I even tried to remove the .tr from the index name:

$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38 Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:12:02] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:12:02] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:12:02] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (Hsapiens_GRCh38.*.bt2l)

Yet the *.bt2l files are in the current folder. Even after moving them into Hsapiens_GRCh38.tr, there is an error:

$ tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:16:32] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:16:32] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:16:32] Checking for Bowtie index files (transcriptome)..
Error: Could not find Bowtie 2 index files Hsapiens_GRCh38.tr.*.bt2l)
$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:16:51] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:16:51] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:16:51] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (Hsapiens_GRCh38.*.bt2l)

ADD REPLY • link 7.8 years ago by marongiu.luigi ▴ 750