Question

TopHat Alignment - Do I need to run each sequence file separately?


8.9 years ago

john.ncsu.tox • 0

First, I am brand new to this forum and brand new to RNAseq; I searched the forums for this, but didn't find another question similar enough to answer it.

I have 2 control files and 2 treatment files (RNA sequencing). The files are old enough that they are unstranded and not paired-end (hence each of the 4 is distinct).

I trimmed the files with Trimmomatic, and was going to perform alignment with TopHat2 next. Our cluster has all the software installed for Bowtie2, samtools, etc.

I downloaded and unzipped the UCSC hg18 Bowtie2 indexes from here: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

So, the questions.

1. Do I need to run all the files through one at a time?
2. If I run them through TH2 separately, do I have to specify a different output folder for each of the 4 submissions?
3. I ran into the thread issue with TH2 last night. I specified p=8 threads, and the submission crapped out 1 hour in:
```
Searching for junctions via segment mapping
[FAILED]   Error: segment-based junction search failed with err =1
Error: could not get read# 9850246 from stream!)
```
I then specified p=1 and it ran, but took 6 hours. If someone knows a good sbatch parameter list to prevent this, I would greatly appreciate it.
4. Lastly, I got one warning:
```
Checking for reference FASTA file
Warning: Could not find FASTA file /locationofbowtieindexes/hg18.fa)
```
Do I need to put a genome.fa file there from here? (http://support.illumina.com/sequencing/sequencing_software/igenome.html hg38 link under Homo sapiens)?

This is my current submission script:

    #!/bin/sh
    SAMPLE_ID=trim_Mitchell_P2D-F2.fastq
    GENE_REF=filepathtobowtie2index
    P=1  # threads; 8 failed with the segment-search error, 1 ran to completion

    tophat2 -o tophat_out -p "$P" "$GENE_REF" "pathtoRNAsequencefastafile/$SAMPLE_ID"

Thanks so much for your help and sorry for my noobness!

tophat bowtie RNAseq • 2.4k views

updated 2.4 years ago by Ram 44k • written 8.9 years ago by john.ncsu.tox • 0

Ram · Answer 1 · 2016-02-15


8.9 years ago

Benn 8.4k

1. If you don't want to run TopHat for each file by hand, you can try to make a loop.
2. A separate output folder for each run would be handy, yes, since the file names inside each folder will all be the same.
3. I don't know if this is an issue with your number of threads.
4. hg18 is not hg38!
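The loop and the per-run output folders from points 1 and 2 could be sketched roughly as below. This is only an illustration: `run_tophat_loop` is a made-up helper name, the FASTQ names in the example call are hypothetical, and `filepathtobowtie2index` is the placeholder from the question. As far as I know, TopHat also accepts a comma-separated list of read files, but it then pools them into one alignment, which is not what you want when the samples have to stay separate.

```shell
#!/bin/sh
# Sketch: align each single-end FASTQ in turn, one output folder per sample.
# run_tophat_loop is a hypothetical helper, not part of TopHat.

run_tophat_loop() {
    gene_ref=$1   # Bowtie2 index basename (placeholder in the example below)
    threads=$2    # value passed to tophat2 -p
    shift 2
    for fq in "$@"; do
        # sample1.fastq -> tophat_out_sample1, so runs do not overwrite each other
        sample=$(basename "$fq" .fastq)
        tophat2 -o "tophat_out_${sample}" -p "$threads" "$gene_ref" "$fq" || return 1
    done
}

# Example call (hypothetical file names):
# run_tophat_loop filepathtobowtie2index 1 ctrl1.fastq ctrl2.fastq treat1.fastq treat2.fastq
```

The loop stops at the first failing run, so a broken sample does not go unnoticed while the later ones keep running.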

updated 2.4 years ago by Ram 44k • written 8.9 years ago by Benn 8.4k


Thank you b.nota. Question number 3: I had threads = 8 and got the error. I looked for others who had this issue, and they suggested using a single thread to solve it. It then ran without any problems, but took ~6 hours.

Thanks for reminding me that hg18 is not hg38. Can you shed any light on the warning that I got, and what TH2 is looking for? I assumed it was a full reference (non-bowtie2 indexed) genome, but I wasn't sure. If that is the case, does that need to be in the same directory as my indexes?

Lastly, it sounds like you are saying that this does need to be 4 runs separately, but that I could loop them in sequence with a script. I wanted to make sure that TH2 couldn't align all 4 at once somehow more efficiently before I submitted 4 separate jobs.

Thanks again.

updated 2.4 years ago by Ram 44k • written 8.9 years ago by john.ncsu.tox • 0


I am not an expert in TopHat, just another user, but I always use the maximum number of threads available on my machine. I never get errors like yours...

There should be a genome.fa or hg38.fa file in your index folder, or a link to it. If you downloaded it from the iGenomes website, it should be in there.

I always use a loop when I want to map my fq files all at once (in one go that is). You can try to map them in parallel, I never tried that.
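For the parallel route on a SLURM cluster, one possible shape is a job array with one task per file. This is an untested sketch: `pick_sample` and `align_task` are made-up helper names, the FASTQ names are hypothetical, and the time limit is a guess to adjust for your cluster. It only makes `-p` match the cores SLURM actually reserved; whether that avoids the segment-search error is not something this thread establishes.

```shell
#!/bin/sh
#SBATCH --job-name=tophat2
#SBATCH --array=0-3            # one array task per FASTQ file
#SBATCH --cpus-per-task=8      # cores reserved for each task
#SBATCH --time=12:00:00        # assumed limit; adjust to your cluster

# Print the Nth argument (0-based) from the list that follows the index.
pick_sample() {
    idx=$1
    shift
    shift "$idx"
    echo "$1"
}

# Align the file selected by this array task into its own output folder.
align_task() {
    fq=$(pick_sample "${SLURM_ARRAY_TASK_ID:-0}" "$@")
    sample=$(basename "$fq" .fastq)
    tophat2 -o "tophat_out_${sample}" -p "${SLURM_CPUS_PER_TASK:-1}" "$GENE_REF" "$fq"
}

# In the real job script, uncomment and fill in (names are placeholders):
# GENE_REF=filepathtobowtie2index
# align_task ctrl1.fastq ctrl2.fastq treat1.fastq treat2.fastq
```

Submitted once with `sbatch`, this would start four independent tasks, each writing to its own `tophat_out_*` folder.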

8.9 years ago by Benn 8.4k


You could also run bowtie2-inspect on your index and save the output as $index.fa:

    bowtie2-inspect myindex > myindex.fa

This way, you get exactly the same naming of chromosomes as in your index, which is not guaranteed if you download a fasta file.
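A slightly more defensive version of that step might look like the sketch below. `rebuild_fasta` is a made-up helper name, and the path in the example call is the placeholder from the warning message in the question. It writes to a temporary file first so that a failed run does not leave a half-written `.fa` next to the index.

```shell
#!/bin/sh
# Sketch: recreate <index>.fa next to the Bowtie2 index files.
# rebuild_fasta is a hypothetical helper, not part of Bowtie2.

rebuild_fasta() {
    index=$1                          # Bowtie2 index basename
    if [ -s "${index}.fa" ]; then
        echo "${index}.fa already exists, leaving it alone" >&2
        return 0
    fi
    # Only move the file into place if bowtie2-inspect succeeded.
    bowtie2-inspect "$index" > "${index}.fa.tmp" && mv "${index}.fa.tmp" "${index}.fa"
}

# Example calls (placeholder path from the warning in the question):
# rebuild_fasta /locationofbowtieindexes/hg18
# bowtie2-inspect -n /locationofbowtieindexes/hg18   # print sequence names only
```

The `-n` call is handy for comparing the chromosome names in the index against an annotation file before aligning.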

Additionally, TopHat2 gives better alignment rates if you provide a transcriptome index (see the corresponding section here).
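A rough sketch of the transcriptome-index workflow, assuming you have a GTF annotation whose chromosome names match the index: the index is built once with `-G` and `--transcriptome-index`, then reused for each sample. The function names and every path in the example calls are hypothetical; I keep `-G` on the alignment runs as well, which should be harmless.

```shell
#!/bin/sh
# Sketch: build a TopHat2 transcriptome index once, then reuse it per sample.
# Both function names are hypothetical helpers, not part of TopHat.

build_transcriptome_index() {
    gtf=$1          # gene annotation (GTF), names must match the Bowtie2 index
    tx_prefix=$2    # where the transcriptome index files are written
    gene_ref=$3     # Bowtie2 genome index basename
    # With no read files given, this run only builds the transcriptome index.
    tophat2 -G "$gtf" --transcriptome-index="$tx_prefix" "$gene_ref"
}

align_with_transcriptome() {
    gtf=$1; tx_prefix=$2; gene_ref=$3; out_dir=$4; fq=$5
    tophat2 -o "$out_dir" -G "$gtf" --transcriptome-index="$tx_prefix" "$gene_ref" "$fq"
}

# Example calls (all paths are placeholders):
# build_transcriptome_index genes.gtf transcriptome_data/known filepathtobowtie2index
# align_with_transcriptome genes.gtf transcriptome_data/known filepathtobowtie2index tophat_out_ctrl1 ctrl1.fastq
```

Building the index up front means the four sample runs do not each repeat that step.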

8.9 years ago by michael.ante ★ 3.9k