TopHat Alignment - Do I need to run each sequence file separately?
8.9 years ago

First, I am brand new to this forum and brand new to RNAseq; I searched the forums for this, but didn't find another question similar enough to answer it.

I have 2 control files and 2 treatment files (RNA sequencing). The files are old enough that they are unstranded and not paired-end (hence all 4 are distinct samples).

I trimmed the files with Trimmomatic and was going to perform the alignment with TopHat2 next. Our cluster has all the software installed (Bowtie2, samtools, etc.).

I downloaded and unzipped the UCSC hg18 Bowtie2 indexes from here: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

So, the questions.

  1. Do I need to run all the files through one at a time?
  2. If I run them through TH2 separately, do I have to specify a different output folder for each of the 4 submissions?
  3. I ran into the thread issue with TH2 last night. I specified p=8 threads, and the submission failed 1 hour in:

    Searching for junctions via segment mapping
    [FAILED]   Error: segment-based junction search failed with err =1
    Error: could not get read# 9850246 from stream!)
    

    I then specified p=1 and it ran, but it took 6 hours. If someone knows a good sbatch parameter list to prevent this, I would greatly appreciate it.

  4. Lastly, I got one warning:

    Checking for reference FASTA file
    Warning: Could not find FASTA file /locationofbowtieindexes/hg18.fa)
    

    Do I need to put a genome.fa file there, from here (http://support.illumina.com/sequencing/sequencing_software/igenome.html, hg38 link under Homo sapiens)?

This is my current submission script:

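    #!/bin/sh
    # Trimmed single-end FASTQ file to align
    SAMPLE_ID=trim_Mitchell_P2D-F2.fastq
    # Basename of the Bowtie2 index
    GENE_REF=filepathtobowtie2index
    P=1 # number of threads (8 failed with the junction-search error above)

    tophat2 -o tophat_out -p "$P" "$GENE_REF" "pathtoRNAsequencefastafile/$SAMPLE_ID"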

Thanks so much for your help and sorry for my noobness!

8.9 years ago
Benn 8.4k
  1. If you don't want to start TopHat by hand for each file, you can write a loop.
  2. A separate output folder for each run would be handy, yes, since the file names inside the folder will be the same for every run.
  3. I don't know whether this is an issue with your number of threads.
  4. hg18 is not hg38!
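
To illustrate points 1 and 2, here is a minimal sketch of such a loop that gives every run its own output folder. The sample file names, `FASTQ_DIR`, and the helper names `out_dir_for` and `run_all` are made up for the example, and the index and FASTQ paths are the same placeholders as in the question. As written it only prints the commands, so you can check them before submitting anything.

```shell
#!/bin/sh
# Sketch: one TopHat2 run per single-end FASTQ file, each in its own folder.
# All paths and sample names below are placeholders, not real locations.
GENE_REF=filepathtobowtie2index
FASTQ_DIR=pathtoRNAsequencefastafile
P=1
# Dry run by default: change to TOPHAT=tophat2 to actually run the alignments.
TOPHAT="echo tophat2"

# Derive an output folder name from a FASTQ file name,
# e.g. control_1.fastq -> tophat_out_control_1
out_dir_for() {
    printf 'tophat_out_%s\n' "$(basename "$1" .fastq)"
}

# Run (or print) one tophat2 command per file given as an argument.
run_all() {
    for fq in "$@"; do
        $TOPHAT -o "$(out_dir_for "$fq")" -p "$P" "$GENE_REF" "$FASTQ_DIR/$fq"
    done
}

run_all control_1.fastq control_2.fastq treatment_1.fastq treatment_2.fastq
```

As far as I know, giving TopHat2 several files as one comma-separated list makes it treat them as a single sample, so separate runs are the way to keep the 4 samples apart.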

Thank you b.nota. Question number 3: I had threads = 8 and got the error. I looked for others who had this issue, and they suggested using a single thread to solve it. It then ran without any problems, but took ~6 hours.

Thanks for reminding me that hg18 is not hg38. Can you shed any light on the warning that I got, and what TH2 is looking for? I assumed it was a full reference genome (not Bowtie2-indexed), but I wasn't sure. If that is the case, does it need to be in the same directory as my indexes?

Lastly, it sounds like you are saying that this does need to be 4 runs separately, but that I could loop them in sequence with a script. I wanted to make sure that TH2 couldn't align all 4 at once somehow more efficiently before I submitted 4 separate jobs.

Thanks again.


I am not an expert in TopHat, just another user, but I always use the maximum number of threads available on my machine. I never get errors like yours...

There should be a genome.fa or hg38.fa file in your index folder, or a link to it. If you downloaded the index from the iGenomes website, it should be in there.

I always use a loop when I want to map my fq files all at once (in one go that is). You can try to map them in parallel, I never tried that.
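
If you want to try the parallel route on a Slurm cluster, a job array is one way to do it. This is only a sketch under assumptions: the sample names, `FASTQ_DIR`, the helper `nth_sample`, and the time limit are made up, and nothing here is known to fix the multi-thread error from question 3. The `echo` in the last line prints the command instead of running it.

```shell
#!/bin/sh
#SBATCH --array=1-4            # one array task per FASTQ file
#SBATCH --cpus-per-task=1      # passed to tophat2 -p below
#SBATCH --time=12:00:00        # assumed limit; the single-thread run took ~6 hours
# Placeholder sample list and paths; replace with the real ones.
SAMPLES="control_1.fastq control_2.fastq treatment_1.fastq treatment_2.fastq"
GENE_REF=filepathtobowtie2index
FASTQ_DIR=pathtoRNAsequencefastafile

# Print the N-th (1-based) entry of the sample list.
nth_sample() {
    echo "$SAMPLES" | cut -d' ' -f"$1"
}

# Slurm sets these inside an array job; fall back to 1 when run by hand.
TASK=${SLURM_ARRAY_TASK_ID:-1}
THREADS=${SLURM_CPUS_PER_TASK:-1}
FQ=$(nth_sample "$TASK")

# Remove the leading "echo" to run the alignment instead of printing it.
echo tophat2 -o "tophat_out_$(basename "$FQ" .fastq)" -p "$THREADS" "$GENE_REF" "$FASTQ_DIR/$FQ"
```

Submitted once with `sbatch`, this would start 4 independent tasks, each writing to its own output folder.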


You could also run bowtie2-inspect on your index and save the output as $index.fa:

    bowtie2-inspect myindex > myindex.fa

This way, you get exactly the same naming of chromosomes as in your index, which is not guaranteed if you download a fasta file.
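
If you do download a FASTA file instead, you can check the naming yourself. A small sketch: `bowtie2-inspect -n` prints the reference names stored in an index, and the two helper functions below (names are my own, not part of Bowtie2) compare such a list with the headers of a FASTA file. Both sides are cut at the first whitespace before comparing, which is an assumption about how you want to match them.

```shell
#!/bin/sh
# Print sequence names from a FASTA file: header lines without the leading '>',
# cut at the first space or tab, sorted.
fasta_names() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort
}

# Compare a file of index names (one per line) with a FASTA file.
# Returns 0 when both contain the same set of names.
names_match() {
    names_file=$1
    fasta=$2
    [ "$(sed 's/[[:space:]].*$//' "$names_file" | sort)" = "$(fasta_names "$fasta")" ]
}

# Example use (myindex and downloaded.fa are placeholders):
#   bowtie2-inspect -n myindex > index_names.txt
#   names_match index_names.txt downloaded.fa && echo "names agree"
```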

Additionally, TopHat2 gives better alignment rates if you provide a transcriptome index (see the corresponding section here).
