Converting TopHat's BAM Output Back to Separate Paired-End FASTQ Files
12.0 years ago
bob-lowlow

Hi all,

I was wondering if anyone could offer some advice on using paired-end reads with TopHat, specifically on handling its output. I'm planning to use TopHat as part of a pipeline for processing my sequence data. The reads that map will be easy to deal with, but the unmapped.bam file is proving problematic. I would like to convert that BAM file back into two FASTQ files containing the read pairs that didn't map to the reference genome (hg19 in this case). My plan was to convert it to SAM and then use Picard's SamToFastq, but that returns the following error:

MAPQ must be zero if RNAME is not specified;

I haven't been able to find anything about this error online. I'm also not sure how time-consuming this will be. For now I'm testing on a random sample of my data to get everything working, but the real data files will probably be 20 GB or more, at least in FASTQ format.
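For illustration, here is a minimal Python sketch that both flags the records behind that error and splits the unmapped reads into two FASTQ files without Picard. It assumes the BAM has already been turned into SAM text (for example with `samtools view -h`), that mates are identified by the SAM FLAG bits 0x40 (first in pair) and 0x80 (second in pair), and that reads flagged 0x10 were stored reverse-complemented. The function `split_unmapped_sam` is hypothetical, not part of any tool. Because it streams line by line, memory grows only with the number of reads whose mate has not been seen yet, not with file size.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple

# Complement table for reverse-complementing reads stored on the minus strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def _to_fastq(name: str, flag: int, seq: str, qual: str) -> str:
    # FLAG 0x10: SEQ/QUAL were stored reverse-complemented; restore the original read.
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def split_unmapped_sam(
    sam_lines: Iterable[str], out1: TextIO, out2: TextIO
) -> Tuple[List[str], List[str]]:
    """Write mate 1 / mate 2 of each complete pair to out1 / out2.

    Returns (QNAMEs of records with RNAME '*' but nonzero MAPQ,
             QNAMEs whose mate never appeared).
    """
    bad_mapq: List[str] = []
    pending: Dict[str, Dict[int, str]] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, rname, mapq = fields[0], int(fields[1]), fields[2], fields[4]
        # The rule behind "MAPQ must be zero if RNAME is not specified".
        if rname == "*" and mapq != "0":
            bad_mapq.append(name)
        if flag & 0x40:
            mate = 1
        elif flag & 0x80:
            mate = 2
        else:
            continue  # unpaired record: ignored in this sketch
        mates = pending.setdefault(name, {})
        mates[mate] = _to_fastq(name, flag, fields[9], fields[10])
        # Hold each read until its mate arrives so both files stay in the same order.
        if len(mates) == 2:
            out1.write(mates[1])
            out2.write(mates[2])
            del pending[name]
    return bad_mapq, sorted(pending)


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.0\n",
        "r1\t77\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII\n",
        "r1\t141\t*\t0\t255\t*\t*\t0\t0\tGGCA\tJJJK\n",
    ]
    fq1, fq2 = io.StringIO(), io.StringIO()
    print(split_unmapped_sam(demo, fq1, fq2))
    print(fq1.getvalue() + fq2.getvalue(), end="")
```

Buffering by read name is what keeps the two output files synchronized even if mates are not adjacent in the input; any names left in the second return value had no mate in the file.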

I was also thinking of converting the accepted_hits.bam file to SAM and then writing a Unix script that takes the FASTQ files given to TopHat as input and writes every read that isn't present in accepted_hits into two new files.
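The accepted_hits-based approach can be sketched in Python as a set difference on read names. This is a sketch under assumptions: accepted_hits has been converted to SAM text, the two original FASTQ files list pairs in the same order with four lines per record (no wrapped sequences), and FASTQ names may carry a /1 or /2 suffix that the SAM QNAME lacks. Since both mates share one QNAME in SAM, a pair is copied only when neither mate appears in accepted_hits, which keeps the two outputs in step. The helper names `mapped_names` and `write_unmapped_pairs` are hypothetical. The name set has to fit in memory, which may matter with tens of millions of mapped reads.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, Set, TextIO, Tuple


def mapped_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect the QNAME of every alignment record, skipping header lines."""
    return {
        line.split("\t", 1)[0]
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    }


def _records(fastq: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    # Yield 4-line FASTQ records; assumes sequences are not line-wrapped.
    lines = iter(fastq)
    while True:
        rec = tuple(islice(lines, 4))
        if not rec:
            return
        yield rec


def _base_name(header: str) -> str:
    # "@read7/1 extra" -> "read7", the QNAME SAM uses for both mates.
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def write_unmapped_pairs(
    mapped: Set[str],
    in1: Iterable[str], in2: Iterable[str],
    out1: TextIO, out2: TextIO,
) -> int:
    """Copy every pair whose name is absent from `mapped`; return pairs written."""
    kept = 0
    for rec1, rec2 in zip(_records(in1), _records(in2)):
        if _base_name(rec1[0]) in mapped:
            continue  # at least one mate is in accepted_hits
        out1.writelines(rec1)
        out2.writelines(rec2)
        kept += 1
    return kept


if __name__ == "__main__":
    hits = mapped_names(["a\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII\n"])
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nTTTT\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nCCCC\n+\nIIII\n@b/2\nGGGG\n+\nIIII\n")
    o1, o2 = io.StringIO(), io.StringIO()
    print(write_unmapped_pairs(hits, r1, r2, o1, o2))
    print(o1.getvalue() + o2.getvalue(), end="")
```

With real data the inputs would be open file handles rather than `StringIO` objects; the logic is the same.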

What do you think?

tophat sam RNA-seq bam
12.0 years ago

Try setting VALIDATION_STRINGENCY=SILENT for SamToFastq.
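As a hedged sketch, this is how that SamToFastq call could be assembled from Python. It assumes a Picard release launched as `java -jar picard.jar SamToFastq` (older releases shipped a separate SamToFastq.jar) and the `KEY=VALUE` argument style used in this thread; newer Picard versions document a `-KEY value` style, so check your version. INPUT, FASTQ, SECOND_END_FASTQ and VALIDATION_STRINGENCY are SamToFastq's own argument names, while the helper `samtofastq_command` and the file paths are placeholders. SamToFastq accepts BAM as input, so the intermediate SAM conversion is not needed.

```python
import shlex
from typing import List

STRINGENCIES = ("STRICT", "LENIENT", "SILENT")


def samtofastq_command(
    bam: str,
    fq1: str,
    fq2: str,
    stringency: str = "SILENT",
    picard_jar: str = "picard.jar",  # assumed path to the Picard jar
) -> List[str]:
    """Build an argv list for Picard SamToFastq on a paired-end BAM."""
    if stringency not in STRINGENCIES:
        raise ValueError(f"unknown VALIDATION_STRINGENCY: {stringency}")
    return [
        "java", "-jar", picard_jar, "SamToFastq",
        f"INPUT={bam}",               # SAM or BAM both work
        f"FASTQ={fq1}",               # first-of-pair reads
        f"SECOND_END_FASTQ={fq2}",    # second-of-pair reads
        f"VALIDATION_STRINGENCY={stringency}",
    ]


if __name__ == "__main__":
    cmd = samtofastq_command("unmapped.bam", "unmapped_1.fastq", "unmapped_2.fastq")
    print(shlex.join(cmd))
    # To execute it: subprocess.run(cmd, check=True)
```

Building an argument list rather than a shell string avoids quoting problems with unusual file names when the list is passed to `subprocess.run`.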


Thank you very much, sir.


Or LENIENT: with LENIENT it will still work, but you'll get the warnings.
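A toy model of how the three stringency levels treat a record that fails validation: STRICT (Picard's default) stops with an error, LENIENT logs a warning and keeps going, and SILENT keeps going without reporting. This is an illustrative sketch, not Picard's implementation; `check_record` is hypothetical and tests only the single rule from the error message in the question.

```python
import logging
from typing import List

logger = logging.getLogger("stringency_demo")


def check_record(fields: List[str], stringency: str = "STRICT") -> bool:
    """Check one SAM record (split on tabs) against the RNAME/MAPQ rule.

    Returns True if it passes; otherwise reacts according to `stringency`
    and returns False so the caller can still decide what to do with it.
    """
    if stringency not in ("STRICT", "LENIENT", "SILENT"):
        raise ValueError(f"unknown stringency: {stringency}")
    rname, mapq = fields[2], fields[4]
    if rname != "*" or mapq == "0":
        return True
    message = f"{fields[0]}: MAPQ must be zero if RNAME is not specified"
    if stringency == "STRICT":
        raise ValueError(message)      # stop, like the error in the question
    if stringency == "LENIENT":
        logger.warning(message)        # continue, but report the problem
    return False                       # SILENT falls through without reporting


if __name__ == "__main__":
    record = "r1\t77\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII".split("\t")
    for level in ("SILENT", "LENIENT"):
        print(level, check_record(record, level))
```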
