Question

Remove Singletons From Trimmed Fastq Files

3

11.8 years ago

Alex Paciorkowski 3.5k

How are you removing singletons from whole exome data fastq files from which the adapter sequences have already been removed? We have some data where the paired-end files do not match up, and we believe this is due to singletons that need to be removed. What tools are currently out there?

I have checked the "Trimming Algorithm" thread but do not see this specifically addressed.

Much thanks.

fastq • 7.9k views

updated 11.8 years ago by Rm 8.3k • written 11.8 years ago by Alex Paciorkowski 3.5k

1

Did you see this thread: "How to sort two mate pair (fastq) files so that the order of the identifiers is the same?" Are you still using a trimmer that can't natively handle paired-end reads, or is this just an older dataset? If the former, you might consider just switching trimmers.

11.8 years ago by Devon Ryan 105k

0

Looks promising, will take a look.

11.8 years ago by Alex Paciorkowski 3.5k

0

Are the two paired files sorted in the same order excluding singletons?

11.8 years ago by Damian Kao 16k

Ram · Answer 1 · 2013-07-11

4

11.8 years ago

Gabriel R. ★ 2.9k

We solved the problem in a very simple way: ditch fastq, use BAM even for unaligned reads, and let the flags do their magic.

11.8 years ago by Gabriel R. ★ 2.9k

0

Can you elaborate? I don't understand what this means. How are you mapping BAMs?

11.8 years ago by Alex Paciorkowski 3.5k

1

You are not mapping the BAMs. You merely convert from fastq to BAM and work on those files as your raw, unmapped data. This saves you a lot of trouble in keeping track of which reads are paired or unpaired, read group info, etc.

11.8 years ago by Gabriel R. ★ 2.9k

0

And then you convert back to fastq for mapping? I am still not sure what exactly you are doing. Which "flags" are you talking about? Can you please elaborate? Is there a workflow you can reference or point to as an illustration?

11.8 years ago by Alex Paciorkowski 3.5k

1

And then you convert back to fastq for mapping?

Yes and no: we use file descriptors. Bowtie apparently still cannot read BAM, so we call it like this:

bowtie2 -1 <(samtools view -f "0x40" -Y input.bam) -2 <(samtools view -f "0x80" -Y input.bam)

Use this custom version of samtools: https://github.com/udo-stenzel/samtools-patched

I am still not sure what exactly you are doing. Which "flags" are you talking about? Can you please elaborate?

Every read in a BAM file has a set of binary flags combined into a single number. These flags tell us whether the read is paired, mapped, properly paired, QC-failed, etc. See http://samtools.sourceforge.net/SAM1.pdf
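To make the flag mechanism concrete, here is a minimal Python sketch (not part of the workflow described in this reply) that decodes a FLAG value using the bit definitions from the SAM specification. The helper names `decode_flag` and `passes_f` are made up for illustration; `passes_f` only mimics the documented behaviour of `samtools view -f`, which keeps a read when all of the requested bits are set.

```python
# Bit meanings of the FLAG field, as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def passes_f(flag, required):
    """Mimic `samtools view -f`: keep a read only if every required bit is set."""
    return (flag & required) == required


# An unaligned read 1 of a pair would typically carry 0x1 + 0x4 + 0x8 + 0x40 = 77.
print(decode_flag(77))   # ['paired', 'unmapped', 'mate_unmapped', 'first_in_pair']
print(passes_f(77, 0x40), passes_f(77, 0x80))   # True False
```

Under this scheme a singleton is simply a read without the 0x1 (paired) bit, so filtering on 0x40 and 0x80 as in the command above would only ever select reads marked as first or second in a pair.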

Is there a workflow you can reference or point to as an illustration?

Unfortunately, not really. I suggest becoming more familiar with the BAM format and with regular Unix concepts such as pipes and file descriptors. Good luck and have fun!

updated 5.4 years ago by Ram 45k • written 11.8 years ago by Gabriel R. ★ 2.9k

score 1 · Answer 2 · 2013-07-11

1

11.8 years ago

Rm 8.3k

Try Sickle in paired-end mode (sickle pe) for paired-end trimming. Or, if the reads are already trimmed, use cmpfastq to write the common reads and the singletons to separate files.
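For anyone who wants to see the idea rather than run a dedicated tool, here is a rough Python sketch of splitting two already-trimmed FASTQ files into common reads and singletons. It is not how cmpfastq is implemented. It assumes uncompressed four-line FASTQ records, read names that match between the two files apart from an optional /1 or /2 suffix or a trailing comment, and enough memory to hold every read ID. The function names and output file names are hypothetical.

```python
from itertools import islice


def read_fastq(path):
    """Yield (read_id, record_text) for each 4-line FASTQ record in a file."""
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            # Drop '@', any comment after whitespace, and an old-style /1 or /2 suffix.
            read_id = record[0][1:].split()[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id, "".join(record)


def split_pairs(fq1, fq2, out_prefix):
    """Write reads that have a mate to *_common files and the rest to *_singletons files.

    Returns {"1": (n_common, n_singletons), "2": (n_common, n_singletons)}.
    """
    ids1 = {read_id for read_id, _ in read_fastq(fq1)}
    ids2 = {read_id for read_id, _ in read_fastq(fq2)}
    shared = ids1 & ids2

    counts = {}
    for mate, path in (("1", fq1), ("2", fq2)):
        common_path = f"{out_prefix}_{mate}_common.fastq"
        single_path = f"{out_prefix}_{mate}_singletons.fastq"
        n_common = 0
        n_single = 0
        with open(common_path, "w") as common, open(single_path, "w") as single:
            # Records are written in input order, so the two common files stay
            # in sync only if the inputs shared the same relative read order.
            for read_id, record in read_fastq(path):
                if read_id in shared:
                    common.write(record)
                    n_common += 1
                else:
                    single.write(record)
                    n_single += 1
        counts[mate] = (n_common, n_single)
    return counts
```

A call such as `split_pairs("trimmed_1.fastq", "trimmed_2.fastq", "sample")` (file names are placeholders) would write four files. The two `_common` files can then go to a paired-end aligner, provided the original files were in the same relative order, which trimmers normally preserve.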