Remove Singletons From Trimmed Fastq Files
11.5 years ago

How are you removing singletons from whole-exome fastq files from which the adapter sequences have already been removed? We have some data where the paired-end files do not match up, and we believe this is due to singletons that need to be removed. What tools are currently out there?

I have checked "Trimming Algorithm" but do not see this specifically addressed.

Much thanks.


Did you see this thread: "How to sort two mate pair (fastq) files so that the order of the identifiers is the same?" Are you still using a trimmer that can't natively handle paired-end reads, or is this just an older dataset? If the former, you might consider just switching trimmers.


Looks promising, will take a look.


Are the two paired files sorted in the same order excluding singletons?

11.5 years ago
Gabriel R. ★ 2.9k

We solved the problem in a very simple way: ditch fastq, use BAM even for unaligned reads, and let the flags do their magic.


Can you elaborate? I don't understand what this means. How are you mapping BAMs?


You are not mapping the BAMs. You merely convert from fastq to BAM and work on those files as your raw, unmapped data. This saves you a lot of trouble keeping track of which reads are paired or unpaired, read group info, etc.
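
As a rough illustration of what "unaligned BAM" means, the Python sketch below formats reads as unaligned SAM records (the text form of BAM). It is not the poster's actual pipeline, and in practice a conversion tool would do this for you. The helper names `unaligned_sam_line`, `pair_to_sam` and `singleton_to_sam` are hypothetical; the flag bits and the 11 mandatory columns come from the SAM specification.

```python
# Flag bits defined by the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def unaligned_sam_line(name, seq, qual, flag):
    """Format one unaligned SAM record: no reference, position, or CIGAR."""
    fields = [name, str(flag), "*", "0", "0", "*", "*", "0", "0", seq, qual]
    return "\t".join(fields)


def pair_to_sam(name, read1, read2):
    """Return two SAM lines for a read pair; each read is a (seq, qual) tuple."""
    base = PAIRED | UNMAPPED | MATE_UNMAPPED
    return [
        unaligned_sam_line(name, read1[0], read1[1], base | FIRST_IN_PAIR),
        unaligned_sam_line(name, read2[0], read2[1], base | SECOND_IN_PAIR),
    ]


def singleton_to_sam(name, read):
    """Return one SAM line for a read whose mate was discarded."""
    return unaligned_sam_line(name, read[0], read[1], UNMAPPED)


if __name__ == "__main__":
    for line in pair_to_sam("read1", ("ACGT", "IIII"), ("TTGA", "IIII")):
        print(line)
    print(singleton_to_sam("read2", ("GGCC", "IIII")))
```

Because the pairing status lives in the flag of each record, a singleton is simply a record without the paired bit set, and the two mates never have to be kept in sync across separate files.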


And then you convert back to fastq for mapping? I am still not sure what exactly you are doing. Which "flags" are you talking about? Can you please elaborate? Is there a workflow you can reference or point to as an illustration?


> And then you convert back to fastq for mapping?

Yes and no: we use file descriptors (shell process substitution) instead of writing fastq files to disk. Bowtie apparently still cannot read BAM, so we call it like this:

bowtie2 -1 <(samtools view -f "0x40" -Y input.bam) -2 <(samtools view -f "0x80" -Y input.bam)

Use this custom version of samtools: https://github.com/udo-stenzel/samtools-patched

> I am still not sure what exactly you are doing. Which "flags" are you talking about? Can you please elaborate?

Every read in a BAM file has binary flags combined into a single number. These flags tell us whether the read is paired, mapped, properly paired, failed QC, etc. In the command above, 0x40 selects the first read of each pair and 0x80 the second. See http://samtools.sourceforge.net/SAM1.pdf
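
To show how a single number carries several flags, here is a small Python sketch that decodes a flag value into the bits defined in the SAM specification. The function names `decode_flag` and `has_all_bits` are made up for this example; `has_all_bits` mimics the usual meaning of a "require these bits" filter such as `samtools view -f`.

```python
# Bit values and meanings as listed in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "properly paired",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "failed QC",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def has_all_bits(flag, required):
    """True if every bit in `required` is set in `flag`."""
    return flag & required == required


if __name__ == "__main__":
    for flag in (77, 141, 4):
        print(flag, decode_flag(flag))
```

For example, 77 is 0x1 + 0x4 + 0x8 + 0x40, i.e. an unmapped first read of a pair whose mate is also unmapped, so it passes a filter that requires 0x40 and fails one that requires 0x80.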

> Is there a workflow you can reference or point to as an illustration?

Unfortunately, not really. I suggest becoming more familiar with the BAM format and regular Unix concepts such as pipes and file descriptors. Good luck and have fun!

11.5 years ago
Rm 8.3k

Try Sickle in paired-end mode (sickle pe) for paired-end trimming.

Or, if the reads are already trimmed, you can use cmpfastq to write the common (paired) reads and the singletons to separate files.
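
For anyone who wants to see the idea rather than the tool, here is a minimal Python sketch of splitting two already-trimmed fastq files into common reads and singletons. It is not how cmpfastq is implemented. It assumes 4-line fastq records, read IDs that match after dropping an optional `/1` or `/2` suffix and anything after the first whitespace, and files small enough to hold in memory. The names `read_fastq` and `split_pairs` are hypothetical.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed or truncated FASTQ record")
        # Use the first token of the header, minus a /1 or /2 mate suffix.
        read_id = record[0][1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, "\n".join(record) + "\n"


def split_pairs(fq1, fq2):
    """Return (paired_1, paired_2, singletons) as FASTQ text.

    Paired output follows the order of the first file, so the two
    paired outputs always list the same read IDs in the same order.
    """
    reads1 = dict(read_fastq(fq1))
    reads2 = dict(read_fastq(fq2))
    paired1, paired2, singles = [], [], []
    for read_id, record in reads1.items():
        if read_id in reads2:
            paired1.append(record)
            paired2.append(reads2[read_id])
        else:
            singles.append(record)
    singles.extend(rec for rid, rec in reads2.items() if rid not in reads1)
    return "".join(paired1), "".join(paired2), "".join(singles)


if __name__ == "__main__":
    mate1 = io.StringIO("@a/1\nAC\n+\nII\n@b/1\nGG\n+\nII\n")
    mate2 = io.StringIO("@b/2\nCC\n+\nII\n@c/2\nTT\n+\nII\n")
    paired_1, paired_2, singletons = split_pairs(mate1, mate2)
    print(paired_1, paired_2, singletons, sep="---\n")
```

With real data you would pass open file handles instead of `io.StringIO` objects and write the three results to separate files. For very large files, a dedicated tool or a streaming approach on sorted input is likely a better choice than loading everything into dictionaries.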


This looks helpful as well -- thanks!
