How to align Trimmomatic unpaired reads with BWA?
9.6 years ago
mcff23

Hi everyone!

I have filtered the adapters from my Illumina PE reads with Trimmomatic. This was the output (as I expected):

sample.R1.trimmed.fastq
sample.R2.trimmed.fastq
sample.R1.unpaired.fastq
sample.R2.unpaired.fastq

Then I aligned the trimmed.fastq pair with BWA just fine. But when I tried to align the unpaired reads I got this:

[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (4, 1, 1, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "HWI-1KL178:67:HAE0RADXX:1:1101:2363:2000", "HWI-1KL178:67:HAE0RADXX:1:1101:11567:2000"

This is the command line:

bwa/bin/bwa mem -aM -t 6 ${REF_BWA_INDEX}/genome.fa ${SAMPLE}.R1.unpaired.fastq ${SAMPLE}.R2.unpaired.fastq > ${i}.sam

My goal is to align the trimmed and unpaired files separately, because BWA does not support them together.

Thanks in advance!

Monica

Trimmomatic BWA Unpaired-reads • 10k views
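Why the unpaired run fails: given two input files, bwa mem pairs record i of the first file with record i of the second. Trimmomatic writes each unpaired file independently (a read lands there when its mate was dropped), so the two files do not line up, which is exactly what the "paired reads have different names" message reports. A minimal Python sketch to locate the first out-of-sync record, assuming uncompressed 4-line FASTQ records and comparing names after dropping the header comment and any trailing /1 or /2 (the function names are hypothetical):

```python
from itertools import zip_longest


def read_names(fastq_lines):
    """Yield the read name of each 4-line FASTQ record.

    Drops the leading '@', cuts at the first whitespace (the comment),
    and strips a trailing /1 or /2 mate suffix.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            name = line.rstrip("\n")[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_lines, r2_lines):
    """Return (record_index, name1, name2) for the first position where the
    two files disagree, or None if they are in sync record by record.
    A missing record in the shorter file shows up as None."""
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None
```

Usage: open both unpaired files and call `first_mismatch(f1, f2)`; on Trimmomatic unpaired output it will typically report a mismatch at or near the first record, which is why these files must be mapped as single-end data.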
9.6 years ago

Align each unpaired file separately, as single-end data:

bwa/bin/bwa mem -aM -t 6 ${REF_BWA_INDEX}/genome.fa ${SAMPLE}.R1.unpaired.fastq >R1.unpaired.sam
..

Be careful with combining paired and unpaired data.

Information gleaned from a read pair usually cannot (should not) be combined with that obtained from two unpaired reads. That is because a paired read provides measurements from the same DNA fragment that is measured (sequenced) twice, whereas unpaired reads measure different DNA fragments.

Just a note that the latest bwa-mem supports this:

(seqtk mergepe sample.R?.trimmed.fastq; cat sample.R?.unpaired.fastq) | bwa mem -p ${REF_BWA_INDEX}/genome.fa - > sample.sam

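i.e., you can mix paired and unpaired reads in one stream. With -p (smart pairing), bwa mem treats two adjacent reads with the same name as a pair and every other read as single-end, so paired reads must sit next to each other.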
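To make that layout concrete, here is a minimal Python sketch that mimics the stream the pipeline above builds (interleaved R1/R2 records followed by the unpaired reads) and approximates how smart pairing then classifies each read. It assumes uncompressed, unwrapped 4-line FASTQ records and name matching after stripping the comment and a /1 or /2 suffix; the function names are hypothetical, and this is an illustration, not bwa's or seqtk's actual code.

```python
def records(fastq_lines):
    """Split FASTQ lines into 4-line records (assumes unwrapped sequences)."""
    lines = list(fastq_lines)
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def base_name(record):
    """Read name without '@', the header comment, or a trailing /1 or /2."""
    name = record[0].rstrip("\n")[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def mixed_stream(r1_lines, r2_lines, *unpaired):
    """Interleave R1/R2 records, then append the unpaired reads.

    Roughly the layout of `seqtk mergepe R1 R2; cat unpaired...`.
    Assumes R1 and R2 are in sync; zip() stops at the shorter file.
    """
    out = []
    for rec1, rec2 in zip(records(r1_lines), records(r2_lines)):
        out.extend(rec1 + rec2)
    for lines in unpaired:
        out.extend(lines)
    return out


def smart_pair(stream_lines):
    """Approximate bwa mem -p: two adjacent reads with the same name form
    a pair; every other read is treated as single-end."""
    recs = records(stream_lines)
    result, i = [], 0
    while i < len(recs):
        if i + 1 < len(recs) and base_name(recs[i]) == base_name(recs[i + 1]):
            result.append(("pair", base_name(recs[i])))
            i += 2
        else:
            result.append(("single", base_name(recs[i])))
            i += 1
    return result
```

The design point is adjacency: concatenating R1 and R2 (instead of interleaving them) would put mates far apart, and every read would then be treated as single-end.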

Thanks, Istvan, for your quick response!

I am kind of lost. My main goal here is to call variants. What do you suggest I do with these unpaired files once I have aligned them separately? I was going to merge them with the trimmed ones and then call the variants...

Do I have to take them into account, or should I only use the trimmed ones?

Thanks!

Monica

Check the documentation of your variant caller to see whether it handles mixed paired and unpaired content. We usually discard the unpaired reads to keep things simple; they are typically no more than a few percent of the data and won't materially affect the results.

Hi Istvan,

Would you please give a rough number for "a few percent"? About 8% of my reads ended up unpaired after filtering. Will this amount of data loss affect the downstream analysis?

Thank you!

8% is not all that much, but it all depends on how much data you have left. The general rule is that it is better to get rid of bad data than to try to salvage it. In my opinion, better data, even if there is less of it, is more desirable than salvaged data.

That is because errors rarely come in isolation: we may think that we were able to fix everything by trimming off the bad bases, but there may have been other causes behind the errors in some regions of the flowcell, so even the data that looks reliable may not be.
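How much "8%" means depends on the denominator and on what remains. A minimal Python sketch, assuming uncompressed, unwrapped FASTQ and defining the share as unpaired reads over all surviving reads (each surviving pair counts as two reads); other definitions, e.g. relative to input pairs, give different numbers. Function names are hypothetical.

```python
def count_reads(fastq_lines):
    """Number of 4-line FASTQ records (assumes no wrapped sequence lines)."""
    return sum(1 for _ in fastq_lines) // 4


def unpaired_share(n_pairs, n_unpaired_r1, n_unpaired_r2):
    """Percentage of surviving reads that are unpaired.

    Each surviving pair contributes two reads; each unpaired file
    contributes one read per record.
    """
    unpaired = n_unpaired_r1 + n_unpaired_r2
    total = 2 * n_pairs + unpaired
    return 100.0 * unpaired / total if total else 0.0
```

For example, 46 million surviving pairs plus 4 million unpaired reads in each unpaired file gives `unpaired_share(46, 4, 4) == 8.0`, while the 92 million reads still in pairs are what a paired-only analysis keeps.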

5.7 years ago

If your unpaired reads are being produced by Trimmomatic's palindrome mode (i.e., the forward and reverse reads end up containing the same sequence after adapter trimming), try the keepBothReads option of ILLUMINACLIP.
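Background, per the Trimmomatic manual: when palindrome mode detects read-through and removes the adapter, the reverse read carries the same information as the forward read, reverse complemented, so by default the reverse read is dropped and the forward read ends up in the unpaired output; setting keepBothReads to true keeps both as a pair. The toy Python sketch below only illustrates that condition with an exact-match check; Trimmomatic's real detection is score-based and tolerates mismatches, and the function names are hypothetical.

```python
# Complement table for plain A/C/G/T/N (other IUPAC codes are not handled).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_read_through(r1_seq, r2_seq):
    """True if, after adapter removal, R2 is exactly the reverse complement
    of R1, i.e. the reverse read adds no new sequence information."""
    return r2_seq == reverse_complement(r1_seq)
```

For example, `reverse_complement("AACG")` is `"CGTT"`, so a trimmed pair ("AACG", "CGTT") would count as read-through in this sketch.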
