Overlapping PE Read Alignments
12.0 years ago
kanwarjag ★ 1.2k

I have longer (150 bp) paired-end (PE) DNA-seq reads and would like to find sequence variation as well as align them to a known bacterial genome. The problem is that these longer reads overlap, so I am looking for a GUI-based tool that can convert the PE reads to single-end reads, which I can then align to the reference genome (~6K). I am also open to any other suggestions. Can I use Bowtie 2 or BWA-SW to align such longer reads? Thanks


Why is it a problem that the forward and reverse reads overlap? Is it an issue that the overlapping part has more representation than the outer ends of the reads?


Yes, that's the typical worry. If you assume there was (say) a substitution error early on in PCR amplification, that would be propagated to all copies of the molecule on the flowcell. If that error was in the overlapping portion, both reads would see it and subsequent algorithms would give (spuriously) extra weight to this error.

The shorter story is that sometimes it breaks the independence assumption of the error model that most tools follow.
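To make the independence point concrete, here is a minimal sketch with made-up coordinates (a hypothetical 200 bp fragment sequenced with 150 bp reads from each end). The helper names `overlap_length` and `evidence_at` are illustrative only and not from any tool; the sketch just counts how many reads versus how many fragments cover a position.

```python
def overlap_length(read_len, fragment_len):
    """Bases covered by both mates, assuming two reads of read_len
    sequenced inward from opposite ends of one fragment."""
    return max(0, min(2 * read_len - fragment_len, fragment_len))


def covers(interval, pos):
    """True if pos lies in the half-open interval (start, end)."""
    start, end = interval
    return start <= pos < end


def evidence_at(pairs, pos):
    """Return (read_count, fragment_count) covering pos.

    Each pair is (forward_interval, reverse_interval) on the reference.
    """
    reads = sum(covers(fwd, pos) + covers(rev, pos) for fwd, rev in pairs)
    fragments = sum(covers(fwd, pos) or covers(rev, pos) for fwd, rev in pairs)
    return reads, fragments


# Hypothetical pair: forward read spans 0-150, reverse read spans 50-200.
pairs = [((0, 150), (50, 200))]
print(overlap_length(150, 200))  # 100
print(evidence_at(pairs, 100))   # (2, 1): two reads, but only one molecule
print(evidence_at(pairs, 10))    # (1, 1)
```

Inside the overlap, a single molecule contributes two observations, so an error carried by that molecule would be counted twice by any method that treats reads as independent.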


Matted, you have summarized the issue correctly. Do you have any suggestions for solving this problem? Sorry for being a little impatient.

Thanks


BWA as such failed when I used it for aligning. Any other suggestions? It would be much easier if I could align to a reference without any manipulation of the data. However, it has been suggested that overlapping reads need to be merged first (http://thegenomefactory.blogspot.com/2012/11/tools-to-merge-overlapping-paired-end.html).


It seems like the link you supply lists a few competent tools for doing this. FLASH seems worth a shot. Alternatively, could you use only the forward reads for a preliminary analysis?


Overlapping ends have existed since the first day of Solexa sequencing. Aligning overlapping ends is never a problem. The problem arises at the SNP-calling stage, where you would want to collapse the overlap to reduce false SNPs caused by PCR errors.


Ih3 and Matt,

Just to follow up: I used BWA as well as Bowtie to see whether I could align either the forward reads alone or both PE reads. Most of the alignments (99%) are giving a score of 77 and 141, and none of the reads are mapped. I am aligning to a bacterial genome.


Do you mean "flag"? Flag 77/141 indicates the read is unmapped, which has nothing to do with overlapping ends. It is more likely that your reads are of poor quality or that you are using the wrong reference. Why is your bacterial genome only 6 kb long?
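For anyone who wants to check what a flag value means, here is a small sketch that decodes the bits of the SAM FLAG field. The bit values follow the SAM specification; the function name `decode_sam_flag` and the short descriptions are my own.

```python
# Bit values of the SAM FLAG field (descriptions paraphrased).
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_sam_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for flag in (77, 141):
    print(flag, decode_sam_flag(flag))
# 77 ['read paired', 'read unmapped', 'mate unmapped', 'first in pair']
# 141 ['read paired', 'read unmapped', 'mate unmapped', 'second in pair']
```

So 77 and 141 are simply the first and second mates of pairs in which neither read aligned.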


I remember hearing or reading that GATK will handle this the right way automatically, but samtools will not. I'd be happy to learn if that's true or not...


What you said is true.

12.0 years ago
Aqua ▴ 10

You can connect the reads using COPE or FLASH (both published in Bioinformatics), then align the connected reads to the reference genome using BWA-SW. Be aware that not every read pair can be connected, because of tandem repeats or low-complexity sequence at the 3'-end of a read.
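To show what "connecting" a pair means, here is a deliberately naive sketch. It is not the algorithm used by COPE or FLASH: it ignores base qualities, keeps the forward read's base at any mismatch, and the parameters `min_overlap` and `max_mismatch_frac` are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a forward read with its reverse mate, or return None.

    The reverse read is reverse-complemented, then a suffix of fwd is
    compared with a prefix of it, trying the longest overlap first.
    """
    rev_rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], rev_rc[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            # Accept this overlap; bases in the overlap come from fwd.
            return fwd + rev_rc[olen:]
    return None


# Toy 30 bp fragment read with 20 bp reads from each end (10 bp overlap).
fragment = "GATTACAGGCTCAATGCCGTAAGTCCATGA"
fwd = fragment[:20]
rev = reverse_complement(fragment[10:])
print(merge_pair(fwd, rev) == fragment)  # True
```

With a tandem repeat or low-complexity stretch at the read ends, several overlap lengths can pass the mismatch test, so a simple scan like this may pick the wrong one. That ambiguity is one reason some pairs cannot be connected reliably.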


To save others the few minutes it took me to track the papers down, here they are:

"COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly"

http://bioinformatics.oxfordjournals.org/content/28/22/2870

"FLASH: fast length adjustment of short reads to improve genome assemblies"

http://bioinformatics.oxfordjournals.org/content/27/21/2957

Thanks for the pointers; I didn't know about these tools.


Is either of them GUI-based, or are they all command-line based? Thanks


They are command-line based.
