How To Handle Ns In The Middle Of Reads
11.0 years ago
kautilya ▴ 430

For my Illumina data, FastQC shows N's at positions 13, 14 and 15 in 101 bp long reads. Cropping the first 15 bases with Trimmomatic solves the problem, but I lose a lot of data. If I retain the N's, what sort of problems would they cause during alignment (bwa + stampy) and variant calling (Unified Genotyper), and how can I handle these problems?

If anybody has faced a similar problem, how did you handle it?

Similar questions have been asked on other forums, but none has been answered.

I could not find a resource on how variant calling programs handle N's. Do they ignore them, or treat them as a variant with a low confidence score?

The per-base N content plot from FastQC is here: http://i43.tinypic.com/sfyz5z.jpg
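For readers who want to confirm which cycles are affected without the plot, here is a minimal sketch that computes the fraction of reads carrying an N at each position. It assumes a plain 4-line-per-record FASTQ (no wrapped sequence lines); the function name `n_fraction_by_position` and the toy reads are hypothetical and only illustrate the idea, they are not FastQC's own implementation.

```python
from collections import Counter


def n_fraction_by_position(fastq_lines):
    """Return {1-based position: fraction of reads with an N there}.

    Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
    """
    n_counts = Counter()   # reads with an N at each position
    coverage = Counter()   # reads long enough to cover each position
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:     # the sequence is the 2nd line of every record
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq, start=1):
            coverage[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return {pos: n_counts[pos] / coverage[pos] for pos in sorted(coverage)}


# Toy input: two made-up 10 bp reads standing in for a real FASTQ file.
toy = [
    "@r1", "ACGTNNACGT", "+", "IIII##IIII",
    "@r2", "ACGTANACGT", "+", "IIIII#IIII",
]
fractions = n_fraction_by_position(toy)
for pos, frac in fractions.items():
    if frac > 0:
        print(f"position {pos}: {frac:.0%} of reads have N")
```

On real data you would pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the toy list, and run it once per sequencing run to see which run is affected.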

fastqc qc • 3.6k views

Shouldn't you first investigate why you got those weird Ns at these positions?


These are possibly due to machine read errors during sequencing. They appear in only 1 of 3 runs. I am looking for a way to handle them without losing a lot of sequence data.

9.7 years ago
Gabriel R. ★ 2.9k

If you plan to align with BWA and then call variants with GATK, I would use your reads as is. The N's likely have a base quality of 0, so GATK will overlook them. BWA will replace them with random bases, but fallacious alignments induced by those bases will be rare.
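Whether the N's really carry a very low base quality is easy to check before relying on it. The following is a rough sketch that tallies the Phred scores found at N positions. It assumes 4-line FASTQ records and a Phred+33 quality encoding (older Illumina pipelines used Phred+64, so adjust `phred_offset` if needed); the function name and toy reads are hypothetical.

```python
from collections import Counter


def n_quality_histogram(fastq_lines, phred_offset=33):
    """Count the Phred quality scores assigned to N bases.

    Assumptions: 4 lines per record, qualities encoded as ASCII - phred_offset.
    """
    hist = Counter()
    records = [line.rstrip("\n") for line in fastq_lines]
    # Step through complete records: header, sequence, '+', quality.
    for i in range(0, len(records) - 3, 4):
        seq, qual = records[i + 1], records[i + 3]
        for base, q in zip(seq, qual):
            if base in "Nn":
                hist[ord(q) - phred_offset] += 1
    return hist


# Toy input: made-up reads in which every N has the quality character '#'.
toy = [
    "@r1", "ACGTNNACGT", "+", "IIII##IIII",
    "@r2", "ACGTANACGT", "+", "IIIII#IIII",
]
hist = n_quality_histogram(toy)
for score, count in sorted(hist.items()):
    print(f"Q{score}: {count} N bases")
```

If all N's in the real data fall at or near the bottom of the quality scale, that supports leaving the reads untrimmed and letting the variant caller's base-quality filter deal with them.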

The cause? If this is Illumina, sometimes the reagents do not make it to the flowcell for a cycle or two due to pump problems or air bubbles.
