Estimating Empirical Error Rate In Illumina Sequencing Data
11.8 years ago
Abhi ★ 1.6k

Hey Guys

Of late we have been seeing some wavy N patterns in Illumina data. By this I mean a pattern of N's across the read length. To understand this better, we want to empirically estimate the error rates using the control data we have for some of the runs.

Two specific questions:

  1. Is there an existing method that takes a mapped BAM/SAM file and converts the MD tag into a graph of estimated error rates per read position? We just want to look at the percentage of mismatched bases at each position of the reads. Indels could be binned separately.
  2. Also, it has been a while since I did PhiX mapping. Just wondering whether 90-95% mapped reads is along the expected lines. I vaguely remember that the percentage may be closer to 99. We are also looking at the unmapped reads to see what might be going on with them.
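
For question 1, here is a rough sketch of how the MD tag and CIGAR string could be turned into a per-cycle mismatch rate. It is not an existing tool, just an illustration. It assumes plain-text SAM lines (for example from `samtools view`), that every mapped read carries an `MD:Z:` tag, and that there are no hard clips. The function names are made up for this example. Indels are skipped rather than binned, and reverse-strand reads are flipped so that positions correspond to sequencing cycles.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a run of matches, a deletion (^ACGT...), or one mismatched reference base
MD_RE = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def aligned_read_positions(cigar):
    """0-based read offsets of the bases aligned to the reference (M, =, X)."""
    positions, read_pos = [], 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            positions.extend(range(read_pos, read_pos + n))
            read_pos += n
        elif op in "IS":  # insertions and soft clips consume read bases only
            read_pos += n
        # D, N, H and P consume no read bases
    return positions


def mismatch_read_positions(cigar, md):
    """0-based read offsets of the mismatches described by the MD string."""
    aligned = aligned_read_positions(cigar)
    idx, out = 0, []
    for matches, deletion, mismatch in MD_RE.findall(md):
        if matches:
            idx += int(matches)
        elif mismatch:
            out.append(aligned[idx])
            idx += 1
        # a deletion consumes reference bases only, so idx stays put
    return out


def per_cycle_mismatch_rate(sam_lines):
    """Map each sequencing cycle (0-based) to mismatches / aligned bases."""
    mismatches, covered = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        # skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x904 or cigar == "*":
            continue
        md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
        if md is None:
            continue
        read_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MIS=X")
        reverse = bool(flag & 0x10)  # SEQ is stored reverse-complemented
        for p in aligned_read_positions(cigar):
            covered[read_len - 1 - p if reverse else p] += 1
        for p in mismatch_read_positions(cigar, md):
            mismatches[read_len - 1 - p if reverse else p] += 1
    return {c: mismatches[c] / covered[c] for c in sorted(covered)}
```

The resulting dictionary can be plotted directly as mismatch rate against cycle. If the aligner did not write MD tags, `samtools calmd` can add them before running something like this.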

Thanks!
-Abhi

qualitycontrol illumina quality ngs • 4.5k views

How was your PhiX put there? Spiked in with indices?

For the runs we are looking to check, we had a full lane of PhiX, so no indices or spike-in.

11.5 years ago
JC 13k

You can check whether you have position-specific errors and other features with tools such as FastQC.
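
FastQC includes a per base N content plot, which is the closest match to the wavy N pattern described in the question. If you want the raw numbers yourself, a minimal sketch of the same idea could look like the following. This is not FastQC's implementation. It assumes an uncompressed FASTQ with exactly four lines per record, and the function name is invented for this example.

```python
from collections import Counter


def per_position_n_fraction(fastq_lines):
    """Map each 0-based read position to the fraction of reads with an N there."""
    n_counts, totals = Counter(), Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        for pos, base in enumerate(line.strip().upper()):
            totals[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return {pos: n_counts[pos] / totals[pos] for pos in sorted(totals)}
```

Passing an open file handle works, since iterating over it yields lines. Peaks in the returned fractions at particular positions would correspond to the cycles where the N calls pile up.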
