Question

Confusion: Whether to proceed further with Ion Proton exome data

0

9.5 years ago

venu 7.1k

Dear all,

I am working with some Ion Proton exome data. The average quality of every read is low (all are below 30), and many tests in the FastQC report failed. I've found some information about Ion Proton quality values here. I expected a good share of the data to pass a Q15 (or even Q10) filter, but very little did (<1% of >2.5 million reads). I used prinseq to filter out low-quality reads (and also removed reads shorter than 70 bases). I am stuck on the following questions.

Is this a problem with the machine, or was our data contaminated during library preparation?
Is it appropriate to proceed to downstream analysis after trimming the low-quality bases at both ends of each read?
If the data is of very low quality, what should I do with it? (Is further analysis a waste of time?)

I would like to hear from people who have dealt with Ion Proton data before, and I can provide further details if required.
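
For anyone who wants to reproduce the Q15/Q10 check, here is a minimal sketch of how the fraction of passing reads could be computed. It assumes 4-line FASTQ records with Phred+33 quality encoding, and it assumes "passing" means the arithmetic mean quality of a read is at least the threshold. The function names are made up for illustration, and this is not how prinseq does it internally.

```python
def mean_phred(quality_line, offset=33):
    """Arithmetic mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_line]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def fraction_passing(quality_lines, threshold):
    """Fraction of reads whose mean quality is at least `threshold`."""
    lines = list(quality_lines)
    if not lines:
        return 0.0
    passing = sum(1 for q in lines if mean_phred(q) >= threshold)
    return passing / len(lines)


def fastq_quality_lines(path):
    """Yield the quality line of every record in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 3:
                yield line.rstrip("\n")
```

With a real file this could be called as `fraction_passing(fastq_quality_lines("reads.fastq"), 15)`, where `reads.fastq` is a placeholder file name.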

Ion-proton quality • 2.8k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 9.5 years ago by venu 7.1k

0

Do you have a box plot of the per-base quality?

ADD REPLY • link 9.5 years ago by GouthamAtla 12k

0

Here is how it looks

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.5 years ago by venu 7.1k

0

This does not look really bad. How are you processing the data for QC? What command are you using to filter?

ADD REPLY • link 9.5 years ago by GouthamAtla 12k

0

After checking the FastQC report, I filtered with prinseq, using a min_len value of 70 and a min_qual_score of 25. Instead of removing all low-quality reads, I trimmed low-quality bases at the 3' and 5' ends. The FastQC report is then a little better than before.
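
To make the end-trimming idea concrete, here is a simplified sketch of it, not prinseq's actual algorithm. It assumes Phred+33 encoding, strips bases from each end while their quality is below a cutoff, and then applies the length filter. The function names and the reuse of 25 and 70 as defaults are illustrative assumptions only.

```python
def trim_ends(sequence, quality_line, min_quality=25, offset=33):
    """Strip bases from the 5' and 3' ends while their Phred score is below min_quality."""
    scores = [ord(ch) - offset for ch in quality_line]
    start = 0
    end = len(scores)
    # Walk in from the 5' end past low-quality bases.
    while start < end and scores[start] < min_quality:
        start += 1
    # Walk in from the 3' end past low-quality bases.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_line[start:end]


def keep_read(sequence, min_len=70):
    """Length filter applied after trimming."""
    return len(sequence) >= min_len
```

Low-quality bases in the middle of a read are left untouched by this approach, so a read can survive trimming and still carry errors internally.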

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.5 years ago by venu 7.1k

score 2 · Answer 1 · 2015-10-09

2

9.5 years ago

h.mon 35k

You could use BBMap to map some samples (without quality-trimming the reads) to your reference genome and then check its output statistics. BBMap has several options for output statistics; the most interesting for you would probably be "idhist", "ehist" and "indelhist". Check its README file or call bbmap.sh without parameters to see all options.
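
As a rough sketch, such a run could be assembled as below. The file names and the `build_bbmap_command` helper are placeholders invented for illustration; `in=`, `ref=`, `out=` and the three histogram options are BBMap's usual key=value parameters, but check the help output of your installed version before relying on them.

```python
import shlex


def build_bbmap_command(reads, reference, prefix):
    """Assemble a bbmap.sh call that writes the three histograms mentioned above."""
    return [
        "bbmap.sh",
        f"in={reads}",                         # untrimmed reads
        f"ref={reference}",                    # reference genome FASTA
        f"out={prefix}.sam",                   # mapped reads
        f"idhist={prefix}.idhist.txt",         # identity histogram
        f"ehist={prefix}.ehist.txt",           # errors-per-read histogram
        f"indelhist={prefix}.indelhist.txt",   # indel length histogram
    ]


# Placeholder file names; replace with your own sample and reference.
command = build_bbmap_command("sample.fastq", "reference.fa", "sample")
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` if bbmap.sh is on your PATH.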