Question

Quality control with 16S MiSeq data

0

10.2 years ago

hua.peng1314 ▴ 100

Hi everyone, I am working with some MiSeq data, and the quality of some reads is not good enough. I used the make.contigs command (mothur) and the PANDAseq software to merge the paired-end reads, but I only get a FASTA file and no quality file to use for quality control. I tried doing QC on the forward and reverse reads separately before merging them, but trimming the low-quality bases at the ends may leave the reads with no overlap. Thanks for any suggestions.

miseq 16s • 5.2k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by hua.peng1314 ▴ 100

1

Based on my experience, FLASH beats make.contigs.

ADD REPLY • link 10.2 years ago by Quak ▴ 520

Ram · Answer 1 · 2014-10-24

0

10.2 years ago

marina.v.yurieva ▴ 580

You might want to join them with something like FLASH (it gives FASTQ output, so you will still have all the quality scores) and only then trim them. You can also try several trimmers; the results are sometimes quite different. I usually prefer the BWA trimmer, but there are many others as well. I'm not sure mothur's make.contigs will give you the best merged reads. I compared it with my merge-then-QC-trim pipeline and it gave better results (also on MiSeq data), but it all depends on the quality of your data. If the quality is good, use whatever you like. Mine had very low quality at the ends, so I had to try many things to make it work.
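To make the trimming step concrete, here is a minimal Python sketch of BWA-style 3'-end quality trimming applied to one (already merged) read. It is an illustration of the idea only, not BWA's or any trimmer's actual code; the function names, the threshold of 20 and the `min_len` parameter are assumptions chosen for the example.

```python
def decode_phred33(qual_string):
    """Convert a FASTQ quality string (Phred+33 encoding) to integer scores."""
    return [ord(c) - 33 for c in qual_string]


def bwa_style_trim(seq, quals, threshold=20, min_len=1):
    """Trim low-quality bases from the 3' end of a read.

    Walks from the 3' end, accumulating (threshold - quality), and cuts
    where that running sum is largest. Stops early once the sum goes
    negative. At least `min_len` bases are always kept.
    `threshold` and `min_len` are illustrative defaults (assumptions).
    """
    running = 0
    best = 0
    cut = len(seq)  # default: keep the whole read
    for i in range(len(seq) - 1, min_len - 1, -1):
        running += threshold - quals[i]
        if running < 0:
            break  # good-quality bases now outweigh the bad tail
        if running > best:
            best = running
            cut = i
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    seq = "ACGTACGT"
    quals = [30, 30, 30, 30, 30, 10, 10, 10]
    trimmed_seq, trimmed_quals = bwa_style_trim(seq, quals)
    print(trimmed_seq, trimmed_quals)
```

Because this looks at the whole tail rather than a single base, one isolated good base inside a bad tail does not stop the trimming.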

ADD COMMENT • link 10.2 years ago by marina.v.yurieva ▴ 580

0

Thanks for your insights. I have also used make.contigs and FLASH with default parameters, and I find the results are similar in most cases. In one case I used MiSeq 2x300 to sequence 16S V4-V6 (530-1100) and did not remove the last 50 bases of the R2 reads. The mothur result is:

       raw reads     <540     540~590     >590
M1     63792         6663     35127       22600
M2     116539        12847    64723       40096
M3     142317        15702    81105       46874
M4     134210        12796    70243       52208
M5     52356         4229     29066       19520
M6     44229         3917     24038       16620

With FLASH, almost all merged reads have a length of 598~599 (I had to set the minimum overlap to 1). So in some cases mothur seems to do better than FLASH, but I have only tried this once.

Now I have some other questions. What should I do about my ill-advised V4-V6 run? The amplicon seems too long for MiSeq, and I get a great many OTUs. Unsurprisingly, I get a really bad rarefaction curve. But the most abundant members of the community are similar to those found with the R1 reads. What should I do? More stringent quality control? Delete the low-abundance OTUs? Any suggestions?
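For the low-abundance option, the idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the OTU table is taken to be a plain dict of OTU name to read count, the names and counts are made up, and the cutoff of 2 reads (i.e. dropping singletons) is an arbitrary example, not a recommended value.

```python
def drop_low_abundance(otu_counts, min_reads=2):
    """Keep only OTUs supported by at least `min_reads` reads.

    `min_reads=2` removes singletons; the cutoff is an assumption.
    """
    return {otu: n for otu, n in otu_counts.items() if n >= min_reads}


def fraction_reads_kept(before, after):
    """Fraction of all reads that survive the filter."""
    return sum(after.values()) / sum(before.values())


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = {"OTU_a": 120, "OTU_b": 7, "OTU_c": 1, "OTU_d": 1}
    kept = drop_low_abundance(counts)
    print(kept)
    print(round(fraction_reads_kept(counts, kept), 3))
```

Checking how many reads (not just how many OTUs) are lost gives a quick sense of whether the filter removes mostly noise or a real part of the community.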

Another problem is that our boss wants to sequence about 100 samples at 100,000 PE reads per sample. QIIME is too hard for a beginner to install and use, and I am not sure about the results of UPARSE, so mothur is my choice. I can use CD-HIT or USEARCH to generate OTUs, but classify.seqs takes too much time with SILVA, and so does chimera.uchime. UPARSE recommends the gold database rather than a large 16S database like Greengenes. But I am not sure about it, because it seems to do chimera filtering during OTU clustering and then chimera filtering again against the reference database after OTU clustering.

Any better and faster ways? Thanks.

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 10.2 years ago by hua.peng1314 ▴ 100

0

I don't think you would get too many OTUs because your reads are "too long"... And what is too many OTUs anyway? You can try to do rarefactions and look at the numbers.
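As a rough picture of what "doing rarefactions" means, here is a minimal Python sketch: subsample a fixed number of reads without replacement and count how many distinct OTUs show up. It assumes the OTU table is a dict of OTU name to read count; the function names, the depths and the example counts are all made up for illustration, and real tools average over many subsamples rather than one.

```python
import random


def rarefy(otu_counts, depth, seed=0):
    """Draw `depth` reads without replacement; return the number of OTUs seen."""
    # Expand the table into one entry per read.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds total read count")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return len(set(rng.sample(pool, depth)))


def rarefaction_curve(otu_counts, depths, seed=0):
    """Observed OTU count at each sampling depth (single subsample each)."""
    return [(d, rarefy(otu_counts, d, seed)) for d in depths]


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = {"OTU1": 50, "OTU2": 30, "OTU3": 1}
    for depth, observed in rarefaction_curve(counts, [1, 10, 40, 81]):
        print(depth, observed)
```

A curve that keeps climbing in a straight line instead of flattening means new OTUs keep appearing with every extra read, which is what you would see if many OTUs come from sequencing errors.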

Try FLASH, then quality trimming, and if you are concerned about the time, use QIIME. Maybe somebody can help you install it. It works MUCH faster than mothur.

I haven't used Greengenes, only SILVA, so I cannot say anything about it.

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 10.2 years ago by marina.v.yurieva ▴ 580

0

Any suggestions are welcome, including QIIME and other software, although I have not managed to install QIIME yet. I have not tested the methods in QIIME, such as the RDP classifier, but I will try.

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 10.2 years ago by hua.peng1314 ▴ 100

0

Thanks for your kindness and patience. English is not my mother tongue and I have not spent enough time on it, so it is a little hard for me to describe my problem clearly.

Maybe the 570 bp insert is too long for MiSeq 2x300: the quality in the overlap region is too low and FLASH cannot merge the reads. With default parameters, FLASH gives back too few merged reads.

If I set the minimum overlap to 1, the merged reads I get are 598~599 in length.
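The arithmetic behind these numbers can be written out as a small Python sketch. It assumes both reads are the full 300 bases with nothing trimmed, and that the amplicon really is about 570 bp; the function names are made up for the example.

```python
def expected_overlap(read_len, amplicon_len):
    """Bases shared by R1 and R2 when both reads are full length."""
    return 2 * read_len - amplicon_len


def merged_length(read_len, overlap):
    """Length of the merged read for a given overlap between R1 and R2."""
    return 2 * read_len - overlap


if __name__ == "__main__":
    print(expected_overlap(300, 570))  # overlap expected for a 570 bp amplicon
    print(merged_length(300, 1))       # merged length with a 1-base overlap
    print(merged_length(300, 2))       # merged length with a 2-base overlap
```

So a true 570 bp amplicon should overlap by about 30 bases and merge to about 570 bp, while merged reads of 598~599 correspond to overlaps of only 1 or 2 bases. Under these assumptions, that suggests most of those merges are chance matches at the read ends rather than real overlaps.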

I used make.contigs and the result is shown above. At least the number and length of the merged reads seem more normal. But the rarefaction curves are not smooth enough; in fact they look like straight lines between two points without any bend. There seem to be too many OTUs (spurious OTUs) compared with the 454 results I obtained before.

I know QIIME can be faster for some steps, but I am not sure about the classification and chimera filtering steps. It has many methods and parameters, and I am a lazy person. I will try them, but if anyone already has insight into this, it would be appreciated. Forgive my laziness.

Actually, I also only use SILVA, but both Greengenes and SILVA are large. The gold database recommended by Robert Edgar is smaller, but it is used for UPARSE. I am not sure if it is suitable for mothur.

Maybe I should open a new post.

Thanks again.

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 10.2 years ago by hua.peng1314 ▴ 100