Question

Coverage of 320; Would it be useful to perform very stringent filtering?

7.8 years ago

Benni ▴ 30

I have Illumina data from a 6 Mb bacterium. Insert length = 400 bp, read length = 151 bp, estimated coverage = 320 (read length * #sequences / 6 Mb). I don't think I need to use all the data. I could randomly sample 25% or so, but I could also filter the sequences very stringently based on quality. Do you think that would improve my assembly?

I want to assemble high quality contigs. There are many repetitive sequences that I probably need to identify myself afterwards.
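For reference, the coverage estimate in the question (read length * #sequences / genome size) can be written out as a small Python sketch. The read count is not given in the post, so the value below is back-calculated from the stated 320x and is only an assumption; the function names are hypothetical.

```python
def estimated_coverage(read_length, n_reads, genome_size):
    """Coverage = read length * number of reads / genome size."""
    return read_length * n_reads / genome_size


def reads_for_coverage(coverage, read_length, genome_size):
    """Invert the formula: how many reads give the requested coverage."""
    return coverage * genome_size / read_length


GENOME_SIZE = 6_000_000  # 6 Mb, from the question
READ_LENGTH = 151        # bp, from the question

# Assumption: the read count is derived from the stated 320x coverage,
# because the post does not report it directly.
n_reads = round(reads_for_coverage(320, READ_LENGTH, GENOME_SIZE))

full = estimated_coverage(READ_LENGTH, n_reads, GENOME_SIZE)
# Randomly keeping 25% of the reads scales the coverage by the same factor.
subsampled = estimated_coverage(READ_LENGTH, n_reads * 0.25, GENOME_SIZE)

print(f"reads implied by 320x: {n_reads}")
print(f"full data set: {full:.1f}x, 25% subsample: {subsampled:.1f}x")
```

Under these numbers, a 25% random subsample would still leave roughly 80x coverage.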

ngs assembly bacteria • 1.2k views

updated 7.8 years ago by Matteo Schiavinato ★ 3.7k • written 7.8 years ago by Benni ▴ 30

score 1 · Answer 1 · 2018-02-06

My 2 cents: always use all the data you have. Perhaps it's because I work with an organism where data is scarce, but I wouldn't use a subset.

Quality filtering definitely helps improve assemblies, but I wouldn't set the bar higher than "normal": you would discard a lot of good material.
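To illustrate the trade-off, here is a minimal Python sketch of filtering reads by mean Phred quality. It assumes Phred+33 encoded quality strings; the thresholds of 20 and 35 are arbitrary illustrations of a "normal" versus a very stringent bar, not values recommended in this thread, and the three reads are made-up toy data.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def mean_phred(quality_line, offset=PHRED_OFFSET):
    """Average Phred score of one FASTQ quality string."""
    if not quality_line:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def filter_reads(records, min_mean_quality):
    """Keep (name, sequence, quality) records at or above the threshold."""
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_quality]


# Toy reads, invented for illustration only.
reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Phred 40
    ("read2", "ACGT", "5555"),  # '5' encodes Phred 20
    ("read3", "ACGT", "!!!!"),  # '!' encodes Phred 0
]

moderate = filter_reads(reads, 20)   # hypothetical "normal" threshold
stringent = filter_reads(reads, 35)  # hypothetical very stringent threshold

print(len(moderate), "reads kept at Q20;", len(stringent), "kept at Q35")
```

In this toy set the stringent threshold also throws away read2, which a moderate filter would have kept.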

"I want to assemble high quality contigs."

The more data you provide, the higher the chance that your data also includes those DNA regions that were sequenced fewer times (perhaps because they were hard to access).

"There are many repetitive sequences [...]"

In a bacterium?