Coverage of 320; Would it be useful to perform very stringent filtering?
6.8 years ago
Benni ▴ 30

I have Illumina data from a 6 Mb bacterium. Insert length = 400, read length = 151, estimated coverage = 320 (read length * #sequences / 6 Mb). I don't think I need to use all the data. I could randomly sample 25% or so, or I could instead filter the sequences very stringently based on quality. Do you think that would improve my assembly?
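For reference, the coverage arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the read count is back-calculated from the stated 320x rather than taken from the actual data.

```python
def estimated_coverage(read_length, n_reads, genome_size):
    """Coverage = read length * number of reads / genome size."""
    return read_length * n_reads / genome_size


def reads_for_coverage(target_coverage, read_length, genome_size):
    """Number of reads needed to reach a target coverage."""
    return target_coverage * genome_size / read_length


READ_LENGTH = 151        # from the post
GENOME_SIZE = 6_000_000  # 6 Mb, from the post

# Read count implied by the stated 320x coverage (back-calculated, not measured).
n_reads = reads_for_coverage(320, READ_LENGTH, GENOME_SIZE)

# Coverage that remains after keeping a random 25% of the reads.
subsampled = estimated_coverage(READ_LENGTH, 0.25 * n_reads, GENOME_SIZE)

print(f"reads at 320x: {n_reads:,.0f}")
print(f"coverage after 25% subsampling: {subsampled:.0f}x")
```

So a random 25% sample would still leave roughly 80x coverage.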

I want to assemble high quality contigs. There are many repetitive sequences that I probably need to identify myself afterwards.

ngs assembly bacteria • 1.0k views
6.8 years ago

My 2 cents: always use all the data you have. Perhaps it's because I work with an organism where data is scarce, but I wouldn't use a subset.

Quality filtering definitely helps improve assemblies, but I wouldn't set the bar higher than "normal": you would discard a lot of good material.
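To make the idea of a quality bar concrete, here is a minimal sketch of a mean-quality filter. It assumes Phred+33 encoded quality strings; the function names and the threshold of 20 are illustrative assumptions, not a recommendation from this thread, and dedicated trimming tools use more refined criteria than a plain mean.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_filter(quality_string, min_mean_quality=20):
    """Keep a read if its mean quality reaches the (assumed) threshold."""
    return mean_phred(quality_string) >= min_mean_quality


# Toy quality strings: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
reads = {
    "read_a": "IIIIIIIIII",
    "read_b": "IIII######",
}

kept = [name for name, qual in reads.items() if passes_filter(qual)]
print(kept)
```

Raising `min_mean_quality` discards more reads, which is the trade-off described above: a stricter bar also throws away usable material.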

"I want to assemble high quality contigs."

The more data you provide, the higher the chance that your reads also cover those DNA regions that were sequenced fewer times (perhaps because they were not easy to access).

"There are many repetitive sequences [...]"

In a bacterium?
