Hi, I am building a reliable draft genome of a cultured bacterium. This is the FastQC report for the MiSeq reads from my bacterial genome sequencing project, before any trimming:
http://www.yumpu.com/en/document/view/58094396/r1-a
The sequencing is paired-end, but to keep the focus on my problem I only show the QC of the forward (-->) reads here; the reverse reads have the same problem. Quality trimming at Q20 with Sickle gives this:
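For readers unfamiliar with what "quality trimming at Q20" does, here is a minimal sliding-window sketch of the idea. It is a simplification and not Sickle's actual code: Sickle also trims the 5' end and has its own rules for where to cut. The assumptions here are Phred+33 quality encoding, a window of 10% of the read length, cutting the 3' end at the first low-quality base inside the first window whose mean falls below 20, and discarding reads shorter than 20 nt. The function name `sliding_window_trim` and its parameters are hypothetical.

```python
def sliding_window_trim(seq, qual, threshold=20, window_frac=0.1, min_len=20):
    """Trim the 3' end of one read with a sliding-window mean-quality rule.

    Assumes Phred+33 qualities. Returns (seq, qual) or None if too short.
    """
    scores = [ord(c) - 33 for c in qual]  # decode Phred+33 characters
    n = len(scores)
    w = max(1, int(n * window_frac))      # window = ~10% of the read length
    end = n
    for start in range(0, n - w + 1):
        window = scores[start:start + w]
        if sum(window) / w < threshold:
            # The window mean is below threshold, so at least one base in it
            # is too; cut at the first such base.
            end = start + next(i for i, q in enumerate(window) if q < threshold)
            break
    if end < min_len:
        return None                        # read too short after trimming
    return seq[:end], qual[:end]
```

For a 40 nt read whose last 10 bases are Q2 (`#`), the read is cut just before the first Q2 base, leaving the 30 high-quality bases.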
https://www.yumpu.com/en/document/view/58094466/r1-b
I didn't think that was good enough yet, so I used FastX and an awk command to trim the first 18 nt from the start of every read and to cut everything after position 200 (after position 194 for reads from the problematic tile 2117). Then I applied Q20 and the other default settings for the final trimming with Sickle. This is the output:
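The fixed-position cropping described above (drop the first 18 nt, keep up to position 200, or up to 194 for tile 2117) can be sketched in a few lines of Python. This is not the FastX/awk pipeline that was actually used. It assumes Illumina Casava 1.8+ style headers (`@instrument:run:flowcell:lane:tile:x:y read:filter:control:index`), where the tile is the fifth colon-separated field; `tile_of`, `crop_read` and `tile_last` are hypothetical names.

```python
def tile_of(header):
    """Return the tile ID from a Casava 1.8+ style FASTQ header (assumption)."""
    # Drop the leading '@', keep the first whitespace-separated token,
    # then take field 5 of instrument:run:flowcell:lane:tile:x:y
    return header[1:].split()[0].split(":")[4]


def crop_read(header, seq, qual, head=18, last=200, tile_last=None):
    """Remove `head` bases from the 5' end and keep bases up to position `last`.

    `tile_last` optionally maps a tile ID to a shorter per-tile cut position.
    """
    tile_last = tile_last or {}
    end = tile_last.get(tile_of(header), last)
    return seq[head:end], qual[head:end]
```

With `tile_last={"2117": 194}`, a 300 nt read from tile 2117 comes out 176 nt long and a read from any other tile 182 nt long, before the subsequent Q20 trimming.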
https://www.yumpu.com/fr/document/view/58094503/r1-c
Isn't that too much trimming? Is it necessary to go that far? There are still warnings in some modules, such as Sequence Duplication Levels and Kmer Content. Should I go further to resolve these warnings? Are they important in my case?
Thanks everybody!
FastQC is overly cautious; most of the time its warnings aren't something you really need to worry about.
I think you have a typical quality profile for 2x300 reads. I use prinseq-lite to trim my reads by mean quality and to a maximum length of 250 bases, and my assembly (with SPAdes) was fine. I think that was enough and could work for you too, but it also depends on your read depth (coverage).
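The approach described above (cap reads at 250 bases, then keep only reads whose mean quality is high enough) can be illustrated with this small Python sketch of the same idea. It is not prinseq-lite itself. The mean-quality cutoff of 20 is an assumption because the post does not give one, Phred+33 encoding is assumed, and `trim_and_filter` is a hypothetical name.

```python
def trim_and_filter(seq, qual, max_len=250, min_mean_q=20):
    """Truncate a read to `max_len`, then drop it if its mean quality is too low.

    Assumes Phred+33 qualities. Returns (seq, qual) or None.
    """
    seq, qual = seq[:max_len], qual[:max_len]   # cap the read length
    if not qual:
        return None
    mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
    return (seq, qual) if mean_q >= min_mean_q else None
```

Because the filter looks at the mean over the whole (truncated) read, a read with a good start but a long low-quality tail can still be rejected as a whole, unlike the sliding-window approach, which would keep the good part.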
Thank you, but did you have a look at the reports? Shouldn't I worry about metrics like the duplication rate or per-base sequence content?
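To see what the duplication numbers mean on your own data, this simplified Python sketch counts exact duplicate sequences. It is an illustration under stated assumptions, not FastQC's implementation: FastQC samples reads and handles long reads differently, so its figures will not match exactly. `percent_remaining_if_deduplicated` and `duplication_histogram` are hypothetical names.

```python
from collections import Counter


def percent_remaining_if_deduplicated(seqs):
    """Percentage of reads left if exact duplicate sequences were collapsed."""
    if not seqs:
        return 100.0
    return 100.0 * len(set(seqs)) / len(seqs)


def duplication_histogram(seqs):
    """Map duplication level (copies per distinct sequence) -> number of sequences."""
    per_sequence = Counter(seqs)           # sequence -> copy count
    return Counter(per_sequence.values())  # copy count -> how many sequences
```

For `["AAA", "AAA", "CCC", "GGG"]`, 75% of the reads remain after deduplication, with one sequence at level 2 and two at level 1.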
But what's the problem? It could also depend on the genome complexity; do you know how complex it is (or is expected to be)? If you want, you can also filter out reads of low linguistic complexity (i.e. low entropy), but I don't know whether this is recommended for de novo assemblies. I would not do it.
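To make "low complexity" concrete, here is a minimal Python sketch that scores a read by the normalized Shannon entropy of its overlapping trinucleotides. Repetitive reads score near 0 and diverse reads score closer to 1. This is one common way to measure complexity, not necessarily the exact formula prinseq-lite uses. The choice of k = 3 and normalizing by log2(min(4^k, number of k-mers)) are assumptions, and `kmer_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=3):
    """Normalized Shannon entropy (0..1) of the overlapping k-mers in `seq`."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    # H = sum p * log2(1/p) over the observed k-mers
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    # Largest possible H given 4^k possible k-mers and `total` observations
    max_h = math.log2(min(4 ** k, total))
    return h / max_h if max_h > 0 else 0.0
```

A homopolymer such as `"A" * 50` scores 0, and a simple repeat such as `"ACGT" * 25` (only four distinct trinucleotides) scores about 0.33. A filter would drop reads below some cutoff, which would have to be chosen for your data.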