Hi all,
I am working with DNA sequencing data for a couple of samples. The data we received are called PF data.
I was told that PF data can be processed into clean data by removing reads with many Ns or low quality.
I am wondering whether there are existing tools I can use to get this clean data?
Thanks,
Junfeng
Did you check this thread? You might be especially interested in "prinSeq":
Looking For Reliable Tools To Do Quality Filtering Of Fastq Files
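If it helps to see what that kind of filtering amounts to, here is a minimal Python sketch of the rule described in the question (drop reads with too many Ns or low quality). It is not how prinSeq or any other tool is implemented. The function names and the thresholds (`max_n_frac=0.1`, `min_mean_q=20`) are made up for illustration, and it assumes 4-line FASTQ records with Phred+33 quality encoding.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes strict 4-line records (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines, e.g. at the end of a file
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip(), seq, qual


def keep_read(seq, qual, max_n_frac=0.1, min_mean_q=20, offset=33):
    """Return True if the read passes both filters (thresholds are illustrative)."""
    if not seq:
        return False
    n_frac = seq.upper().count("N") / len(seq)
    mean_q = mean(ord(c) - offset for c in qual)  # Phred+33 assumed
    return n_frac <= max_n_frac and mean_q >= min_mean_q


def filter_reads(lines, **kwargs):
    """Return the list of records that pass keep_read."""
    return [rec for rec in parse_fastq(lines) if keep_read(rec[1], rec[2], **kwargs)]


if __name__ == "__main__":
    example = [
        "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",  # clean read
        "@r2", "NNNNACGTAC", "+", "IIIIIIIIII",  # 40% N
        "@r3", "ACGTACGTAC", "+", "##########",  # very low quality
    ]
    for header, seq, qual in filter_reads(example):
        print(header, seq)
```

For real data you would pass an open file handle (or a gzip handle opened in text mode) instead of the example list, and for paired-end data both mates have to be dropped together, which this sketch does not handle.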
Depending on what you want to do with the data, chances are that you can just use it as-is without any "cleaning".
We performed the sequencing on an XTen and got a Q30 of around 82%, whereas more than 90% was expected.
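For anyone unsure what the Q30 figure measures: it is the percentage of bases with a Phred quality score of at least 30. Below is a rough sketch of how it could be computed from the quality lines of a FASTQ file, assuming Phred+33 encoding. The function name `q30_fraction` is just illustrative, and the sequencer's own report may compute the number over a different set of reads.

```python
def q30_fraction(quality_strings, threshold=30, offset=33):
    """Fraction of bases whose Phred score is >= threshold (Phred+33 assumed)."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33
    quals = ["IIII", "##II"]
    print(f"Q30: {100 * q30_fraction(quals):.1f}%")
```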
The BBMap Clumpify tool has been very useful for getting rid of Illumina platform-specific optical duplicates and tile-edge duplicates.
Hello!
You can use the command-line tools sickle and seqtk to trim your files by quality. Then you can visualize your trimmed vs. non-trimmed data using the Bioconductor package qrqc. You can check their manuals to figure out how they work (they are not difficult for simple tasks).