Hello,
I have a raw, unaligned fastq.gz file that I am trying to preprocess with Biopython before alignment. Ultimately I would like to remove low-quality reads, trim polyA tails, trim adapters using fuzzy matching, and finally remove reads that do not meet a length requirement after all of this preprocessing. It would also be nice to report how many reads pass the filtering criteria at each step. I have been experimenting with these Biopython scripts but have had little success. I believe the quality filter and polyA trimming work correctly, but I cannot get the adapters to be trimmed. I have also written a function called get_stats that is supposed to return the average length and the total number of reads. I would appreciate any help!
Why do you want to reinvent the wheel? http://prinseq.sourceforge.net/