Biopython script to preprocess raw RNA-seq reads (quality filtering, polyA and adapter trimming)
8.7 years ago
shanasabri ▴ 40

Hello,

I have a raw, unaligned fastq.gz file that I am trying to preprocess with Biopython before alignment. Ultimately I would like to remove low-quality reads, trim polyA tails, trim adapters using fuzzy matching, and finally remove reads that fall below a minimum length after all of this preprocessing. It would also be useful to report how many reads pass the filtering criteria at each step. I have been experimenting with a Biopython script but have had little success. I believe the quality filter and polyA trimming work correctly, but I cannot get the adapters to be cut. I have also written a function called get_stats that is supposed to return the average read length and the total number of reads. I would appreciate any help!

RNA-Seq Biopython sequence • 2.9k views

Why do you want to reinvent the wheel? http://prinseq.sourceforge.net/

8.7 years ago
dr_bantz ▴ 110

I'm not sure why you would want to do this in Python (if nothing else, it would take ages). The BBDuk utility from the BBMap suite would do all you need. Here's a thread with some info:

http://seqanswers.com/forums/showthread.php?t=42776

8.7 years ago
ablanchetcohen ★ 1.2k

I don't understand either why you feel the need to write your own tool. If you're doing this as a programming exercise, your question should be more precise.

Here is a partial list of the existing trimming tools provided by Wikipedia. https://en.wikipedia.org/wiki/List_of_RNA-Seq_bioinformatics_tools#Trimming_and_adapters_removal

BBDuk, clean_reads, condetri, cutadapt, Deconseq, Erne-Filter, FastqMcf, FASTX, Flexbar, FreClu, htSeqTools, NxTrim, PRINSEQ, Sabre, Scythe, SEECER, Sickle, SnoWhite, ShortRead, TagCleaner, Trimmomatic


This is an exercise and I'd like to build my own toolkit for my own analysis so that I know exactly what is happening behind the scenes.
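
For an exercise like this, the steps described in the question could be sketched roughly as below. This is only a minimal illustration, not the original script: it uses the standard library alone so that it is self-contained, it assumes Phred+33 quality encoding, and all thresholds, the example adapter, and every function name other than get_stats are hypothetical. The "fuzzy" adapter match here is a plain mismatch count (Hamming distance) that also accepts an adapter prefix hanging off the 3' end; it does not handle insertions or deletions. The sketch trims the adapter before the polyA tail, on the assumption that an adapter sitting downstream of the tail would otherwise hide it from the polyA step.

```python
import gzip
from collections import Counter


def read_fastq(path):
    """Yield (name, seq, quals) from a plain or gzipped 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            name = handle.readline().rstrip()
            if not name:
                break
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip()
            # Assumption: Phred+33 encoding
            yield name[1:], seq, [ord(c) - 33 for c in qual]


def mean_quality_ok(quals, min_mean=20):
    """True if the read's mean Phred score reaches min_mean."""
    return bool(quals) and sum(quals) / len(quals) >= min_mean


def hamming(a, b):
    """Mismatches between a and b, compared over the shorter length."""
    return sum(x != y for x, y in zip(a, b))


def trim_adapter(seq, quals, adapter, max_error_rate=0.1, min_overlap=5):
    """Cut at the leftmost position where the adapter, or a prefix of it
    at the 3' end of the read, matches within the allowed mismatches."""
    for start in range(len(seq) - min_overlap + 1):
        window = seq[start:start + len(adapter)]
        if hamming(window, adapter) <= int(max_error_rate * len(window)):
            return seq[:start], quals[:start]
    return seq, quals


def trim_polya(seq, quals, min_run=10):
    """Remove a trailing run of at least min_run A's."""
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) >= min_run:
        return stripped, quals[:len(stripped)]
    return seq, quals


def preprocess(records, adapter, min_len=20, min_mean=20):
    """Filter and trim reads, counting how many survive each step."""
    counts = Counter()
    kept = []
    for name, seq, quals in records:
        counts["input"] += 1
        if not mean_quality_ok(quals, min_mean):
            continue
        counts["passed_quality"] += 1
        seq, quals = trim_adapter(seq, quals, adapter)
        seq, quals = trim_polya(seq, quals)
        if len(seq) < min_len:
            continue
        counts["passed_length"] += 1
        kept.append((name, seq, quals))
    return kept, counts


def get_stats(records):
    """Return (total reads, average read length)."""
    lengths = [len(seq) for _, seq, _ in records]
    total = len(lengths)
    return total, (sum(lengths) / total if total else 0.0)
```

Something like `kept, counts = preprocess(read_fastq("reads.fastq.gz"), "AGATCGGAAGAGC")` followed by `get_stats(kept)` would then give the per-step counts and summary (the file name and adapter are placeholders). The same functions should work with Biopython by feeding them `str(rec.seq)` and `rec.letter_annotations["phred_quality"]` from `SeqIO.parse(handle, "fastq")`. For large files it would be better to write reads out as they pass rather than collecting them in a list.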
