Trim Primers from FASTQ file
7.2 years ago

Hi,

I am trying to trim primers off of fastq reads using the script below. It needs to trim off the primer (sub1, sub2, etc) and everything in front of the primer. I'm getting the error:

AttributeError: 'SeqRecord' object has no attribute 'replace'

Does anyone know how to fix this? I also need to expand it to include all primers instead of just sub1.

from Bio import SeqIO
from os import listdir
from os.path import isfile, join
import sys

def main():
    path = sys.argv[1]
    onlyfiles = [f for f in listdir(path) if isfile(join(path,f))]
    for fil in onlyfiles:
        source = fil
        outfile = "CutPrimers" + fil
        sub1 = 'CCTACGGGAGGCTGCAG'
        sub2 = 'CCTACGGGTGGCAGCAG'
        sub3 = 'CCTACGGGAGGCAGCAG'
        sub4 = 'CCTACGGGGGGCTGCAG'
        sub5 = 'CCTACGGGCGGCAGCAG'
        sub6 = 'CCTACGGGCGGCTGCAG'
        sub7 = 'CCTACGGGGGGCAGCAG'
        sub8 = 'CCTACGGGTGGCTGCAG'
        sub9 = 'CCTACGGGTGGCGGCAG'
        sub1b = 'CATACGGGAGGCTGCAG'
        sub2b = 'CATACGGGTGGCAGCAG'
        sub3b = 'CATACGGGAGGCAGCAG'
        sub4b = 'CATACGGGGGGCTGCAG'
        sub5b = 'CATACGGGCGGCAGCAG'
        sub6b = 'CATACGGGCGGCTGCAG'
        sub7b = 'CATACGGGGGGCAGCAG'
        sub8b = 'CATACGGGTGGCTGCAG'
        sub9b = 'CATACGGGTGGCGGCAG'
        sub1c = 'CGTACGGGAGGCTGCAG'
        sub2c = 'CGTACGGGTGGCAGCAG'
        sub3c = 'CGTACGGGAGGCAGCAG'
        sub4c = 'CGTACGGGGGGCTGCAG'
        sub5c = 'CGTACGGGCGGCAGCAG'
        sub6c = 'CGTACGGGCGGCTGCAG'
        sub7c = 'CGTACGGGGGGCAGCAG'
        sub8c = 'CGTACGGGTGGCTGCAG'
        sub9c = 'CGTACGGGTGGCGGCAG'
        sub1d = 'CTTACGGGAGGCTGCAG'
        sub2d = 'CTTACGGGTGGCAGCAG'
        sub3d = 'CTTACGGGAGGCAGCAG'
        sub4d = 'CTTACGGGGGGCTGCAG'
        sub5d = 'CTTACGGGCGGCAGCAG'
        sub6d = 'CTTACGGGCGGCTGCAG'
        sub7d = 'CTTACGGGGGGCAGCAG'
        sub8d = 'CTTACGGGTGGCTGCAG'
        sub9d = 'CTTACGGGTGGCGGCAG'
        sub10 = 'GACTACACGGGTATCTAATCC'
        sub11 = 'GACTACCAGGGTATCTAATCC'
        sub12 = 'GACTACAAGGGTATCTAATCC'
        sub13 = 'GACTACTGGGGTATCTAATCC'
        sub14 = 'GACTACTCGGGTATCTAATCC'
        sub15 = 'GACTACTAGGGTATCTAATCC'
        sub16 = 'GACTACCCGGGTATCTAATCC'
        sub17 = 'GACTACAGGGGTATCTAATCC'
        sub18 = 'GACTACCGGGGTATCTAATCC'
        seqs = SeqIO.parse(source, 'fastq')
        records=[seq for seq in seqs if seq.replace(*CCTACGGGAGGCTGCAG,'',1) for seq in SeqIO.parse(source, "fastq")]
        SeqIO.write(records,  outfile, "fastq")

main()

python primer trimming sequencing fastq • 3.3k views

Hello, are you attempting to do this yourself because you are learning Python, or do you simply want to trim adapters from your sequences? If it's the second case, you should consider using an existing program such as Cutadapt (written in Python). If it's the first case, we will help you build your script. :)
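As a possible starting point for the custom-script route: the AttributeError comes from calling replace() on a SeqRecord, which is not a string. The bases live in record.seq, and slicing the record itself (record[cut:]) keeps the quality scores aligned with the remaining bases. Below is a minimal sketch, assuming exact primer matches only and that the earliest match in the read is the one to cut at. The names trim_offset and trim_fastq are made up for this example, and the primer list would hold all of the sub1 ... sub18 strings from the question.

```python
def trim_offset(sequence, primers):
    """Return the index just past the earliest exact primer match, or None.

    Hypothetical helper: works on a plain string so it can be tested
    without any FASTQ file.
    """
    best = None  # (start, end) of the earliest match seen so far
    for primer in primers:
        pos = sequence.find(primer)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, pos + len(primer))
    return None if best is None else best[1]


def trim_fastq(source, outfile, primers):
    """Write a copy of `source` with the primer and everything before it removed.

    Reads without any primer match are written unchanged (an assumption;
    they could be dropped instead).
    """
    from Bio import SeqIO  # Biopython, imported here so trim_offset has no dependency

    def trimmed_records():
        for record in SeqIO.parse(source, "fastq"):
            cut = trim_offset(str(record.seq), primers)
            # Slicing a SeqRecord also slices its per-base quality scores.
            yield record if cut is None else record[cut:]

    return SeqIO.write(trimmed_records(), outfile, "fastq")


if __name__ == "__main__":
    # Only two of the primers from the question are listed, for brevity.
    primers = ["CCTACGGGAGGCTGCAG", "GACTACACGGGTATCTAATCC"]
    read = "AACCTACGGGAGGCTGCAGTTT"
    cut = trim_offset(read, primers)
    print(read[cut:])
```

Keeping the primers in one list, rather than in separate sub1, sub2, ... variables, is what lets the same loop cover every primer.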


I need to trim primers, not adapters. I've tried adapter-trimming programs like Cutadapt, and they leave weird k-mers at the beginning of the trimmed sequence, so I'd like to do this with a custom script.


As long as you provide a core sequence for each primer and allow for a Hamming distance of one or more (that is, substitutions), you should be able to trim the primers with a trimming program; primers are just oligos, like adapters. I suggest you try bbduk.sh from the BBMap suite. Put the core sequences you want to scan for in a file in multi-FASTA format. Check this page for help.
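To illustrate what matching within a Hamming distance means, here is a minimal Python sketch. It is not how bbduk.sh works internally; it is a plain sliding-window comparison, and the names hamming, find_primer, and max_mismatches are made up for this example. With a tolerance of one substitution, a single core sequence can stand in for several of the near-identical primer variants.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_primer(sequence, primer, max_mismatches=1):
    """Return the start of the first window within max_mismatches
    substitutions of primer, or -1 if no window qualifies."""
    k = len(primer)
    for start in range(len(sequence) - k + 1):
        if hamming(sequence[start:start + k], primer) <= max_mismatches:
            return start
    return -1


if __name__ == "__main__":
    core = "CCTACGGGAGGCTGCAG"
    # This read carries a variant with a T where the core has an A.
    read = "AACCTACGGGTGGCTGCAGTT"
    start = find_primer(read, core, max_mismatches=1)
    if start != -1:
        print(read[start + len(core):])
```

This scans every window of every read, so it is far slower than a dedicated tool on large files; it is meant only to show the idea.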

7.2 years ago
chen ★ 2.5k

If your data are Illumina paired-end reads, you can use AfterQC (https://github.com/OpenGene/AfterQC) to trim adapters without having to supply the adapter sequences.
