Question

Trim Primers from FASTQ file

0

7.2 years ago

anna.knight • 0

Hi,

I am trying to trim primers from FASTQ reads using the script below. It needs to remove the primer (sub1, sub2, etc.) and everything before the primer. I'm getting the error:

AttributeError: 'SeqRecord' object has no attribute 'replace'

Does anyone know how to fix this? I also need to expand it to include all primers instead of just sub1.

from Bio import SeqIO
from os import listdir
from os.path import isfile, join
import sys

def main():
    path = sys.argv[1]
    onlyfiles = [f for f in listdir(path) if isfile(join(path,f))]
    for fil in onlyfiles:
        source = fil
        outfile = "CutPrimers" + fil
        sub1 = 'CCTACGGGAGGCTGCAG'
        sub2 = 'CCTACGGGTGGCAGCAG'
        sub3 = 'CCTACGGGAGGCAGCAG'
        sub4 = 'CCTACGGGGGGCTGCAG'
        sub5 = 'CCTACGGGCGGCAGCAG'
        sub6 = 'CCTACGGGCGGCTGCAG'
        sub7 = 'CCTACGGGGGGCAGCAG'
        sub8 = 'CCTACGGGTGGCTGCAG'
        sub9 = 'CCTACGGGTGGCGGCAG'
        sub1b = 'CATACGGGAGGCTGCAG'
        sub2b = 'CATACGGGTGGCAGCAG'
        sub3b = 'CATACGGGAGGCAGCAG'
        sub4b = 'CATACGGGGGGCTGCAG'
        sub5b = 'CATACGGGCGGCAGCAG'
        sub6b = 'CATACGGGCGGCTGCAG'
        sub7b = 'CATACGGGGGGCAGCAG'
        sub8b = 'CATACGGGTGGCTGCAG'
        sub9b = 'CATACGGGTGGCGGCAG'
        sub1c = 'CGTACGGGAGGCTGCAG'
        sub2c = 'CGTACGGGTGGCAGCAG'
        sub3c = 'CGTACGGGAGGCAGCAG'
        sub4c = 'CGTACGGGGGGCTGCAG'
        sub5c = 'CGTACGGGCGGCAGCAG'
        sub6c = 'CGTACGGGCGGCTGCAG'
        sub7c = 'CGTACGGGGGGCAGCAG'
        sub8c = 'CGTACGGGTGGCTGCAG'
        sub9c = 'CGTACGGGTGGCGGCAG'
        sub1d = 'CTTACGGGAGGCTGCAG'
        sub2d = 'CTTACGGGTGGCAGCAG'
        sub3d = 'CTTACGGGAGGCAGCAG'
        sub4d = 'CTTACGGGGGGCTGCAG'
        sub5d = 'CTTACGGGCGGCAGCAG'
        sub6d = 'CTTACGGGCGGCTGCAG'
        sub7d = 'CTTACGGGGGGCAGCAG'
        sub8d = 'CTTACGGGTGGCTGCAG'
        sub9d = 'CTTACGGGTGGCGGCAG'
        sub10 = 'GACTACACGGGTATCTAATCC'
        sub11 = 'GACTACCAGGGTATCTAATCC'
        sub12 = 'GACTACAAGGGTATCTAATCC'
        sub13 = 'GACTACTGGGGTATCTAATCC'
        sub14 = 'GACTACTCGGGTATCTAATCC'
        sub15 = 'GACTACTAGGGTATCTAATCC'
        sub16 = 'GACTACCCGGGTATCTAATCC'
        sub17 = 'GACTACAGGGGTATCTAATCC'
        sub18 = 'GACTACCGGGGTATCTAATCC'
        seqs = SeqIO.parse(source, 'fastq')
        records=[seq for seq in seqs if seq.replace(*CCTACGGGAGGCTGCAG,'',1) for seq in SeqIO.parse(source, "fastq")]
        SeqIO.write(records,  outfile, "fastq")

main()
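Regarding the error itself: SeqIO.parse yields SeqRecord objects rather than strings, and SeqRecord has no replace method, which is what the AttributeError reports. One possible way around it is sketched below. It finds the primer in str(record.seq) and slices the record, since slicing a SeqRecord keeps the quality scores aligned with the bases. The helper names primer_end and trim_fastq are hypothetical, and the choices to cut at the earliest match and to keep reads without any primer unchanged are assumptions, not something stated in the question.

```python
from typing import Iterable, Optional


def primer_end(sequence: str, primers: Iterable[str]) -> Optional[int]:
    """Return the index just past the earliest exact primer match, or None."""
    best_start = None
    best_end = None
    for primer in primers:
        start = sequence.find(primer)
        # Keep the match that begins closest to the start of the read.
        if start != -1 and (best_start is None or start < best_start):
            best_start, best_end = start, start + len(primer)
    return best_end


def trim_fastq(source: str, outfile: str, primers: Iterable[str]) -> int:
    """Write trimmed reads to outfile and return the number of records written."""
    from Bio import SeqIO  # imported here so primer_end works without Biopython

    primers = list(primers)

    def trimmed():
        for record in SeqIO.parse(source, "fastq"):
            cut = primer_end(str(record.seq), primers)
            # Slicing a SeqRecord also slices its per-base qualities.
            # Assumption: reads with no primer are written unchanged.
            yield record if cut is None else record[cut:]

    return SeqIO.write(trimmed(), outfile, "fastq")
```

Putting all the sub1, sub2, ... strings into one list and passing it as primers would cover every primer in a single pass. Note that the script above sets source to the bare file name, so join(path, fil) would be needed when the directory is not the current one.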

python primer trimming sequencing fastq • 3.3k views


0

Hello, are you attempting to do it yourself because you are learning Python, or do you want to trim adapters from your sequences? If it's the latter, you should consider using an existing program such as Cutadapt (Python). If it's the former, we will help you build your script. :)

ADD REPLY • link 7.2 years ago by glihm ▴ 660

0

I need to trim primers, not adapters. I've tried adapter trimming programs like CutAdapt, and they leave weird k-mers at the beginning of the trimmed sequence, so I'd like to do this with a custom script.

ADD REPLY • link 7.2 years ago by anna.knight • 0

0

As long as you provide a core sequence for the primer and allow for Hamming distance changes (one or more substitutions), you should be able to trim the primers (they are just oligos, like adapters) using a trimming program. I suggest you try bbduk.sh from the BBMap suite. Put the core sequences you want to scan for in a file in multi-FASTA format. Check this page for help.
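To illustrate what matching within a Hamming distance means, here is a minimal pure-Python sketch. It is not how bbduk.sh works internally; it only shows the idea of accepting a window that differs from the core sequence by a limited number of substitutions. The function names and the default of one mismatch are assumptions made for the example.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(sequence: str, primer: str,
                         max_mismatches: int = 1) -> Optional[int]:
    """Return the start of the first window within max_mismatches
    substitutions of primer, or None if there is no such window."""
    k = len(primer)
    # Slide a primer-sized window along the read, left to right.
    for start in range(len(sequence) - k + 1):
        if hamming(sequence[start:start + k], primer) <= max_mismatches:
            return start
    return None
```

With one core sequence and a small mismatch allowance, a single entry can stand in for several of the near-identical primer variants, which is why the file of core sequences can stay short.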

ADD REPLY • link 7.2 years ago by GenoMax 147k

score 0 · Answer 1 · 2017-09-01

0

7.2 years ago

chen ★ 2.5k

If your data is Illumina paired-end sequencing data, you can use AfterQC (https://github.com/OpenGene/AfterQC) to trim adapters without having to supply the adapter sequences.
