Hi,
I am trying to trim primers off FASTQ reads using the script below. It needs to trim off the primer (sub1, sub2, etc.) and everything in front of the primer. I'm getting the error:
AttributeError: 'SeqRecord' object has no attribute 'replace'
Does anyone know how to fix this? I also need to expand it to include all primers instead of just sub1.
from Bio import SeqIO
from os import listdir
from os.path import isfile, join
import sys

def main():
    path = sys.argv[1]
    onlyfiles = [f for f in listdir(path) if isfile(join(path,f))]
    for fil in onlyfiles:
        source = fil
        outfile = "CutPrimers" + fil
        sub1 = 'CCTACGGGAGGCTGCAG'
        sub2 = 'CCTACGGGTGGCAGCAG'
        sub3 = 'CCTACGGGAGGCAGCAG'
        sub4 = 'CCTACGGGGGGCTGCAG'
        sub5 = 'CCTACGGGCGGCAGCAG'
        sub6 = 'CCTACGGGCGGCTGCAG'
        sub7 = 'CCTACGGGGGGCAGCAG'
        sub8 = 'CCTACGGGTGGCTGCAG'
        sub9 = 'CCTACGGGTGGCGGCAG'
        sub1b = 'CATACGGGAGGCTGCAG'
        sub2b = 'CATACGGGTGGCAGCAG'
        sub3b = 'CATACGGGAGGCAGCAG'
        sub4b = 'CATACGGGGGGCTGCAG'
        sub5b = 'CATACGGGCGGCAGCAG'
        sub6b = 'CATACGGGCGGCTGCAG'
        sub7b = 'CATACGGGGGGCAGCAG'
        sub8b = 'CATACGGGTGGCTGCAG'
        sub9b = 'CATACGGGTGGCGGCAG'
        sub1c = 'CGTACGGGAGGCTGCAG'
        sub2c = 'CGTACGGGTGGCAGCAG'
        sub3c = 'CGTACGGGAGGCAGCAG'
        sub4c = 'CGTACGGGGGGCTGCAG'
        sub5c = 'CGTACGGGCGGCAGCAG'
        sub6c = 'CGTACGGGCGGCTGCAG'
        sub7c = 'CGTACGGGGGGCAGCAG'
        sub8c = 'CGTACGGGTGGCTGCAG'
        sub9c = 'CGTACGGGTGGCGGCAG'
        sub1d = 'CTTACGGGAGGCTGCAG'
        sub2d = 'CTTACGGGTGGCAGCAG'
        sub3d = 'CTTACGGGAGGCAGCAG'
        sub4d = 'CTTACGGGGGGCTGCAG'
        sub5d = 'CTTACGGGCGGCAGCAG'
        sub6d = 'CTTACGGGCGGCTGCAG'
        sub7d = 'CTTACGGGGGGCAGCAG'
        sub8d = 'CTTACGGGTGGCTGCAG'
        sub9d = 'CTTACGGGTGGCGGCAG'
        sub10 = 'GACTACACGGGTATCTAATCC'
        sub11 = 'GACTACCAGGGTATCTAATCC'
        sub12 = 'GACTACAAGGGTATCTAATCC'
        sub13 = 'GACTACTGGGGTATCTAATCC'
        sub14 = 'GACTACTCGGGTATCTAATCC'
        sub15 = 'GACTACTAGGGTATCTAATCC'
        sub16 = 'GACTACCCGGGTATCTAATCC'
        sub17 = 'GACTACAGGGGTATCTAATCC'
        sub18 = 'GACTACCGGGGTATCTAATCC'
        seqs = SeqIO.parse(source, 'fastq')
        records=[seq for seq in seqs if seq.replace(*CCTACGGGAGGCTGCAG,'',1) for seq in SeqIO.parse(source, "fastq")]
        SeqIO.write(records, outfile, "fastq")

main()
Hello, are you attempting to do this yourself because you are learning Python? Or do you simply want to trim adapters from your sequences? If it's the second case, you should consider using an existing program such as Cutadapt (written in Python). If it's the first case, we will help you build your script. :)
I need to trim primers, not adapters. I've tried adapter trimming programs like CutAdapt, and they leave weird k-mers at the beginning of the trimmed sequence, so I'd like to do this with a custom script.
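For the custom-script route, the AttributeError comes from calling replace on a SeqRecord: replace is a str method, and it would only delete the primer itself rather than everything in front of it. A minimal sketch of one alternative is below: search the read's sequence as a plain string for any primer in a list, then slice the SeqRecord, which keeps the quality scores aligned with the remaining bases. The names PRIMERS, primer_end and trim_fastq are hypothetical, only two of the primers from the question are listed for brevity, and keeping reads with no primer match unchanged is an assumption you may want to change.

```python
# Hypothetical names throughout; extend PRIMERS with the full list from the question.
PRIMERS = [
    "CCTACGGGAGGCTGCAG",
    "GACTACACGGGTATCTAATCC",
]

def primer_end(seq, primers):
    """Return the index just past the leftmost exact primer match, or None."""
    hits = [(seq.find(p), p) for p in primers]
    hits = [(pos, p) for pos, p in hits if pos != -1]
    if not hits:
        return None
    pos, primer = min(hits)  # leftmost match wins
    return pos + len(primer)

def trim_fastq(in_path, out_path, primers=PRIMERS):
    """Write reads with the primer and everything before it removed."""
    from Bio import SeqIO  # Biopython; imported here so primer_end works without it

    def trimmed():
        for record in SeqIO.parse(in_path, "fastq"):
            end = primer_end(str(record.seq), primers)
            # Assumption: reads without any primer are written unchanged.
            # Slicing a SeqRecord also slices its per-base qualities.
            yield record if end is None else record[end:]

    return SeqIO.write(trimmed(), out_path, "fastq")
```

Calling trim_fastq("reads.fastq", "CutPrimers_reads.fastq") (file names made up here) would process one file; it can be called once per file inside the existing loop over the directory.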
As long as you provide a core sequence for the primer and allow for Hamming distance changes (one or more substitutions), you should be able to trim the primers using a trimming program (primers are just oligos, like adapters). I suggest you try bbduk.sh from the BBMap suite. Put the core sequences you want to scan for in a file in multi-FASTA format. Check this page for help.
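To illustrate what "allowing for Hamming distance changes" means, here is a minimal Python sketch that finds a core primer sequence in a read while tolerating a fixed number of substitutions. It is only an illustration of the idea, not how bbduk.sh is implemented, and the function names hamming and find_with_mismatches are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_with_mismatches(seq, primer, max_mismatches=1):
    """Return the start of the first window within max_mismatches of primer, or -1."""
    k = len(primer)
    for start in range(len(seq) - k + 1):
        if hamming(seq[start:start + k], primer) <= max_mismatches:
            return start
    return -1

# Example: the read carries a one-substitution variant of the core primer.
core = "CCTACGGGAGGCTGCAG"
read = "TT" + "CCTACGGGTGGCTGCAG" + "ACGT"
start = find_with_mismatches(read, core, max_mismatches=1)
trimmed = read[start + len(core):] if start != -1 else read
```

With one substitution allowed, a single core sequence matches several of the listed variants, so far fewer primers need to be enumerated by hand.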