3.2 years ago
vaishnavi
Hi everyone,
I want to extract adapter-contaminated reads from a FASTQ file using Python, but I have not been able to get it working.
Adapter sequence is: "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"
File contains this data:
@HWUSI-EAS570R_0003:2:50:5038:17424#0/1
CAGCTTCTGTTGATGCTGATTTAATTCCTGCAACTA
+HWUSI-EAS570R_0003:2:50:5038:17424#0/1
hhhhhhhhhhhgghhhhhahhhhhhhhhhhhgfhh[
@HWUSI-EAS570R_0003:2:50:5175:17417#0/1
CACCTTGCTTTATGGGAAAGCGTAACATAACTACAG
+HWUSI-EAS570R_0003:2:50:5175:17417#0/1
hhhhhhhhhhhfhhhhfaehhhhgahehhcghhfch
@HWUSI-EAS570R_0003:2:50:5442:17417#0/1
AGTTCGCCGACGTTTACGCCGCCTCGGTCCTCGGCA
+HWUSI-EAS570R_0003:2:50:5442:17417#0/1
ghhhhhhhhhhhhhhfhhhhhhhfhhgfhhgfgffc
@HWUSI-EAS570R_0003:2:50:5552:17421#0/1
AAGACATCAAACTACGAAACTACTACAAGAAAACAT
+HWUSI-EAS570R_0003:2:50:5552:17421#0/1
hghghhhhhhhhhghhhhhhghhhhhehhhhheg`h
@HWUSI-EAS570R_0003:2:50:5658:17415#0/1
GTTCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTAG
+HWUSI-EAS570R_0003:2:50:5658:17415#0/1
hhhhhfhghdhhhhhhhhhhhgghhfheffhdfcbf
@HWUSI-EAS570R_0003:2:50:5712:17421#0/1
TTTCTTTTACCCCTAATCCTATCAGCTTTTTCTCCC
+HWUSI-EAS570R_0003:2:50:5712:17421#0/1
hhhghhhhhhhhhhhhhhghhhghhhhhghhhghhh
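This is the code I tried:
import re

adapter = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"
with open('last_mock.fastq', 'r') as rf:
    for line in rf:
        # re.match only tests the start of the line; re.search scans the whole line
        x = re.search(adapter, line)
        if x:
            print(line.rstrip())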
Btw, the read sequences in the OP are the same length as the adapter, and none of them contain the adapter (only 3 nts match with 7 sequences).
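Since none of the posted reads contain the whole adapter, an exact full-adapter search will print nothing on this file. Here is a minimal plain-Python sketch of one alternative, assuming contamination shows up either as the full adapter inside a read or as the start of the adapter at the 3' end of a read. The names `read_fastq`, `adapter_start`, `contaminated_reads` and the `min_overlap` threshold are hypothetical, chosen for illustration only, and mismatches are not handled.

```python
from itertools import islice

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"

def read_fastq(handle):
    """Yield (header, sequence, plus_line, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)

def adapter_start(seq, adapter=ADAPTER, min_overlap=5):
    """Return the index where the adapter begins in seq, or -1 if not found.

    A hit is either the full adapter anywhere in the read, or a prefix of the
    adapter (at least min_overlap bases, an assumed threshold) that forms the
    3' end of the read.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Try the longest possible overlap first, then shorter ones.
    for overlap in range(min(len(seq), len(adapter)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return -1

def contaminated_reads(path, min_overlap=5):
    """Yield the FASTQ records in path whose sequence carries adapter."""
    with open(path) as handle:
        for header, seq, plus, qual in read_fastq(handle):
            if adapter_start(seq, min_overlap=min_overlap) != -1:
                yield header, seq, plus, qual
```

Usage would be along the lines of `for rec in contaminated_reads('last_mock.fastq'): print("\n".join(rec))`. A short `min_overlap` gives more chance matches, so the threshold is a trade-off rather than a fixed rule.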
Thanks for your reply, @cpad0112. I know how to do it in cutadapt, but my professor strictly told us to write the code in Python or Perl.
Also, can you suggest any Python library?
Install Biopython and use its SeqIO module and SeqRecord class.
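A minimal sketch of what that could look like, assuming Biopython is installed (`pip install biopython`) and that the qualities are in the older Illumina Phred+64 encoding, which the `h` characters in the posted reads suggest. The function name `reads_with_adapter` is hypothetical, and this only finds reads that contain the full adapter.

```python
from Bio import SeqIO

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"

def reads_with_adapter(fastq_path, adapter=ADAPTER, fmt="fastq-illumina"):
    """Yield SeqRecord objects whose sequence contains the full adapter.

    fmt is an assumption about the quality encoding; use "fastq" for
    Sanger / Illumina 1.8+ (Phred+33) files.
    """
    for record in SeqIO.parse(fastq_path, fmt):
        if adapter in str(record.seq):
            yield record
```

The matching records could then be saved with something like `SeqIO.write(reads_with_adapter("last_mock.fastq"), "contaminated.fastq", "fastq-illumina")`, where the output file name is only an example.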