Hi everyone,
I am trying to read the left and right reads from two FASTQ files and interleave them into one file. All I need is the ID name and sequence, not the + line or the quality scores. In the end, I should have the ID name and sequence from the left read, then the ID name and sequence from the right read, then the ID name and sequence from the next left read, and so on. I am having trouble with the SeqRecord generation.
Here is my code:
#!/usr/bin/env python3
# readFastq.py
# Import SeqIO, SeqRecord, and Seq
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
# Import itertools to take a slice of the reads
import itertools

# The first parameter to SeqIO.parse is the file location
# The second parameter is the file type
leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fastq")
rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq", "fastq")
seqIDleft = list()
sequenceleft = list()
seqIDright = list()
sequenceright = list()

# Just get five sequences as an example
firstFiveleft = itertools.islice(leftReads, 5)
firstFiveright = itertools.islice(rightReads, 5)

# Collect the ID and sequence of each left read
for left in firstFiveleft:
    seqIDleft.append(left.id)
    sequenceleft.append(str(left.seq))

# Collect the ID and sequence of each right read
for right in firstFiveright:
    seqIDright.append(right.id)
    sequenceright.append(str(right.seq))

# Pair up the left and right IDs, and the left and right sequences
A = zip(seqIDleft, seqIDright)
B = zip(sequenceleft, sequenceright)

# Build the SeqRecord (this is the step I have trouble with)
C = SeqRecord(A, B)

SeqIO.write(C, "Interleaved.fastq", "fastq")
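
For what it's worth, the SeqRecord(A, B) call is likely where this breaks: a SeqRecord describes a single sequence (one Seq plus one ID), so it cannot be built from two zip objects. Also, Biopython's "fastq" writer needs per-base quality values, so if the qualities are dropped the output has to be something like FASTA instead. Below is a rough sketch of one way to do the interleaving, not a definitive implementation. The function names (interleave, fasta_lines, interleave_files) and the choice of FASTA as the output format are my own assumptions, not anything from the original script.

```python
from itertools import islice


def interleave(left_records, right_records, limit=None):
    """Yield (id, sequence) tuples alternating left, right, left, right, ...

    Works on any records that expose .id and .seq, such as the SeqRecord
    objects produced by SeqIO.parse. `limit` caps the number of read pairs.
    """
    pairs = zip(left_records, right_records)  # stops at the shorter input
    if limit is not None:
        pairs = islice(pairs, limit)
    for left, right in pairs:
        yield left.id, str(left.seq)
        yield right.id, str(right.seq)


def fasta_lines(entries):
    """Turn (id, sequence) tuples into FASTA-formatted lines."""
    for seq_id, sequence in entries:
        yield f">{seq_id}\n"
        yield f"{sequence}\n"


def interleave_files(left_path, right_path, out_path, limit=None):
    """Hypothetical wrapper: read two FASTQ files, write interleaved FASTA."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import SeqIO

    left_reads = SeqIO.parse(left_path, "fastq")
    right_reads = SeqIO.parse(right_path, "fastq")
    with open(out_path, "w") as out:
        out.writelines(fasta_lines(interleave(left_reads, right_reads, limit)))
```

With the files from the question, a call such as interleave_files("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq", "Interleaved.fasta", limit=5) would write the first five pairs. If the quality scores turn out to be needed after all, the original records can be alternated the same way and passed to SeqIO.write with the "fastq" format, since they still carry their qualities.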
An alternate solution that does not use Python.
Use the BBMap suite. Remove the .gz extensions if you want to leave the files uncompressed.