Question

how to write two fastq files into one

0

Entering edit mode

4.9 years ago

susangao007 • 0

Hi everyone,

I was trying to read the left and right sequence reading from two fastq files and zip them into one. All I need it the ID name and sequence, not the + quality score. in the end, I should have ID name and sequence from left read, then ID name and sequence from right read, then ID name and sequence from left read..... I have trouble with the SeqRecord generation.

here is my code:

  1 #!/usr/bin/env python3
  2 # readFastq.py
  3 # Import Seq, SeqRecord, and SeqIO
  4 from Bio import SeqIO
  5 from Bio.SeqRecord import SeqRecord
  6 
  7 from Bio.Seq import Seq
  8 # Import itertools to take a slice of list
  9 import itertools
 10 # The first parameter to SeqIO.parse is the file location
 11 # The second parameter is the file type
 12 leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fast    q")
 13 rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fast    q")
 14 seqIDleft=list()
 15 sequenceleft=list()
 16 seqIDright =list()
 17 sequenceright=list()
 18 # Just get five sequences as an esxample
 19 firstFiveleft = itertools.islice(leftReads, 5)
 20 firstFiveright = itertools.islice(rightReads, 5)
 21 
 22 
 23 for left in firstFiveleft:
 24   seqIDleft.appendleft.id)
 25   sequenceleft.append(str(left.seq))
 26 
 27 
 28 for right in firstFiveright:
 29   seqIDright.appendright.id)
 30   sequenceright.append(str(right.seq))
 31 
 32 A=zip(seqIDleft,seqIDright)
 33 B=zip(sequenceleft,sequenceright)
 34 
 35 C=SeqRecord(A,B)
 36 
 37 
 38 SeqIO.write(C, "Interleaved.fastq", "fastq")

python • 2.4k views

ADD COMMENT • link updated 4.9 years ago by Pierre Lindenbaum 164k • written 4.9 years ago by susangao007 • 0

0

Entering edit mode

An alternate solution that does not use python.

Use BBMap suite. Remove .gz extensions if you want to leave files uncompressed.

reformat.sh in1=file_R1.fq.gz in2=file2_fq.gz out=interleaved.fq.gz

ADD REPLY • link 4.9 years ago by GenoMax 148k

score 0 · Answer 1 · 2020-02-15

Untested code, but the general approach should work:

from Bio import SeqIO
from itertools import chain

for i in chain(*zip(list(SeqIO.parse("/path/to/reads.R1.fastq", "fastq")), list(SeqIO.parse("/path/to/reads.R2.fastq", "fastq")))):
    SeqIO.write(i, "interleaved.fastq", "fastq")

If you want the entries without the quality scores, it might be enough to just switch to using SeqIO.write(..., ..., "fasta") but I haven't tested this.

Otherwise, it would be as simple as adding a line before the write call of the form x = SeqRecord(i.description, i.seq) , then write the x object).

score 0 · Answer 2 · 2020-02-15

0

Entering edit mode

4.9 years ago

ATpoint 86k

Using a one-liner with seqtk to produce an interleaved fasta file:

seqtk mergepe test1.fq test2.fq | seqtk seq -A - > out.fa

ADD COMMENT • link 4.9 years ago by ATpoint 86k

score 0 · Answer 3 · 2020-02-15

0

Entering edit mode

4.9 years ago

Pierre Lindenbaum 164k

paste <(cat file1.fq | paste - - - - | cut -f1,2 ) <(cat file2.fq | paste - - - -| cut -f1,2 ) | tr "\t" "\n"

ADD COMMENT • link 4.9 years ago by Pierre Lindenbaum 164k