Question

Eliminate empty reads in R1 and R2 fastq files


7.3 years ago

cabraham03 ▴ 30

Hi, I have a genome in two fastq files, and I have tried to determine the depth of coverage using bwa, samtools, and BAMstats. The problem is that some sequences in the R1 file are empty while their mates in R2 are not, and likewise some reads in R2 are empty but not in R1, which causes errors in bwa. So I want to keep only the reads that are non-empty in both files (same number of reads, same order, and same names in both files). Any suggestions? I have tried to write a Perl script, but I just can't get it to work. Is there any software that fixes this problem?

Thanks So Much !!

genome dna sequencing Assembly • 3.3k views

ADD COMMENT • link updated 7.3 years ago by Pierre Lindenbaum 166k • written 7.3 years ago by cabraham03 ▴ 30


You may check out skewer. It can discard empty read pairs and also performs quality trimming, which should benefit you if you want to assemble your genome. For standard purposes, I typically use:

./skewer -n -q 25 -Q 25 -m pe -l 25

It runs on paired-end data (-m pe), discards degenerate reads with many Ns (-n), trims the 3' end until it hits a trailing base of quality 25 or higher (-q 25), discards reads with an average quality below 25 (-Q 25), and discards reads, together with their mates, that are shorter than 25 bp (-l 25). Multithreading is possible with -t.
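To make the "average quality" and "minimum length" criteria concrete, here is a minimal Python sketch of a pair-level filter. It is only an illustration, not skewer's actual implementation: it assumes Phred+33 quality encoding, and the function names `mean_phred` and `keep_pair` are hypothetical. The order in which skewer itself applies its checks may differ.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0  # an empty read has no bases to score
    return sum(ord(c) - offset for c in quality) / len(quality)


def keep_pair(seq1, qual1, seq2, qual2, min_mean_q=25, min_len=25):
    """Keep a pair only if BOTH mates pass the length and mean-quality checks."""
    for seq, qual in ((seq1, qual1), (seq2, qual2)):
        if len(seq) < min_len or mean_phred(qual) < min_mean_q:
            return False  # one failing mate discards the whole pair
    return True
```

An empty read has length 0, so with any positive `min_len` the pair is dropped as a unit, which is what keeps R1 and R2 in sync.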

ADD REPLY • link 7.3 years ago by ATpoint 88k


the problem is that some sequences in R1 file are empty but they could be not in R2, as well some of them in R2 are empty but not in R1

It seems like your R1 and R2 files are not properly paired. Did you perform some pre-processing step before the analysis?
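One quick way to test whether the two files are still in sync is to walk through both and compare read names record by record. The sketch below is a rough check under some assumptions: every record occupies exactly four lines (an empty read is an empty line, not a missing one), and mate names differ at most by a trailing /1 or /2 or by a comment after the first whitespace. The function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def base_name(header):
    """Read name without '@', trailing comment, or /1 and /2 suffix."""
    fields = header.strip()[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(r1_path, r2_path):
    """Return the 1-based index of the first unpaired record, or None."""
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2:
        # Header lines are lines 0, 4, 8, ... of each file.
        names1 = (base_name(l) for i, l in enumerate(r1) if i % 4 == 0)
        names2 = (base_name(l) for i, l in enumerate(r2) if i % 4 == 0)
        # zip_longest pads the shorter file with None, so a length
        # difference is reported as a mismatch too.
        for n, (a, b) in enumerate(zip_longest(names1, names2), start=1):
            if a != b:
                return n
    return None
```

If `first_mismatch("R1.fastq.gz", "R2.fastq.gz")` returns None, the files have the same number of reads with the same names in the same order, and only the empty sequences need to be removed.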

ADD REPLY • link 7.3 years ago by h.mon 35k

score 1 · Answer 1 · 2018-03-24


7.3 years ago

finswimmer 16k

repair.sh from BBTools could do this.

repair.sh in1=broken1.fq in2=broken2.fq out1=fixed1.fq out2=fixed2.fq outs=singletons.fq repair

fin swimmer


score 0 · Answer 2 · 2018-03-24

Using paste + awk:

paste <(gunzip -c R1_001.fastq.gz | paste - - - -) <(gunzip -c R2_001.fastq.gz | paste - - - -) | awk -F '\t' '(length($2)>0 && length($6)>0)' | tr "\t" "\n"

Each record is flattened to one tab-separated line, so $2 is the R1 sequence and $6 is the R2 sequence. The output is an interleaved fastq file that you can pipe into (or use with) bwa mem with the option

   -p         first query file consists of interleaved paired-end sequences
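If two separate, still-synchronized R1/R2 files are preferred over an interleaved stream, the same filter can be sketched in Python. This is a minimal sketch rather than a tested tool: it assumes both inputs already list mates in the same order with four lines per record, and the names `records` and `drop_empty_pairs` are hypothetical.

```python
import gzip
from itertools import islice


def open_fastq(path, mode="rt"):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def records(handle):
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', quality)."""
    while True:
        rec = list(islice(handle, 4))
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record: %r" % rec[0])
        yield rec


def drop_empty_pairs(r1_in, r2_in, r1_out, r2_out):
    """Write only pairs in which both mates have a non-empty sequence.

    Returns (pairs kept, pairs dropped).
    """
    kept = dropped = 0
    with open_fastq(r1_in) as i1, open_fastq(r2_in) as i2, \
            open_fastq(r1_out, "wt") as o1, open_fastq(r2_out, "wt") as o2:
        for rec1, rec2 in zip(records(i1), records(i2)):
            # rec[1] is the sequence line; strip() removes the newline.
            if rec1[1].strip() and rec2[1].strip():
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

For example, `drop_empty_pairs("R1_001.fastq.gz", "R2_001.fastq.gz", "R1.clean.fastq.gz", "R2.clean.fastq.gz")` (output names chosen here only for illustration) writes two files with the same number of reads, in the same order, which can then be passed to bwa mem as a normal pair.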