Eliminate empty reads in R1 and R2 FASTQ files
6.7 years ago
cabraham03 ▴ 30

Hi, I have a genome in two FASTQ files, and I have tried to determine the depth of coverage using bwa, samtools, and BAMstats. The problem is that some sequences in the R1 file are empty while their mates in R2 are not, and likewise some in R2 are empty but not in R1, which causes errors in bwa. So I want to keep only the read pairs that are non-empty in both files (same number of reads, same order, and same names in both files). Any suggestions? I have tried to write a Perl script, but I can't find a way to make it work. Is there any software that fixes this problem?

Thanks so much!

genome dna sequencing Assembly • 3.1k views

You may check out skewer. It can discard empty read pairs and also performs quality trimming, which should benefit you if you want to assemble your genome. For standard purposes, I typically use:

./skewer -n -q 25 -Q 25 -m pe -l 25

It runs on paired-end data (-m pe), discards degenerate reads with many Ns (-n), trims the 3' end until it reaches a base of quality 25 or higher (-q 25), discards reads with an average quality below 25 (-Q 25), and discards reads, together with their mates, that are shorter than 25 bp (-l 25). Multithreading is possible with -t.


the problem is that some sequences in R1 file are empty but they could be not in R2, as well some of them in R2 are empty but not in R1

It seems like your R1 and R2 files are not properly paired. Did you perform some pre-processing step before the analysis?
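One quick way to check the pairing, as a minimal sketch only: compare the read names at each record position in the two files. The function names `extract_ids` and `count_name_mismatches` are made up for illustration, and the sketch assumes uncompressed 4-line FASTQ records where the read name is the first word of the header, optionally ending in /1 or /2.

```shell
# Hypothetical helpers (not part of any tool mentioned in this thread).
# Assumes plain-text FASTQ with exactly 4 lines per record.

# Print one read name per record: first word of the header line,
# without the leading "@" and without a trailing /1 or /2.
extract_ids() {
    awk 'NR % 4 == 1 {
        id = $1
        sub(/^@/, "", id)
        sub(/\/[12]$/, "", id)
        print id
    }' "$1"
}

# Print the number of record positions at which the read names of the
# two files differ. Positions present in only one file count as a mismatch.
count_name_mismatches() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    extract_ids "$1" > "$tmp1"
    extract_ids "$2" > "$tmp2"
    # Appending "" forces a string comparison even for numeric-looking names.
    paste "$tmp1" "$tmp2" |
        awk -F '\t' '($1 "") != ($2 "") { n++ } END { print n + 0 }'
    rm -f "$tmp1" "$tmp2"
}

# Usage (file names are placeholders):
#   count_name_mismatches R1.fastq R2.fastq
```

If this prints 0, the names line up record by record and the files are at least in the same order; anything else suggests the pairing was broken by an earlier step.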

6.7 years ago

repair.sh from BBTools could do this.

repair.sh in1=broken1.fq in2=broken2.fq out1=fixed1.fq out2=fixed2.fq outs=singletons.fq repair

fin swimmer

6.7 years ago

using paste + awk:

 paste <(gunzip -c R1_001.fastq.gz | paste - - - - ) <(gunzip -c R2_001.fastq.gz| paste - - - -) |awk -F '\t' '(length($2)>0 && length($6)>0)' |tr "\t" "\n"

The output will be an interleaved FASTQ stream that you can pipe into bwa mem, or save to a file and use with bwa mem, with the option

   -p         first query file consists of interleaved paired-end sequences
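
If two separate files are preferred over an interleaved stream, the same paste + awk idea can be adapted. This is only a sketch: the function name `filter_empty_pairs` and its argument order are made up for illustration, and it assumes uncompressed 4-line FASTQ files with the same number of records in the same order (decompress .gz input first).

```shell
# Hypothetical helper built on the paste + awk approach above.
# usage: filter_empty_pairs R1.fastq R2.fastq out_R1.fastq out_R2.fastq
# Keeps a pair only when BOTH mates have a non-empty sequence line.
filter_empty_pairs() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # Collapse each 4-line record into one tab-separated line.
    paste - - - - < "$1" > "$tmp1"
    paste - - - - < "$2" > "$tmp2"
    # Fields 1-4 come from R1, fields 5-8 from R2;
    # $2 and $6 are the two sequence lines.
    paste "$tmp1" "$tmp2" |
        awk -F '\t' -v o1="$3" -v o2="$4" '
            length($2) > 0 && length($6) > 0 {
                printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > o1
                printf "%s\n%s\n%s\n%s\n", $5, $6, $7, $8 > o2
            }'
    rm -f "$tmp1" "$tmp2"
}

# Usage (file names are placeholders):
#   filter_empty_pairs R1.fastq R2.fastq R1.clean.fastq R2.clean.fastq
```

Both output files end up with the same number of reads, in the same order, so they can be given to bwa mem as two regular paired-end inputs.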