Hi, I have a genome in two FASTQ files, and I have tried to determine the depth of coverage using bwa, samtools, and BAMstats. The problem is that some reads in the R1 file are empty while their mates in R2 are not, and vice versa, which causes errors in bwa. So I want to keep only the read pairs that are non-empty in both files, so that both files end up with the same number of reads, in the same order, with the same names. Any suggestions? I have tried to write a Perl script, but I just can't get it to work. Is there any software that fixes this problem?
Thanks so much!
You may want to check out skewer. It can discard empty read pairs and also performs quality trimming, which should help if you want to assemble your genome. For standard purposes, I typically use:
It runs on paired-end data (-m pe), discards degenerate reads with many Ns (-n), trims the 3' end until it hits a trailing base of quality 25 or higher (-q 25), discards reads with an average quality below 25 (-Q 25), and discards reads shorter than 25 bp together with their mates (-l 25). Multithreading is possible with -t.
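If you would rather fix this with a small script, here is a minimal Python sketch of the idea from the question: walk both files in parallel and keep a pair only when both reads have a non-empty sequence. It assumes standard 4-line FASTQ records and that R1 and R2 already list the same reads in the same order (only some sequences are empty). The function names are made up for illustration, so treat it as a starting point, not a tested tool.

```python
import gzip
from itertools import zip_longest


def open_maybe_gzip(path, mode):
    """Open plain or gzip-compressed text files based on the file extension."""
    return gzip.open(path, mode + "t") if path.endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def drop_empty_pairs(r1_in, r2_in, r1_out, r2_out):
    """Write only pairs in which both mates have a non-empty sequence.

    Assumes both inputs hold the same reads in the same order.
    Returns (pairs kept, pairs dropped).
    """
    kept = dropped = 0
    with open_maybe_gzip(r1_in, "r") as f1, open_maybe_gzip(r2_in, "r") as f2, \
         open_maybe_gzip(r1_out, "w") as o1, open_maybe_gzip(r2_out, "w") as o2:
        for rec1, rec2 in zip_longest(read_fastq(f1), read_fastq(f2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of records")
            if rec1[1] and rec2[1]:
                o1.write("\n".join(rec1) + "\n")
                o2.write("\n".join(rec2) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

You would call it as drop_empty_pairs("R1.fastq", "R2.fastq", "R1.clean.fastq", "R2.clean.fastq"), where the file names are placeholders for your own. Because a pair is always written or dropped as a unit, the two output files keep the same number of reads, the same order, and the same names.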
It seems like your R1 and R2 files are not properly paired. Did you perform some pre-processing step before the analysis?
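One quick way to check whether the two files are still paired is to compare the read names record by record. Below is a rough Python sketch of that check. It assumes uncompressed 4-line FASTQ records and that mates share a name apart from an optional /1 or /2 suffix; the function names are just illustrative.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the read name from every header line (every 4th line) of a FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.strip():
                fields = line[1:].split()  # drop the leading '@' and any comment
                name = fields[0] if fields else ""
                if name.endswith(("/1", "/2")):
                    name = name[:-2]  # strip the mate suffix before comparing
                yield name


def first_mismatch(r1_path, r2_path):
    """Return (record number, R1 name, R2 name) for the first unpaired record, or None."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return index, name1, name2
    return None
```

If first_mismatch returns None, the names line up and the files have the same number of records. Otherwise it reports the first record where they diverge, with None standing in for the file that ran out of reads first.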