7.4 years ago
Ric
Hi, how can I remove reads shorter than 10kb from FASTQ files? I found how to do it for FASTA files, but not for FASTQ files:
bioawk -c fastx '{ if (length($seq) > 10000) { print ">"$name; print $seq } }' bwa/unmapped_${output}.fasta > bwa/unmapped_${output}-gt-10000.fasta
Thank you in advance.
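For FASTQ, the main difference is that each record has four lines (header, sequence, `+` separator, quality), so the quality line has to be carried along with the sequence. Here is a minimal sketch in plain awk, assuming unwrapped 4-line FASTQ records (check your files; multi-line FASTQ would need a proper parser). `filter_fastq_min_len` is a hypothetical helper name, not an existing tool.

```shell
# Keep only FASTQ records whose sequence is at least MIN_LEN bases long.
# Assumption: every record is exactly 4 lines (no wrapped sequence/quality).
# filter_fastq_min_len is a hypothetical helper, not an existing command.
filter_fastq_min_len() {
    awk -v min="$1" '
        NR % 4 == 1 { header = $0 }          # @read_id line
        NR % 4 == 2 { seq = $0 }             # sequence line
        NR % 4 == 3 { plus = $0 }            # "+" separator line
        NR % 4 == 0 {                        # quality line: record is complete
            if (length(seq) >= min)
                print header "\n" seq "\n" plus "\n" $0
        }
    '
}
```

Usage: `filter_fastq_min_len 10000 < reads.fastq > reads.gt10k.fastq` (file names are placeholders). This keeps reads of exactly 10,000 bases, whereas the FASTA command uses `> 10000`. With bioawk, the FASTA command can be adapted by printing `"@"$name`, `$seq`, `"+"` and `$qual`; note that `$name` does not include any comment from the original header line.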
Wow.
I am trying to think of an experiment or analysis that would strictly benefit from removing reads under 10kbp, and I'm drawing a blank. I don't work with Nanopore often, but I do often work with PacBio, and... well... no, I can't imagine a scenario. There are scenarios in which software for long, low-accuracy reads gives better results when people throw away short reads. But that is always a flaw in the software (in which case you should complain and demand better software rather than throw away data), and by "short" I mean a cutoff of around 500bp. I think that if you throw away 10kbp reads for any experiment because they are too short, you're doing it wrong.
Hi, I removed contamination such as chloroplast sequences from my PacBio reads, and afterwards some of the reads are very short. In this case, do you think it is a good idea to remove reads shorter than 500kb or 1000kb?
That depends on what you are doing, but generally no (I assume you mean bp, not kbp).
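Whether a cutoff is worth applying depends partly on how much data it would actually discard. A minimal sketch for checking that, again assuming unwrapped 4-line FASTQ records; `fastq_cutoff_report` is a hypothetical helper name. It counts the reads below a given length and the number of bases they contain.

```shell
# Report how many reads, and how many bases, fall below a length cutoff.
# Assumption: every record is exactly 4 lines, so line 2 of each is the sequence.
# fastq_cutoff_report is a hypothetical helper, not an existing command.
fastq_cutoff_report() {
    awk -v min="$1" '
        NR % 4 == 2 {
            n++                              # total reads
            len = length($0)
            bases += len                     # total bases
            if (len < min) { short_n++; short_bases += len }
        }
        END {
            # %.0f avoids integer overflow on very large base counts
            printf "reads below %d bp: %d of %d\n", min, short_n, n
            printf "bases in those reads: %.0f of %.0f\n", short_bases, bases
        }
    '
}
```

Usage: `fastq_cutoff_report 500 < reads.fastq` (placeholder file name). If the reads below the cutoff hold only a tiny share of the bases, removing them changes little either way; if they hold a large share, that is a good reason to keep them.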