3.0 years ago
gubrins
Heys,
Once again I need your programming help. I have a lot of fasta files, each made from a 1 Mb sliding window along a reference genome. As there are areas of the genome that are not well sequenced, or where a sample does not have a lot of data, I would like to remove the files where at least one sample has half (or more) of its bases as N. How could I do that?
Thanks a lot in advance!
I don't understand.
Sorry, Pierre. Each of my fasta files holds 1 Mb of sequence per sample. I would like to know whether any sample within a given fasta file has 50% or more of its bases as N.
Counting N'S Within Fasta
You can use the stats.sh program from the BBMap suite to generate the base distribution (only the relevant part is posted here). You can easily see files where the N content would be > 50%.
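If you would rather have the per-sample check done for you, here is a minimal Python sketch of the idea (not part of BBMap, just an illustration). It assumes plain-text, uncompressed fasta files, counts only `N`/`n` as missing data, and includes every other character (gaps too) in the sequence length. The function names `n_fractions` and `flag_files` are made up for this example.

```python
def n_fractions(path):
    """Yield (record_id, fraction_of_N) for each record in a FASTA file."""
    name, n_count, length = None, 0, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, (n_count / length if length else 0.0)
                parts = line[1:].split()
                name = parts[0] if parts else ""
                n_count, length = 0, 0
            elif line and name is not None:
                seq = line.upper()  # treat soft-masked 'n' like 'N'
                n_count += seq.count("N")
                length += len(seq)
    # Emit the last record in the file.
    if name is not None:
        yield name, (n_count / length if length else 0.0)


def flag_files(paths, threshold=0.5):
    """Return the files where at least one record has N fraction >= threshold."""
    flagged = []
    for path in paths:
        if any(frac >= threshold for _, frac in n_fractions(path)):
            flagged.append(path)
    return flagged
```

Something like `flag_files(glob.glob("windows/*.fasta"))` (after `import glob`; the `windows/` directory is only a placeholder) would then give the list of files to remove. Please check that list before deleting anything.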