Hi there,
I have a FASTA genome (GRCh38) and want to output a BED file containing the intervals of its N runs. It appears Picard
has a tool for this, ScatterIntervalsByNs; however, I'm unsure whether it is actually doing what I need.
In practice, the command below produces a one-based interval list rather than the zero-based format that is standard for BED files. If someone has more experience, I would like to know whether and how I can use this output file with bedtools
to subtract these intervals from a BED covering the entire genome.
Thanks in advance!
java -jar picard.jar ScatterIntervalsByNs \
R=hg38.fna \
OT=N \
O=hg38_one.intervals
@Pierre Lindenbaum, I see. Essentially, what it does is remove the header and subtract 1 from the start coordinate? Just to make sure, because my more straightforward approach would have been to use grep -v and awk to accomplish the same. Let me know, thanks!
yep :-)
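For anyone landing here later, the grep -v plus awk route discussed above could look roughly like the sketch below. It assumes the Picard output is a tab-separated interval list with '@' header lines followed by chrom, start, end, strand and name columns; the file names example.interval_list and example.bed and the coordinates in the sample are made up for illustration and are not taken from the real hg38 output.

```shell
# Tiny made-up interval list standing in for hg38_one.intervals
# (header lines start with '@'; coordinates are illustrative only).
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\nchr1\t1\t10000\t+\tNmer\nchr1\t207667\t257666\t+\tNmer\n' > example.interval_list

# Drop the '@' header lines, then shift the 1-based start (column 2) to 0-based.
# The end coordinate is left unchanged because BED ends are exclusive.
grep -v '^@' example.interval_list \
  | awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $3 }' > example.bed

cat example.bed
```

The resulting three-column BED should then be usable as the -b input to bedtools subtract, with a whole-genome BED as -a, though it is worth spot-checking a few intervals against the FASTA first.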