4.7 years ago
serpalma.v
Hello
I want to determine the beginning and end of each interval formed only by Ns (masked regions). Then I would like to split the chromosome into smaller intervals, keeping only the regions that are not masked.
For example:
AGGTCGTTNNNNAACTGNNAGTC
I would like to get three intervals from this sequence:
AGGTCGTT -> from 1 to 8
AACTG -> from 13 to 17
AGTC -> from 20 to 23
Is there a tool I could use to do the task? I have been searching, but I cannot find the right one.
Thanks!
duplicate: Split sequence according the 'N' base
It does not give the intervals of the regions that are not masked. That is what I really need.
The "Genome" minus the "masked" intervals gives the "unmasked" intervals. This is what you want, right?
You can get it using the bedtools complement command:
First, generate the Genome file for your genome.
Second, get the locations of the masked intervals (answered by "shenwei356" in the previous post, see the link above).
Last, get the unmasked intervals.
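For a single sequence already loaded as a string, the same "genome minus masked" idea can be sketched in plain Python. This is only an illustration, not the bedtools workflow itself: the function names `masked_intervals` and `unmasked_intervals` are made up for this example, and it assumes masked bases are written as N or n. Coordinates are 1-based and inclusive, to match the example in the question.

```python
import re


def masked_intervals(seq):
    """Return 1-based inclusive (start, end) pairs, one per run of N/n."""
    # m.start() is 0-based, m.end() is exclusive, so (start + 1, end)
    # is the same run in 1-based inclusive coordinates.
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", seq)]


def unmasked_intervals(seq):
    """Return the complement of the masked runs over the whole sequence."""
    intervals = []
    pos = 1  # first position not yet assigned to any interval
    for start, end in masked_intervals(seq):
        if start > pos:
            intervals.append((pos, start - 1))
        pos = end + 1
    if pos <= len(seq):
        # Trailing unmasked stretch after the last run of Ns.
        intervals.append((pos, len(seq)))
    return intervals


seq = "AGGTCGTTNNNNAACTGNNAGTC"
for start, end in unmasked_intervals(seq):
    print(seq[start - 1:end], start, end)
# AGGTCGTT 1 8
# AACTG 13 17
# AGTC 20 23
```

Note that BED files, which bedtools works with, use 0-based start and exclusive end, so the first interval above would be written there as 0 to 8.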