I have a file in the following format, which gives the coverage at each position of the genome:
Sequence position coverage
NC_006510 1 19
NC_006510 2 22
NC_006510 3 44
NC_006510 4 1
NC_006510 5 1
NC_006510 6 0
NC_006510 7 0
... (low coverage positions, under 6)
NC_006510 1000 0
NC_006510 1001 66
NC_006510 1002 66
I would like to find intervals longer than 800 pb with low coverage along a sequence. For example, the desired output would look like this:
NC_006510 4..1000 #In this position interval, the sequence has low coverage (<6).
Do you know a clever way to do it?
Thanks in advance.
What do you mean by 800pb?
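One possible approach is sketched below in Python. It is only a sketch under a few assumptions that are not stated in the question: that "pb" means base pairs (so an interval must span more than 800 consecutive positions), that "low coverage" means a coverage value strictly below 6, that the file is whitespace-separated and sorted by sequence and position, and that coverage values are integers. The function names `read_rows` and `low_coverage_intervals` and the file name `coverage.txt` are made up for illustration.

```python
def read_rows(path):
    """Yield (sequence, position, coverage) from a whitespace-separated file.

    Lines that do not have exactly three fields with integer position and
    coverage (for example the header line) are skipped.
    """
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue
            if not (fields[1].isdigit() and fields[2].isdigit()):
                continue
            yield fields[0], int(fields[1]), int(fields[2])


def low_coverage_intervals(rows, threshold=6, min_len=800):
    """Yield (sequence, start, end) for low-coverage runs.

    A run is a stretch of consecutive positions on the same sequence whose
    coverage is below `threshold`. Only runs spanning more than `min_len`
    positions are reported. Rows are assumed to be sorted by position.
    """
    run = None  # current run as [sequence, start, end], or None

    for seq, pos, cov in rows:
        low = cov < threshold
        # Close the current run if this row cannot extend it.
        if run is not None and (not low or seq != run[0] or pos != run[2] + 1):
            if run[2] - run[1] + 1 > min_len:
                yield tuple(run)
            run = None
        if low:
            if run is None:
                run = [seq, pos, pos]  # start a new run
            else:
                run[2] = pos           # extend the current run

    # Report a run that reaches the end of the input.
    if run is not None and run[2] - run[1] + 1 > min_len:
        yield tuple(run)


# Example usage (assumes a file named coverage.txt in the format above):
# for seq, start, end in low_coverage_intervals(read_rows("coverage.txt")):
#     print(f"{seq} {start}..{end}")
```

The file is read in a single pass and only the current run is kept in memory, so this should scale to whole-genome coverage files. If the intended threshold or minimum length is different, `threshold` and `min_len` can be changed accordingly.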