I have lists of contig coordinates for several assemblies and would like to create a BED-format mask to exclude variants in the first and last 1 Mb of each contig.
Example lines from the contig files:
Chr1 1 123000000
Chr2 1 11435255
AEG1.2 1 2335
I could do something simple with awk, like this:
awk '{print ($1,$2+1000000,$3 - 1000000)}' contig.bed > filter_ends.bed
This would be a positive mask of regions to keep, whereas I'd prefer a negative mask (though that's not essential). More importantly, it would not behave properly for contigs shorter than 2,000,000 bp: it would return nonexistent intervals (start greater than end) or, for contigs under 1 Mb, negative end coordinates.
Effectively, I will be excluding those contigs anyway, because the regions filtered from both ends will overlap. I could do this in two steps, for example by first removing the contigs < 2000000 bp and then running the awk command, but as I have many assemblies to run over, does anyone know a good approach for this?
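For reference, here is a rough single-pass sketch of the negative mask I have in mind, in case it helps frame the question. It is untested on real assemblies and rests on a few assumptions of mine: the start column in the contig files is 1-based (as in the example lines above), so it is shifted to a 0-based BED start; contigs of 2 Mb or less are masked in full; and the names `flank` and `mask_ends.bed` are just placeholders I made up.

```shell
# Example input copied from the lines in the question (whitespace-separated).
printf 'Chr1 1 123000000\nChr2 1 11435255\nAEG1.2 1 2335\n' > contig.bed

# flank = number of bases to mask at each contig end (1 Mb here).
awk -v flank=1000000 'BEGIN { OFS = "\t" }   # write tab-delimited BED
{
    s = $2 - 1      # assumption: input start is 1-based; BED start is 0-based
    e = $3          # BED end is exclusive, so the 1-based end can be used as-is
    if (e - s <= 2 * flank) {
        # The two flanks meet or overlap: mask the whole contig.
        print $1, s, e
    } else {
        print $1, s, s + flank      # first 1 Mb
        print $1, e - flank, e      # last 1 Mb
    }
}' contig.bed > mask_ends.bed
```

Because `flank` is passed with `-v`, the same command could be looped over all the assemblies with a different input file each time, but I don't know whether there is a more standard tool for this.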
Thanks in advance for your suggestions.