Hi, I am currently running some analyses that require the distance from variant (and invariant) positions in VCF files to the closest genomic feature reported in a separate GFF file.
I am doing this by processing the two files as spreadsheets and calculating the distances with some R code, but as you can imagine this is probably suboptimal, messy, and error-prone.
I assume there is a package in R/Bioconductor or Python that can build a dataset of genomic positions taken from a VCF together with their distances to different genomic features in GFF files, but so far I haven't found anything.
Do you have any recommendation on how to handle this?
Check out the bedtools collection of tools and its subcommand bedtools closest.
https://github.com/arq5x/bedtools2
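For reference, the quantity being asked for (distance from each VCF position to the nearest GFF feature) can be sketched in plain Python. This is only a minimal illustration, not a replacement for bedtools: the function names are made up for this example, it assumes tab-separated VCF and GFF lines with 1-based inclusive GFF coordinates, it treats a position inside a feature as distance 0, and it uses a simple linear scan that will be slow on large files.

```python
from collections import defaultdict


def read_gff_features(lines):
    """Collect (start, end) intervals per chromosome from GFF lines.

    Assumes GFF columns 4 and 5 hold 1-based, inclusive coordinates.
    """
    features = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, headers, and directives
        cols = line.rstrip("\n").split("\t")
        features[cols[0]].append((int(cols[3]), int(cols[4])))
    return features


def read_vcf_positions(lines):
    """Yield (chrom, pos) for each VCF record; the ALT column is never parsed."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        yield cols[0], int(cols[1])


def closest_distance(features, chrom, pos):
    """Distance in bp to the nearest feature on chrom.

    Returns 0 if pos lies inside a feature, None if chrom has no features.
    """
    intervals = features.get(chrom)
    if not intervals:
        return None
    best = None
    for start, end in intervals:  # linear scan: simple, but O(features) per site
        if start <= pos <= end:
            return 0
        dist = start - pos if pos < start else pos - end
        if best is None or dist < best:
            best = dist
    return best


if __name__ == "__main__":
    # Tiny made-up example data, only to show how the pieces fit together.
    gff = [
        "##gff-version 3\n",
        "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
        "chr1\tsrc\tgene\t500\t600\t.\t-\t.\tID=g2\n",
    ]
    vcf = [
        "##fileformat=VCFv4.2\n",
        "chr1\t50\t.\tA\t<NON_REF>\t.\t.\t.\n",
        "chr1\t450\t.\tG\tT\t.\t.\t.\n",
    ]
    feats = read_gff_features(gff)
    for chrom, pos in read_vcf_positions(vcf):
        print(chrom, pos, closest_distance(feats, chrom, pos))
```

For real data, bedtools closest (or an interval-tree library) is the better choice; the sketch only makes the definition of "distance to the closest feature" explicit.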
Perhaps I am being unlucky, but this failed for me at some <NON_REF> ALT positions. After replacing them with sed -i -e 's/<NON_REF>/./', bedtools returns a core dumped error.
Without code and data examples it is impossible to debug.
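If only the positions matter, one possible workaround is to reduce each VCF record to a BED-style interval before running any distance tool, so the ALT column (including <NON_REF>) is never interpreted. The sketch below is a guess at a workaround, not a diagnosis of the crash above: the helper name is hypothetical, it assumes tab-separated VCF lines, and it takes the interval length from the REF allele, ignoring any END= value in the INFO column.

```python
def vcf_line_to_bed(line):
    """Convert one VCF record to a 3-column BED line (0-based, half-open).

    Returns None for header or blank lines. The ALT column is ignored,
    so symbolic alleles such as <NON_REF> are never touched.
    Assumption: the interval spans the REF allele only; INFO/END is ignored.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ref = cols[0], int(cols[1]), cols[3]
    start = pos - 1          # VCF POS is 1-based, BED start is 0-based
    end = start + len(ref)   # BED end is exclusive
    return f"{chrom}\t{start}\t{end}"


if __name__ == "__main__":
    # Made-up record, for illustration only.
    record = "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=120\n"
    print(vcf_line_to_bed(record))
```

Writing these lines to a file and sorting it by chromosome and start gives a plain BED input that no longer depends on how the ALT field is written.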