I have a BED file with the position of retrotransposons in the mouse genome and I would like to find the nearest gene, the distance to that gene and whether it is on the + or - strand.
With so many different file formats for the mouse genome, and so many databases to choose from, I was wondering which tool and which database would be best.
Because many retrotransposon promoters are strong drivers of transcription in both directions, I would suggest collecting both the + and - strand nearest genes.
It is admittedly my own tool, but the closest operation in bedtools will do what you want. The -d option reports the distance between each retrotransposon and its nearest gene; if the two overlap, the distance is 0. My answer assumes that genes.bed includes each gene's strand (i.e. it has at least six BED columns, with the strand in column 6). If it does, the strand will be reported in the output. A GFF file of genes works as well.
bedtools closest -a retro-inserts.bed -b genes.bed -d
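To make the -d output concrete, here is a minimal brute-force Python sketch of a nearest-gene search that reports the gene, its strand and the distance. It is not how bedtools works internally: bedtools sweeps through sorted files, and current releases expect both inputs sorted by chromosome and start (e.g. `sort -k1,1 -k2,2n`). The sketch assumes BED6 input with the strand in column 6 and uses the plain gap between half-open intervals as the distance, so bedtools' convention for book-ended features may differ by one. It also keeps only the first gene in a tie, whereas `bedtools closest` reports all ties by default. The gene and element names are made up.

```python
from typing import List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int         # 0-based, inclusive (BED convention)
    end: int           # 0-based, exclusive (BED convention)
    name: str = "."
    strand: str = "."  # BED6 column 6; "." when the file has no strand


def parse_bed6(line: str) -> Interval:
    """Parse one tab-separated BED line; name (col 4) and strand (col 6) are optional."""
    f = line.rstrip("\n").split("\t")
    name = f[3] if len(f) > 3 else "."
    strand = f[5] if len(f) > 5 else "."
    return Interval(f[0], int(f[1]), int(f[2]), name, strand)


def gap(a: Interval, b: Interval) -> int:
    """Bases separating two intervals on the same chromosome; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def closest(queries: List[Interval], genes: List[Interval]
            ) -> List[Tuple[Interval, Optional[Interval], Optional[int]]]:
    """For each query, return (query, nearest gene, distance).

    Only genes on the query's chromosome are considered; ties keep the first
    gene seen; a query with no gene on its chromosome gets (query, None, None).
    """
    results = []
    for q in queries:
        best: Optional[Interval] = None
        best_d: Optional[int] = None
        for g in genes:
            if g.chrom != q.chrom:
                continue
            d = gap(q, g)
            if best_d is None or d < best_d:
                best, best_d = g, d
        results.append((q, best, best_d))
    return results


if __name__ == "__main__":
    # Hypothetical toy records, not real mouse annotations.
    genes = [parse_bed6("chr1\t100\t200\tGeneA\t0\t+"),
             parse_bed6("chr1\t500\t600\tGeneB\t0\t-")]
    retros = [parse_bed6("chr1\t250\t300\tL1_1\t0\t+"),
              parse_bed6("chr1\t550\t560\tIAP_1\t0\t-")]
    for q, g, d in closest(retros, genes):
        print(q.name, g.name if g else None, g.strand if g else None, d)
```

On the toy data this prints `L1_1 GeneA + 50` (50 bp gap) and `IAP_1 GeneB - 0` (overlap). The double loop is fine for a sanity check on a handful of records; for a genome-wide file, bedtools' sorted sweep is the practical choice.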
Also, I just remembered that Galaxy has a nice option in its "Operate on Genomic Intervals" section called "Fetch closest non-overlapping feature for every interval". This is an equally good option, though it doesn't appear to report the distance between intervals. That said, once you have the coordinates, a little awk and the formula I mention in this thread are all you need to get the distance.
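The formula mentioned above isn't reproduced here, so the following is an assumption rather than a copy of it: for two half-open BED intervals [s1, e1) and [s2, e2) on the same chromosome, the gap is max(0, max(s1, s2) - min(e1, e2)). Below is a minimal Python stand-in for the awk step. The row layout is hypothetical: interval A sits in the first three columns, and the matched feature's chrom, start and end begin at a column you pass in as `b_col` (0-based), which depends on how many columns your retrotransposon file has.

```python
def interval_gap(s1: int, e1: int, s2: int, e2: int) -> int:
    """Gap in bases between half-open intervals [s1, e1) and [s2, e2); 0 if they overlap."""
    return max(0, max(s1, s2) - min(e1, e2))


def append_distance(row: str, b_col: int) -> str:
    """Append the gap as a new last column of one tab-separated row.

    Assumes interval A is in columns 0-2 and interval B's chrom/start/end are
    in columns b_col, b_col + 1 and b_col + 2 (0-based indices).
    """
    f = row.rstrip("\n").split("\t")
    if f[0] != f[b_col]:
        raise ValueError("intervals are on different chromosomes")
    d = interval_gap(int(f[1]), int(f[2]), int(f[b_col + 1]), int(f[b_col + 2]))
    return "\t".join(f + [str(d)])


if __name__ == "__main__":
    # Hypothetical row: a 4-column retrotransposon record followed by a BED6 gene.
    row = "chr1\t250\t300\tL1_1\tchr1\t100\t200\tGeneA\t0\t+"
    print(append_distance(row, b_col=4))  # same row with a trailing 50 column
```

The formula is symmetric, so it gives the same value whether the gene lies upstream or downstream of the element.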
I did exactly what aaronQuinlan suggested above with "closestBed".
I obtained the files I needed from the UCSC Table Browser, selecting the group "Genes and Gene Prediction Tracks", the track "Ensembl Genes" and the output format "BED".
I also downloaded the ensemblToGeneName table and used a small Perl script to convert the Ensembl transcript IDs to gene names and keep only the columns I wanted.
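The Perl script itself isn't shown, so here is a minimal Python sketch of the same idea under stated assumptions: the ensemblToGeneName export is tab-separated with the transcript ID in the first column and the gene name in the second, and the columns to keep are the first six (BED6: chrom, start, end, name, score, strand). Which columns the original script actually kept isn't stated, so that choice is an assumption, and the IDs below are placeholders rather than real Ensembl identifiers.

```python
from typing import Dict, Iterable, Iterator


def load_name_map(lines: Iterable[str]) -> Dict[str, str]:
    """Build {transcript_id: gene_name} from a two-column tab-separated export.

    Blank lines and lines starting with '#' (such as a header) are skipped.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        transcript_id, gene_name = line.rstrip("\n").split("\t")[:2]
        mapping[transcript_id] = gene_name
    return mapping


def rename_bed(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Replace the BED name column (col 4) with the gene name and keep columns 1-6.

    IDs missing from the mapping are left unchanged; header-type lines are dropped.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        f[3] = mapping.get(f[3], f[3])
        yield "\t".join(f[:6])


if __name__ == "__main__":
    # Placeholder IDs and names, for illustration only.
    names = load_name_map(["#name\tvalue\n", "TX0001\tGeneA\n"])
    bed = ["track name=ensGene\n",
           "chr1\t100\t200\tTX0001\t0\t+\t100\t200\t0\t1\t100,\t0,\n"]
    for out in rename_bed(bed, names):
        print(out)  # chr1, 100, 200, GeneA, 0, + (tab-separated)
```

Because the track is transcript-level, several rows can share a gene name after renaming; in a nearest-gene query that mostly shows up as ties between transcripts of the same gene.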
Now I just need to figure out how to get both the + and - strand nearest genes as Larry_Parnell suggested.
"Now I just need to figure out how to get both the + and - strand nearest genes as Larry_Parnell suggested." You could run it twice: once with -s (same strand as the retrotransposon) and once with -S (opposite strand).
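Running the command twice gives, for each retrotransposon, the nearest gene on its own strand (-s) and on the opposite strand (-S), which together cover both the + and - strand. Below is the same idea as a pure-Python sketch, with one deliberate difference: it groups genes by their absolute strand (+ or -) rather than relative to the retrotransposon, so the result doesn't depend on the element's own strand. BED6-style records, half-open gaps and first-wins ties are assumptions, and the records are made up.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int         # 0-based, inclusive
    end: int           # 0-based, exclusive
    name: str = "."
    strand: str = "."


Hit = Optional[Tuple[Interval, int]]  # (gene, distance) or None


def gap(a: Interval, b: Interval) -> int:
    """Bases separating two intervals on the same chromosome; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def nearest_per_strand(q: Interval, genes: List[Interval]) -> Dict[str, Hit]:
    """Nearest gene on the + strand and on the - strand for one query interval."""
    best: Dict[str, Hit] = {"+": None, "-": None}
    for g in genes:
        if g.chrom != q.chrom or g.strand not in best:
            continue  # other chromosome, or gene without a usable strand
        d = gap(q, g)
        current = best[g.strand]
        if current is None or d < current[1]:
            best[g.strand] = (g, d)
    return best


if __name__ == "__main__":
    genes = [Interval("chr1", 100, 200, "GeneA", "+"),
             Interval("chr1", 500, 600, "GeneB", "-"),
             Interval("chr1", 1000, 1100, "GeneC", "+")]
    retro = Interval("chr1", 700, 800, "IAP_1", "+")
    for strand, hit in nearest_per_strand(retro, genes).items():
        print(strand, hit[0].name if hit else None, hit[1] if hit else None)
```

For the toy element this reports GeneC (200 bp away) on the + strand and GeneB (100 bp away) on the - strand, which is the pair you would want when a promoter may drive transcription in both directions.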