7.7 years ago
spaul8505
Hi,
How can I find the genes that are located within a 2 kb region flanking the RIP loci of a species? I have the RIP results from a RIPCAL analysis.
My data looks like this:
genes.bed
scaffold100|size105690 3181 5110 genemark-scaffold100|size105690-processed-gene-0.34 . + maker gene
scaffold100|size105690 5243 11348 maker-scaffold100|size105690-augustus-gene-0.109 . + maker gene
scaffold100|size105690 11606 12232 maker-scaffold100|size105690-augustus-gene-0.110 . + maker gene
scaffold100|size105690 12428 13688 augustus_masked-scaffold100|size105690-processed-gene-0.18 . - maker gene
RIP loci.bed
scaffold1000|size3289 0 3276 . 1.76688149833334 + RIP region . id="RIP_1077;RIP_length=3276;RIP_max=2.32797202797203;
scaffold1001|size3283 0 3276 . 1.91911902719 + RIP region . id="RIP_2272;RIP_length=3276;RIP_max=2.8247619047619;
scaffold1002|size3281 0 3276 . 1.87247756983344 + RIP region . id="RIP_310;RIP_length=3276;RIP_max=2.28728728728729;
scaffold1003|size5038 0 5051 . 2.01247550534218 + RIP region . id="RIP_1794;RIP_length=5051;RIP_max=3.13636363636364;
Thanks
Hello Alex,
I have edited my question to include a subset of the data as an example. I tried using bedmap to get the flanking regions within 2 kb of the RIP loci (see RIP loci.bed), but it did not work: I just get the entire list of RIP loci instead of the result.
You need to have chromosome names that match. Start by fixing your gene inputs, and then do the same for the loci.
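As one illustration of such a fix-up, here is a minimal Python sketch. It assumes, purely for the sake of example, that the chosen scheme is to drop the `|size...` suffix from the first column. The function name `strip_size_suffix` is hypothetical, and any other consistent renaming would serve equally well.

```python
import re

def strip_size_suffix(line):
    """Drop a '|size...' suffix from column 1 only; leave all other columns untouched."""
    # Split off the first field, keeping the original separator and the remainder
    parts = re.split(r"(\s+)", line.rstrip("\n"), maxsplit=1)
    # Hypothetical naming scheme: keep only the text before the first '|'
    parts[0] = parts[0].split("|", 1)[0]
    return "".join(parts)

# One line from each input, copied from the question
gene = "scaffold100|size105690 3181 5110 genemark-scaffold100|size105690-processed-gene-0.34 . + maker gene"
locus = "scaffold1000|size3289 0 3276 . 1.76688149833334 + RIP region"

for line in (gene, locus):
    print(strip_size_suffix(line))
```

The same function is applied to both files so that the names stay comparable. Since renaming can change the sort order, the renamed files would normally be sorted again (for example with sort-bed) before being passed to bedmap.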
Then you can run bedmap.
How you fix up your scaffold name scheme is up to you, but the idea is that the two inputs need to have matching chromosome names in order for any mapping of overlapping elements to work.
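To make the role of matching names concrete, here is a small Python sketch of a padded overlap query: each locus is widened by 2 kb on both sides, and only genes filed under the same first-column name are tested. It does not reproduce bedmap's implementation or output format. The function names and the first locus in the example are hypothetical; the gene lines and the second locus are taken from the question.

```python
from collections import defaultdict

FLANK = 2000  # 2 kb on each side, as asked in the question

def parse_bed(lines):
    """Group (start, end, id) tuples by the name in column 1."""
    by_name = defaultdict(list)
    for line in lines:
        f = line.split()
        by_name[f[0]].append((int(f[1]), int(f[2]), f[3]))
    return by_name

def genes_near_loci(loci_lines, gene_lines, flank=FLANK):
    """For each locus, list the IDs of genes overlapping the locus padded by `flank`."""
    genes = parse_bed(gene_lines)
    hits = []
    for line in loci_lines:
        f = line.split()
        name, start, end = f[0], int(f[1]), int(f[2])
        lo, hi = max(0, start - flank), end + flank
        # Only genes stored under the *same* column-1 name are ever considered
        near = [gid for gs, ge, gid in genes.get(name, []) if gs < hi and ge > lo]
        hits.append((name, start, end, near))
    return hits

genes = [
    "scaffold100|size105690 3181 5110 genemark-scaffold100|size105690-processed-gene-0.34 . +",
    "scaffold100|size105690 5243 11348 maker-scaffold100|size105690-augustus-gene-0.109 . +",
]
loci = [
    "scaffold100|size105690 1500 2000",  # hypothetical locus, for illustration only
    "scaffold1000|size3289 0 3276",      # locus from the question
]

for name, start, end, near in genes_near_loci(loci, genes):
    print(name, start, end, near)
```

The hypothetical locus picks up the first gene, because its padded window (0 to 4000) reaches the gene start at 3181. The locus on scaffold1000|size3289 returns an empty list, since no gene in this subset carries that name.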
Note that I am replacing the delimiter between the reference element and any mapped elements with %%, because your mapped elements contain the same delimiter (| in scaffold names) that bedmap uses as its default.
Thanks. I was not aware that there should be matching names.
Yes, these are BED files and so the first column should be a unique identifier for a chromosome, scaffold, or other collection of intervals you want to test for set overlap. If you have two or more groups, you would name them uniquely, as needed.
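A quick way to check this before mapping is to compare the first-column names of the two files. The sketch below is a hypothetical helper, not part of BEDOPS, shown on the first three columns of the lines posted in the question.

```python
def column1_names(lines):
    """Collect the distinct names found in the first column."""
    return {line.split()[0] for line in lines if line.strip()}

def shared_names(lines_a, lines_b):
    """Names present in both inputs; only these can yield overlaps."""
    return column1_names(lines_a) & column1_names(lines_b)

# First three columns of the lines shown in the question
genes = [
    "scaffold100|size105690 3181 5110",
    "scaffold100|size105690 5243 11348",
    "scaffold100|size105690 11606 12232",
    "scaffold100|size105690 12428 13688",
]
loci = [
    "scaffold1000|size3289 0 3276",
    "scaffold1001|size3283 0 3276",
    "scaffold1002|size3281 0 3276",
    "scaffold1003|size5038 0 5051",
]

print(sorted(shared_names(genes, loci)))
```

For the small excerpts posted here, the two sets have no name in common, so no gene could be reported for any of these four loci, whatever flank size is used.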