I have a list of 10 million elements (each element is 2 bases long) in BED format. I also have a BED file with all the coding genes (each row being the start and end points of a gene). I want to check how many elements fall into protein-coding genes, and also see how many coding genes have been hit multiple times. To do so, I wrote a script that uses "bedtools coverage" to get the number of elements in each region.
This works fine, but the problem is that I have many lists of elements, and bedtools takes forever...
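To make clear what I am trying to compute, here is a minimal Python sketch of the per-gene count. It is not my actual script: the function name count_hits and the toy coordinates are made up for illustration. It assumes BED-style 0-based, half-open intervals and counts an element as a hit when it overlaps a gene by at least one base.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_hits(genes, elements):
    """Return, for each gene, the number of elements overlapping it.

    genes, elements: lists of (chrom, start, end) tuples, 0-based half-open.
    """
    # Collect element starts and ends per chromosome, sorted independently.
    starts = defaultdict(list)
    ends = defaultdict(list)
    for chrom, start, end in elements:
        starts[chrom].append(start)
        ends[chrom].append(end)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()

    counts = []
    for chrom, g_start, g_end in genes:
        s = starts.get(chrom, [])
        e = ends.get(chrom, [])
        # Elements starting before the gene ends, minus those that
        # already ended at or before the gene start.
        counts.append(bisect_left(s, g_end) - bisect_right(e, g_start))
    return counts


# Toy data (hypothetical coordinates).
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
elements = [
    ("chr1", 99, 101),   # overlaps the first gene by one base
    ("chr1", 150, 152),  # inside the first gene
    ("chr1", 200, 202),  # starts exactly where the first gene ends: no hit
    ("chr1", 598, 600),  # inside the second gene
    ("chr2", 60, 62),    # outside the chr2 gene
]

counts = count_hits(genes, elements)
genes_hit = sum(1 for c in counts if c > 0)        # genes hit at least once
genes_multi_hit = sum(1 for c in counts if c > 1)  # genes hit more than once
print(counts, genes_hit, genes_multi_hit)
```

Note that this counts hits per gene, so an element lying in two overlapping genes would be counted once for each of them.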
I had the idea of using an aligner for this (as they are really fast at mapping). From my experience with STAR I know that it is mapping software, but is there any way I could build a reference genome from just the coordinates (not the actual sequence) of the coding genes, and then try to map these BED files against it?
Long story short: can I use an aligner such as STAR to align regions instead of sequences?