I have two dicts, one with start coordinates and the other with end coordinates, paired by a shared incremental integer key, e.g. starts[0] = 1000 and ends[0] = 2000. A third dict holds the corresponding ID, e.g. ids[0] = "EXON1".
I want to get the unique values in each dict (i.e. every value that appears at least once, since a value such as 1000 or 2000 may be duplicated) and have used the following to do so (benchmarks I have seen show it to be the fastest method):
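```python
# dict.fromkeys() builds a dict whose keys are the distinct values;
# .keys() then yields those unique coordinates
unique_starts = {}.fromkeys(exon_seq_region_starts.values()).keys()
unique_ends = {}.fromkeys(exon_seq_region_ends.values()).keys()
```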
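A small self-contained check of the step above, using made-up coordinates. It wraps the result in `list()` because in Python 3 `.keys()` returns a view rather than a list; `set()` is shown only as an equivalent way to get the same distinct values, with no claim about which is faster.

```python
starts = {0: 1000, 1: 950, 2: 900, 3: 1000}
ends = {0: 2000, 1: 2000, 2: 2000, 3: 3500}

# dict.fromkeys keeps one key per distinct value, in first-seen order (Python 3.7+)
unique_starts = list(dict.fromkeys(starts.values()))
unique_ends = list(dict.fromkeys(ends.values()))

# set() yields the same distinct values, but without a guaranteed order
assert set(unique_starts) == set(starts.values())

print(unique_starts)  # [1000, 950, 900]
print(unique_ends)    # [2000, 3500]
```

`len(unique_starts)` then gives the number of unique start coordinates.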
I then need to iterate through the smaller of these two lists and, for each unique value, get all the matching values in the larger list (e.g. the end 2000 may be paired with starts 900, 950 and 1000). From those I need the smallest value (900 in this example) and its key, so that I can look up the exon ID in the third dict.
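A minimal sketch of the lookup described above, using made-up data. Rather than iterating over the unique lists and searching the larger one for matches, it swaps in a single pass over the keys: for each distinct end it remembers the key of the exon with the smallest start. `smallest_start_per_end` is a hypothetical helper name, and ties on the start keep the first key seen.

```python
def smallest_start_per_end(starts, ends, ids):
    """Map each distinct end to (start, key, exon_id) of the exon with the
    smallest start among all exons sharing that end."""
    best = {}  # end -> key of the exon with the smallest start seen so far
    for key, end in ends.items():
        if end not in best or starts[key] < starts[best[end]]:
            best[end] = key
    return {end: (starts[key], key, ids[key]) for end, key in best.items()}


starts = {0: 1000, 1: 950, 2: 900, 3: 3000}
ends = {0: 2000, 1: 2000, 2: 2000, 3: 3500}
ids = {0: "EXON1", 1: "EXON2", 2: "EXON3", 3: "EXON4"}

result = smallest_start_per_end(starts, ends, ids)
print(result)  # {2000: (900, 2, 'EXON3'), 3500: (3000, 3, 'EXON4')}
```

Because each key is visited once, this avoids the nested "for each unique value, scan the other list" loop.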
Any ideas on the best way?
@Gawbul Then why not create an array of Exon(start, end) objects ordered by start/end and scan the overlapping exons from 5' to 3'?
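A minimal sketch of the sort-and-scan idea from the comment above, assuming forward-strand coordinates where an exon spans start..end inclusive, so exons sharing even one base count as overlapping. The `Exon` namedtuple (with an added `exon_id` field) and `overlapping_groups` are hypothetical names for illustration; the three dicts from the question are first zipped by their shared key into records.

```python
from collections import namedtuple

# Hypothetical record type: the comment suggests Exon(start, end); exon_id is
# added so each record carries its ID from the third dict
Exon = namedtuple("Exon", ["exon_id", "start", "end"])


def overlapping_groups(exons):
    """Sort exons 5' to 3' (by start, then end) and group runs that overlap.

    Assumes inclusive forward-strand coordinates: an exon starting at or
    before the current group's furthest end joins that group.
    """
    groups = []
    group_end = None
    for exon in sorted(exons, key=lambda e: (e.start, e.end)):
        if groups and exon.start <= group_end:
            groups[-1].append(exon)
            group_end = max(group_end, exon.end)
        else:
            groups.append([exon])
            group_end = exon.end
    return groups


# Example data in the question's three-dict layout (values made up)
starts = {0: 1000, 1: 950, 2: 900, 3: 3000}
ends = {0: 2000, 1: 2000, 2: 2000, 3: 3500}
ids = {0: "EXON1", 1: "EXON2", 2: "EXON3", 3: "EXON4"}

exons = [Exon(ids[k], starts[k], ends[k]) for k in starts]
groups = overlapping_groups(exons)
for group in groups:
    print([e.exon_id for e in group])
# ['EXON3', 'EXON2', 'EXON1']
# ['EXON4']
```

Since each group is built in sorted order, `group[0]` is the 5'-most exon of its overlapping cluster. The sort is O(n log n) and the scan is a single pass.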
Any chance you could tell us what you are trying to achieve, instead of asking how to implement the method you think is best?
Hi Casbon, I have a list of exon start and end coordinates retrieved from Ensembl, but they aren't unique: some sequences overlap. I need to retrieve the exon IDs and coordinates for the unique exons.