Has anyone worked with LiftOver on a big dataset?
I have a big dataset of gene coordinates, and I am converting these coordinates from hg19 to hg38 with the Python package LiftOver.
The problem is that the conversion does not always return the same output structure, as the documentation says:
Returns a list of possible conversions for a given chromosome position. The list may be empty (no conversion), have a single element (unique conversion), or several elements (position mapped to several chains). The list contains tuples (target_chromosome, target_position, target_strand, conversion_chain_score), where conversion_chain_score is the "alignment score" field specified at the chain used to perform conversion. If there are several possible conversions, they are sorted by decreasing conversion_chain_score.
So, when I tried to iterate over my list of 19677 genes with this:
for index, row in loeuf.iterrows():
    tmp += lo.convert_coordinate(row['chromosome'], row['start_position'])
I lose the coordinates of a few genes.
I don't really mind dropping these few genes, but the problem is that I also lose the order. When I add the new coordinates to the original dataset and check whether the coordinates of these genes in the new reference genome are correct, the first genes are fine, but the last ones are completely different:
      chromosome_19  start_position_19  chromosome_38  star_position_38
0             chr19           58345178          chr19          58353499
1             chr10           50799409          chr10          50885675
2             chr12            9067664          chr12           9116229
3             chr12            8822472          chr12           8887001
4              chr1           33306766           chr1          33321098
...             ...                ...            ...               ...
19666          chr1           52726454           chr7         143391111
19667          chr7          143381080          chr17           4143020
19668         chr17            4004445           chr1          77683419
19669          chr1           77562416          chr19          14075062
Any idea how I can do this? I have been trying workarounds without luck.
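For reference, the shift in the last rows is consistent with what the quoted documentation describes: `tmp +=` extends the list with zero, one, or several tuples per row, so `tmp` stops lining up with the rows of the original dataset. Below is a minimal sketch of one way to keep the alignment, by storing exactly one entry per input row (the best-scoring conversion, or `None` when there is none). The helper `lift_positions` and the stand-in `fake_convert` are hypothetical names used only for illustration; `fake_convert` just imitates the documented return structure so that the example runs without a chain file.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def lift_positions(
    convert: Callable[[str, int], Optional[list]],
    positions: Iterable[Tuple[str, int]],
) -> List[Optional[Tuple[str, int]]]:
    """Return exactly one result per input position, in the same order."""
    lifted = []
    for chrom, pos in positions:
        hits = convert(chrom, pos)
        if hits:
            # Hits are documented as sorted by decreasing chain score,
            # so the first one is the best-scoring conversion.
            target_chrom, target_pos = hits[0][0], hits[0][1]
            lifted.append((target_chrom, target_pos))
        else:
            # Empty list (or None): keep a placeholder so the order is preserved.
            lifted.append(None)
    return lifted


def fake_convert(chrom: str, pos: int) -> list:
    """Hypothetical stand-in that mimics the documented output structure."""
    if chrom == "chrUn":
        return []  # no conversion
    if chrom == "chr2":
        # position mapped to several chains, best score first
        return [("chr2", pos + 100, "+", 900), ("chr5", pos, "+", 100)]
    return [(chrom, pos + 10, "+", 1000)]  # unique conversion


positions = [("chr1", 100), ("chrUn", 5), ("chr2", 200), ("chr3", 300)]
lifted = lift_positions(fake_convert, positions)
print(lifted)
```

With the real data, the same idea would presumably be applied by passing `lo.convert_coordinate` as `convert` and `zip(loeuf['chromosome'], loeuf['start_position'])` as `positions`. Because the result has one entry per row, it can be attached to the original dataset as new columns, and rows whose entry is `None` can be dropped afterwards without shifting the others.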
Can you please stop asking questions without commenting on and/or validating people's answers. ManuelDB
What do you mean? I have validated all previous answers and commented on some of them.
EDIT: I see what you mean. I have been upvoting answers but not validating them. Sorry.