I am using liftOver to convert ~100,000 hg19 coordinates to hg38. I know that there are duplicates in the hg19 BED file, but I'm not sure what's going on or what's best to do. The hg38 coordinates come out very different (examples below). Maybe the Table Browser is a better option? Thank you :).
hg19
chr19 54801916 54802239 chr19:54801916-54802239 . LILRA3;LILRA6
chr19 54801917 54802239 chr19:54801917-54802239 . LILRA3
chr19 54802472 54802789 chr19:54802472-54802789 . LILRA3;LILRA6
chr19 54802473 54802789 chr19:54802473-54802789 . LILRA3
chr19 54803901 54804020 chr19:54803901-54804020 . LILRA3
hg38
chr19_KI270938v1_alt:273030-273353
chr19_KI270938v1_alt:273031-273353
chr19_KI270938v1_alt:273586-273903
chr19_KI270938v1_alt:273587-273903
chr19_KI270938v1_alt:275015-275134
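To see how many lifted regions ended up off the primary chromosomes, as in the hg38 output above, one option is to split them by contig name. This is only a rough sketch: the function name `split_by_contig` and the rule that "primary" means chr1-22, X, Y and M are assumptions for illustration, and the first example region is made up.

```python
import re

# Assumption: "primary" means chr1-22, chrX, chrY, chrM.
# Anything else (e.g. *_alt, *_random, chrUn_*) is treated as non-primary.
PRIMARY_CONTIG = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def split_by_contig(regions):
    """Split 'contig:start-end' strings into (primary, non_primary) lists."""
    primary, non_primary = [], []
    for region in regions:
        contig = region.split(":", 1)[0]
        if PRIMARY_CONTIG.fullmatch(contig):
            primary.append(region)
        else:
            non_primary.append(region)
    return primary, non_primary


lifted = [
    "chr19:1000-2000",                     # made-up region, for illustration only
    "chr19_KI270938v1_alt:273030-273353",  # from the liftOver output above
    "chr19_KI270938v1_alt:275015-275134",  # from the liftOver output above
]
primary, non_primary = split_by_contig(lifted)
print(len(primary), "on primary chromosomes;", len(non_primary), "on other contigs")
```

Regions in the second list are the ones worth checking by hand before using them as targets.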
Thank you for the information. May I ask how you were able to determine that LILRA3 is not annotated in hg38 but LILRB2 and LILRA5 are? I am trying to figure out which tools may help. Thank you very much :).
Look at LILRA3 in hg19 and note the names of its neighbors. Then look those up in hg38: you will find they are still together, but LILRA3 has disappeared.
Interesting, so what is the best or correct thing to do in a case like this? I guess figuring out why it mis-mapped might help. Thank you :).
Well what are you doing with the remapped coordinates? Why do you need them?
The reference on our sequencer is hg38, so I am lifting over the hg19 targets to hg38 as well. After the reads are aligned, the target BED file is used for variant calling, coverage, etc. Thank you :).
The most accurate approach is probably to repeat the alignment, if you have access to the original data.
Absolutely, repeat the alignment. And don't reinvent the wheel with gene coordinates; every genome version has its own associated gene annotations somewhere. See the knownGene files in the UCSC hg38 annotation database, or any Ensembl GTF for human from release 76 or later.
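As a rough sketch of pulling gene coordinates straight from an annotation file instead of lifting them over, the snippet below reads GTF lines and reports which requested genes are present. It assumes the GTF has `gene` feature rows carrying a `gene_name` attribute, as Ensembl GTFs do; the function name `gene_coordinates`, the gene names and the coordinates in the example are placeholders, not real annotation data.

```python
import re

GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gene_coordinates(gtf_lines, gene_names):
    """Return {gene_name: (contig, start, end)} for requested genes found in the GTF.

    GTF coordinates are 1-based and inclusive; subtract 1 from start to get BED.
    """
    wanted = set(gene_names)
    found = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only whole-gene records
        match = GENE_NAME.search(fields[8])
        if match and match.group(1) in wanted:
            found[match.group(1)] = (fields[0], int(fields[3]), int(fields[4]))
    return found


# Placeholder records (hypothetical genes and coordinates, not real annotation).
example_gtf = [
    "#!genome-build placeholder\n",
    '19\tsource\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
    '19\tsource\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
    '19\tsource\tgene\t8000\t9500\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n',
]
targets = ["GENE_A", "GENE_B", "GENE_C"]
coords = gene_coordinates(example_gtf, targets)
missing = sorted(set(targets) - set(coords))
print(coords)
print("not annotated:", missing)
```

For a real file, the same function can be fed an open file handle (for example from `gzip.open(path, "rt")`, where `path` is wherever you saved the GTF). Note that Ensembl GTFs name contigs without the `chr` prefix, so the names may need adjusting to match a UCSC-style reference.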