We have outsourced deep exome sequencing of ten samples, and it is still in progress. When we receive the data, should we align to the hg18 or the hg19 reference? What are the pros and cons of using the more stable hg18 compared to the updated hg19?
I know from previous questions that hg18 has more annotation, but does that matter in the alignment process?
If your only concern is the mapping process, I would go for hg19, as it is (or should be) a better description of the human genome. Even for a lot of the common annotations you shouldn't find any problem being hg19 based. But your concern about annotations in general is well founded, as plenty of annotation is still hg18 based. Since those annotations presumably won't be migrated in the future (they've had plenty of time to be), the only advice I can give you is the same one we gave ourselves when we faced this issue:
Consider the annotations you need, and if they are all hg19 based then there's nothing to think about.
If some of them are hg18 based, see whether you could live with migrating the hg18 coordinates of those particular annotations to hg19, because if so you may still be able to work almost transparently with hg19.
If any of these considerations looks like possible trouble, then you should stick to hg18, knowing that you will have plenty of annotations for your results at the price of losing some genomic information.
I would say that sticking to hg18 shouldn't be a big issue if you're doing targeted resequencing of very well known regions or genes, but you will definitely have to evaluate it against your particular research aim. Personally, I would always try to use hg19 unless forced not to.
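The three considerations above can be written down as a small decision helper. This is only a sketch of the reasoning, not a tool we actually use: the function name, the annotation names and the idea of a "liftable" set (annotations whose hg18 coordinates you are willing to migrate) are all made up for illustration.

```python
def choose_build(annotation_builds, liftable=frozenset()):
    """Pick a reference build from the annotations a project needs.

    annotation_builds: dict mapping an annotation name to the set of builds
        it is available on, e.g. {"dbSNP": {"hg18", "hg19"}}.
    liftable: names of hg18-only annotations whose coordinates you could
        accept migrating to hg19 (hypothetical, project-specific judgment).
    """
    # Annotations that exist for hg18 but not for hg19.
    hg18_only = {name for name, builds in annotation_builds.items()
                 if "hg19" not in builds}
    if not hg18_only:
        return "hg19"          # everything needed is already hg19 based
    if hg18_only <= set(liftable):
        return "hg19"          # remaining hg18 annotations can be migrated
    return "hg18"              # at least one annotation forces hg18


# Hypothetical project: one shared annotation, one hg18-only custom track.
needed = {"dbSNP": {"hg18", "hg19"}, "custom_track": {"hg18"}}
print(choose_build(needed, liftable={"custom_track"}))
```

The hard part is of course deciding what goes into `liftable`, which depends on how much you trust converted coordinates for each annotation.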
Thanks a lot... Extremely useful answer!! If the probes were designed from hg18, does that affect your answer?
written 13.7 years ago by Thomas (updated 3.6 years ago by Ram)
That is another thing we had to consider too. We came to the conclusion that the human genome is the human genome after all, so the reads should map fine to both hg18 and hg19, but mapping to hg19 would be more realistic, assuming the assembly has improved. The problem is that the "on target" measurements won't line up properly. So if your probes are hg18 based and you map the resulting reads to hg19, the best you can do is convert your probes' start-end coordinates to hg19; then you'll be able to trust qualities and coverages on target.
written 13.7 years ago by lh3 (updated 3.6 years ago by Ram)
Right lh3, I forgot to mention how to convert coordinates. Just to point out an alternative to liftOver, as already mentioned in other answers around here: there's Ensembl's Assembly Converter, which can be used as a web service (although for large datasets it'll be more efficient to use the script version, as with liftOver).
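Once the probe intervals have been converted to hg19, checking the "on target" rate against them is simple interval arithmetic. The following is a minimal pure-Python sketch, assuming BED-style 0-based half-open intervals and reads given as `(chrom, start, end)` tuples; the function names and the toy coordinates are invented for the example, and a real pipeline would normally use a dedicated tool for this.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(targets):
    """Merge overlapping target intervals per chromosome.

    targets: iterable of (chrom, start, end), 0-based half-open.
    Returns {chrom: (sorted starts, matching ends)} of disjoint intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:          # overlaps or touches previous
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def on_target(index, chrom, start, end):
    """True if the read [start, end) overlaps any target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last target that starts at or before the read's final base.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def on_target_fraction(index, reads):
    """Fraction of reads overlapping at least one target."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(on_target(index, c, s, e) for c, s, e in reads)
    return hits / len(reads)


# Toy data: hg19 probe intervals and four aligned reads.
probes_hg19 = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 0, 50)]
reads = [("chr1", 90, 110), ("chr1", 250, 300), ("chr2", 49, 60), ("chr3", 0, 10)]
print(on_target_fraction(build_index(probes_hg19), reads))
```

If the hg18 probe coordinates were used here against hg19 alignments instead, reads over shifted regions would be counted as off target, which is the mismatch described above.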
I would argue in favor of hg19. I agree that the project and the annotations that must be used determine the decision, but in that case there is no question as to which build to use for alignment. If you really don't have a "must use" annotation, go for hg19. There are a lot of annotations that are only available for hg19 as well. Just take a look at the UCSC Genome Browser for hg18 and hg19 and see the difference (dbSNP 131-132, newest ENCODE tracks). Also, aligning to hg19 should help reduce the number of false positives and false negatives in variant detection. The most important thing is variant detection. The availability of annotation is really not that important if you did not find the variant you were looking for, or if you found a variant that wasn't there in the first place.
Agreed. More recent annotations will be available for hg19 exclusively.
written 13.6 years ago by lh3 (updated 3.6 years ago by Ram)
I just wanted to state that using hg18 isn't really incorrect. It is a decision that depends almost entirely on whether particular annotations of interest are only available for hg18, which is not that rare. I thought the question asked for the pros and cons of the decision, so my idea was to raise them in my answer. My personal preference is hg19 too (we do in fact use hg19 as the mapping reference for our exome resequencing experiments), since all the annotations we need are already mapped to it, so maybe I didn't make myself clear before.
Also, UCSC liftOver is your friend.
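For the probe conversion discussed in this thread, the command-line liftOver takes four positional arguments: the input file, a chain file, the output file and a file for records that could not be converted. Below is a hedged sketch of wrapping that call from Python. The BED file names are hypothetical, the chain file name follows UCSC's usual naming and should be checked against their downloads, and the `liftOver` binary is assumed to be on your PATH.

```python
import subprocess


def liftover_command(in_bed, chain, out_bed, unmapped_bed, liftover="liftOver"):
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return [liftover, in_bed, chain, out_bed, unmapped_bed]


def run_liftover(in_bed, chain, out_bed, unmapped_bed, liftover="liftOver"):
    """Run liftOver and raise CalledProcessError if it exits non-zero."""
    cmd = liftover_command(in_bed, chain, out_bed, unmapped_bed, liftover)
    subprocess.run(cmd, check=True)


# Example call (file names are placeholders, not real files):
# run_liftover("probes_hg18.bed", "hg18ToHg19.over.chain.gz",
#              "probes_hg19.bed", "probes_unmapped.bed")
```

It is worth looking at the unmapped file afterwards: any probe listed there has no hg19 coordinates and will be missing from on-target calculations.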