I looked at the Y chromosome in hg19 and in 1000g, and they seem to differ despite having the same number of characters. Has anybody noticed this? Why do they differ?
1000G uses sequences from Ensembl (see README at location in your FTP link).
It seems that Ensembl has a slightly different procedure for inserting N into the sequence scaffolds. The issue is discussed in this mailing list thread.
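If the difference comes from where each pipeline inserts N, one way to check is to list the runs of N in both sequences and keep only the runs that do not match exactly. The sketch below assumes both chrY sequences are already loaded as strings of equal length. The helper names `n_runs` and `gap_differences` are illustrative, not part of any library, and the toy sequences are made up.

```python
import re

def n_runs(seq: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of consecutive N/n bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]

def gap_differences(a: str, b: str) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """Return the N runs found only in `a` and the N runs found only in `b`."""
    runs_a, runs_b = set(n_runs(a)), set(n_runs(b))
    return sorted(runs_a - runs_b), sorted(runs_b - runs_a)

# Toy sequences of equal length whose gaps are placed differently.
ucsc_like = "ACGTNNNNACGT"
ensembl_like = "ACNNNNNNACGT"
only_a, only_b = gap_differences(ucsc_like, ensembl_like)
print(only_a, only_b)  # [(4, 8)] [(2, 8)]
```

Two sequences of identical length can still place their gaps differently, which is exactly what an equal character count cannot reveal.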
EDIT to my own comment: I see. I made the build36 version of the genome for 1000g, and at that time there was this difference. A colleague later told me that UCSC switched to the Ensembl convention starting with hg19, but UCSC still keeps the pseudoautosomal regions (PARs) on chrY. This is the wrong decision. I would discourage using the UCSC genome for mapping.
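The objection to keeping the PARs on chrY is presumably that the same sequence then appears on both chrX and chrY, so reads from those regions align equally well to both. A common workaround is to hard-mask the chrY copies with N while keeping the chromosome length unchanged. The sketch below assumes you supply the PAR intervals yourself (0-based, half-open) from the assembly's own documentation; the intervals in the example are toy values, not real PAR coordinates, and `hard_mask` is a hypothetical helper.

```python
def hard_mask(seq: str, intervals: list[tuple[int, int]]) -> str:
    """Replace each 0-based, half-open [start, end) interval with 'N', preserving length."""
    chars = list(seq)
    for start, end in intervals:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        chars[start:end] = "N" * (end - start)
    return "".join(chars)

# Toy example: mask the first two and last two bases.
print(hard_mask("ACGTACGT", [(0, 2), (6, 8)]))  # NNGTACNN
```

Because the length is preserved, coordinates outside the masked intervals stay identical to the unmasked sequence.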
You should link to the source of the data in each case so we can look at it. However, hg19 is a consensus sequence, whereas 1000G comprises sequences from many individuals. So it is not surprising that they differ, since the goal of 1000G is precisely to understand variation. In fact, there is no single "Y chromosome in 1000g."
1000g: ftp://ftp.sanger.ac.uk/pub/1000genomes/tk2/main_project_reference/
UCSC: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/
OK, now I see that you are referring to the reference sequences used by the 1000G project.
You should at least point out one base-pair difference to support your argument. As far as I know, they are the same. EDIT: I was wrong; they are different. We should use the 1000g genome if possible.
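Pointing out concrete base-pair differences can be done mechanically by comparing the two chrY sequences position by position. The sketch below assumes each file holds a single FASTA record (just chrY), optionally gzipped. It compares case-insensitively by default, because UCSC's chromosome files mark repeats in lower case (soft-masking) and that alone would otherwise show up as a mismatch. It also separates N-versus-base mismatches (gap or masking differences) from real base-versus-base substitutions. The function names and the command-line usage are illustrative, not an existing tool.

```python
import gzip
import sys
from collections import Counter

def read_fasta_seq(path: str) -> str:
    """Concatenate all sequence lines of a FASTA file (assumes a single record such as chrY)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">"))

def diff_sites(a: str, b: str, ignore_case: bool = True) -> list[tuple[int, str, str]]:
    """List (0-based position, base in a, base in b) wherever the two sequences disagree."""
    if len(a) != len(b):
        raise ValueError(f"lengths differ: {len(a)} vs {len(b)}")
    if ignore_case:  # treat soft-masked lower-case bases as equal to upper case
        a, b = a.upper(), b.upper()
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

def classify(sites: list[tuple[int, str, str]]) -> Counter:
    """Count mismatches where one side is N versus genuine base-vs-base differences."""
    counts = Counter()
    for _, x, y in sites:
        kind = "N vs base" if "N" in (x.upper(), y.upper()) else "base vs base"
        counts[kind] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python compare_chrY.py ucsc_chrY.fa.gz g1k_chrY.fa
    sites = diff_sites(read_fasta_seq(sys.argv[1]), read_fasta_seq(sys.argv[2]))
    print(classify(sites))
    for pos, x, y in sites[:10]:
        print(pos + 1, x, y)  # report 1-based coordinates
```

A whole chrY fits comfortably in memory as a Python string, so the simple string comparison is adequate here; the first few reported positions are enough to settle whether the two references really differ.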