Where to download raw HGDP data used for population genetics studies
8.5 years ago

Hello,

I am interested in downloading the following data:

Homo sapiens    Human    Dai          HGDP01307    HGDP01307
Homo sapiens    Human    French       HGDP00521    HGDP00521
Homo sapiens    Human    Han          HGDP00778    HGDP00778
Homo sapiens    Human    Karitiana    HGDP00998    HGDP00998
Homo sapiens    Human    Mandenka     HGDP01284    HGDP01284
Homo sapiens    Human    Mbuti        HGDP00456    HGDP00456
Homo sapiens    Human    Papuan       HGDP00542    HGDP00542
Homo sapiens    Human    San          HGDP01029    HGDP01029
Homo sapiens    Human    Sardinian    HGDP00665    HGDP00665

These samples were originally mentioned in the Prado-Martinez paper (Great ape genetic diversity and population history) and subsequently in many others. For some reason, I cannot find the source of the raw data (FASTQ) for all of them and would appreciate help. The reads are supposed to be 100 bp.

8.5 years ago
lh3 33k

Look here. Note that these data were initially produced for the Denisova paper. The great ape paper just reused them.

EDIT: hmm... I could not find a direct download link. It used to be on the page. It seems that you now have to use gridFTP, which is unfortunate. EDIT2: well, you can't even download the BAMs...


Thanks for the link to the original data source. I sent an email to hgvp_request@simonsfoundation.org; hopefully they will get back to me.


The email just bounced back as undeliverable.

8.5 years ago
Denise CS ★ 5.2k

This is what I found on ENA for HGDP01307 and HGDP00521, for example. You should be able to retrieve the FASTQ files for those samples and for the remaining ones listed in your post. ENA content can be easily accessed programmatically.
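As a rough illustration of the programmatic route, here is a minimal Python sketch that asks the ENA Portal API "filereport" endpoint for the FASTQ files of a given accession. The accession you pass in is up to you (I am not listing the real ones here), the function names are just my own, and I am assuming the `fastq_ftp` column comes back without a scheme and with `;` separating paired files, so check the output before relying on it.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# ENA Portal API file report endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Build a file report query for one study, sample or run accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5,fastq_bytes",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Turn the TSV report into one dict per FASTQ file."""
    files = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Assumption: paired files are ';'-separated in each column.
        urls = [u for u in (row.get("fastq_ftp") or "").split(";") if u]
        md5s = (row.get("fastq_md5") or "").split(";")
        sizes = (row.get("fastq_bytes") or "").split(";")
        for url, md5, size in zip(urls, md5s, sizes):
            files.append({
                "run": row["run_accession"],
                # Assumption: the column has no scheme, so prepend one.
                "url": "ftp://" + url,
                "md5": md5,
                "bytes": int(size),
            })
    return files


def fetch_filereport(accession):
    """Query ENA (needs network access) and return the parsed file list."""
    with urlopen(build_filereport_url(accession)) as response:
        return parse_filereport(response.read().decode("utf-8"))
```

Calling `fetch_filereport()` with the accession shown on the ENA page for a sample should give you the download URLs together with the expected size and MD5 of each file, which is handy for checking the downloads afterwards.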


Thanks. I downloaded fastq files for HGDP01307 from your first link and those fastq files seem to be tiny. Not sure if it's exactly the same dataset.


I had a look at the decompressed files and they looked sensible to me. If you are concerned (and while you wait to hear from the Simons Foundation), it may be worth contacting the ENA support team. If the files are corrupted, they will be able to fix the problem and provide a corrected version. I can see that the BAM files and their indexes are available from the Max Planck page, but not the FASTQ files.
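If it helps, a quick local sanity check can tell a truncated download apart from a genuinely small dataset. The sketch below is only an illustration: it assumes standard four-line FASTQ records in a gzip-compressed file, and the function names are my own. It computes the MD5 of the file (to compare with the checksum ENA publishes) and tallies read lengths (to compare with the expected 100 bp).

```python
import gzip
import hashlib
from collections import Counter


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_length_histogram(path, max_reads=100_000):
    """Count read lengths in the first max_reads records of a .fastq.gz.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    lengths = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.rstrip("\n"))] += 1
    return lengths
```

If the MD5 matches the published one and the histogram is dominated by 100, the file is most likely intact and just smaller than expected; a mismatch would be worth reporting to ENA support.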
