Submitters of samples in the 1000 Genomes Project
6.9 years ago
ognjen011

Hello!

I am trying to find out which capture kit was used for some BAM files I obtained from the public 1000 Genomes repositories (e.g. HG00096). I am currently accessing them through their volume mounted from S3, so I cannot give an easy link, but I am hoping someone knows their way around this data.

I have found that there are several submitters, and each submitter's capture kit is listed in the paper's supplementary material: https://media.nature.com/original/nature-assets/nature/journal/v526/n7571/extref/nature15393-s1.pdf

How would one find out which samples were submitted by which center/contributor? Is there an easier way to find out which capture kit was used?

Thanks in advance!

EDIT: The data is available at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/.
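For the broader question of which center submitted which sample, run-level metadata may be easier to query than individual BAM headers. A minimal sketch, assuming a tab-separated index file (for example a sequence.index-style file from the FTP site above) whose header row contains SAMPLE_NAME and CENTER_NAME columns; both that file layout and the helper name sample_centers are assumptions, not something confirmed in this thread:

```shell
#!/usr/bin/env bash
# sample_centers (hypothetical helper): read a tab-separated index whose header
# row is ASSUMED to name SAMPLE_NAME and CENTER_NAME columns, and print each
# distinct sample/center pair. Center codes are upper-cased so that variants
# such as "wugsc" and "WUGSC" collapse into one entry.
# Reads the files given as arguments, or stdin if none are given.
sample_centers() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 {
      # Locate the two columns by name rather than by position.
      for (i = 1; i <= NF; i++) {
        name = $i
        sub(/^#/, "", name)            # tolerate a leading "#" on the header
        if (name == "SAMPLE_NAME") scol = i
        if (name == "CENTER_NAME") ccol = i
      }
      if (!scol || !ccol) {
        print "missing SAMPLE_NAME or CENTER_NAME column" > "/dev/stderr"
        exit 1
      }
      next
    }
    { print $scol, toupper($ccol) }
  ' "$@" | sort -u
}
```

Usage, with a hypothetical local copy of such an index: sample_centers index.tsv | grep HG00096. Looking the columns up by name keeps the sketch independent of column order, which may differ between index versions.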

Tags: 1000 genomes, metadata
Answer (6.9 years ago)

You can find the sequencing center in the CN field of the @RG (read group) lines in the BAM headers. For example, for HG00096:

$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" \
    | cut -f 1 | grep 'bam$' | grep HG00096 \
    | sed 's%^%ftp://ftp.1000genomes.ebi.ac.uk/vol1/%' \
    | while read -r U; do
        echo "${U}"
        # samtools exits once the header is read, so curl stops early
        curl -s "${U}" | samtools view -H - | grep '^@RG' | tr '\t' '\n' | grep '^CN' | cut -d ':' -f 2 | sort -u
      done

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/high_coverage_alignment/HG00096.wgs.ILLUMINA.bwa.GBR.high_cov_pcr_free.20140203.bam
BI
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/exome_alignment/HG00096.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/exome_alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/exome_alignment/HG00096.unmapped.ILLUMINA.bwa.GBR.exome.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/exome_alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.exome.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.unmapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/technical/ncbi_varpipe_data/alignment/HG00096/HG00096.ILLUMINA.mosaik.GBR.low_coverage.20101123.bam
wugsc
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/technical/ncbi_varpipe_data/alignment/HG00096/HG00096.chrom20.ILLUMINA.mosaik.GBR.low_coverage.20101123.bam
wugsc
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/technical/other_exome_alignments/HG00096/exome_alignment/HG00096.mapped.ILLUMINA.BWA.GBR.exome.20110411.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/exome_alignment/HG00096.mapped.illumina.mosaik.GBR.exome.20110411.bam
wugsc
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.unmapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
WUGSC
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/ncbi_varpipe_data/alignment/HG00096/HG00096.mapped.illumina.mosaik.GBR.low_coverage.20111114.bam
wugsc
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/ncbi_varpipe_data/alignment/HG00096/HG00096.chrom11.illumina.mosaik.GBR.low_coverage.20111114.bam
wugsc
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/ncbi_varpipe_data/alignment/HG00096/HG00096.chrom20.illumina.mosaik.GBR.low_coverage.20111114.bam
wugsc
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/other_exome_alignments/HG00096/exome_alignment/HG00096.mapped.illumina.mosaik.GBR.exome.20111114.bam
wugsc
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20140203_broad_high_cov_pcr_free_validation/matching_LC_samples_bwamem/HG00096.bwa_mem.20130502.low_coverage.20140501.bam
WUGSC
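The center codes above are not consistently cased (WUGSC vs wugsc), and the cut -d ':' -f 2 step would truncate any CN value that itself contains a colon. When collecting centers across many files it may help to normalize them. A minimal sketch that reads only a SAM header on stdin (e.g. the output of samtools view -H); the function name rg_centers is hypothetical:

```shell
#!/usr/bin/env bash
# rg_centers (hypothetical helper): read a SAM header on stdin and print the
# distinct sequencing-center (CN) values found on @RG lines, upper-cased so
# that e.g. "wugsc" and "WUGSC" collapse into a single entry.
rg_centers() {
  awk -F '\t' '
    $1 == "@RG" {
      # CN may appear in any position after the @RG tag, so scan every field
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "CN:") {
          # keep everything after "CN:", including any further colons
          print toupper(substr($i, 4))
        }
      }
    }
  ' | sort -u
}

# Usage against one of the remote BAMs listed above:
#   curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/exome_alignment/HG00096.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam" \
#     | samtools view -H - | rg_centers
```

It can replace the grep/tr/grep/cut/sort chain inside the loop; the resulting center code can then be matched against the per-center capture kits in the paper's supplementary material.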

Thank you very much!
