Description of alt loci in ref genome GRCh38.d1.vd1
6.2 years ago

Hi all

Can anyone point me to a description of GRCh38.d1.vd1 (used by the TCGA GDC), and of its alternate loci in particular? Is it an adaptation of one of the 'GRCh38.p\d{1,2}' genomes? You'd think NCBI would know about every reference genome, but this one isn't listed. The information provided by the TCGA network is rather terse.

I'm asking because I'd like to download only the reads mapping to a highly polymorphic genomic region, HLA. Might some of the 'KI\d{6}' contigs contain reads I'm interested in?
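One way to approach this is to list the KI contigs from the reference's FASTA index and combine them with the chr6 HLA interval into a region list for `samtools view`. The sketch below is illustrative only: the helper names are hypothetical, it assumes a samtools-style `.fai` index (contig name in the first tab-separated column), and the MHC coordinates used (chr6:28,510,120-33,480,577, the GRC's GRCh38 definition as far as I know) should be verified before relying on them.

```python
import re

# Assumed GRCh38 coordinates of the MHC (HLA) region, per the GRC's definition.
# Verify these against your own annotation before relying on them.
MHC_REGION = "chr6:28510120-33480577"

# Matches contig names that embed a KI accession, e.g. "chr1_KI270706v1_random".
KI_PATTERN = re.compile(r"KI\d{6}")


def contig_names_from_fai(lines):
    """Return the contig names (first tab-separated column) from samtools .fai lines."""
    names = []
    for line in lines:
        line = line.strip()
        if line:
            names.append(line.split("\t", 1)[0])
    return names


def hla_regions(contig_names):
    """Build a samtools-style region list: the MHC interval plus every KI contig in full."""
    regions = [MHC_REGION]
    # A bare contig name is a valid samtools region meaning "the whole contig".
    regions.extend(name for name in contig_names if KI_PATTERN.search(name))
    return regions


if __name__ == "__main__":
    # Hypothetical .fai excerpt: only the name column matters here,
    # the numeric columns are placeholders, not real lengths or offsets.
    fai_lines = [
        "chr6\t0\t0\t60\t61",
        "chr1_KI270706v1_random\t0\t0\t60\t61",
        "chrUn_JTFH01000001v1_decoy\t0\t0\t60\t61",
    ]
    print(" ".join(hla_regions(contig_names_from_fai(fai_lines))))
```

The printed regions can be appended to `samtools view -b input.bam ...` for a local, indexed BAM whose reference names match the FASTA.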

Thanks very much, Maarten

reference genome alternate loci TCGA
6.2 years ago

Through GenBank accession GCA_000786075, I found the following assembly report, which lists all decoys as unplaced scaffolds:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/786/075/GCA_000786075.2_hs38d1/GCA_000786075.2_hs38d1_assembly_report.txt
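The report is a tab-separated text file with '#'-prefixed comment lines, so tallying its Sequence-Role column is a quick way to confirm how the decoys are classified. A minimal Python sketch, assuming the column header is the comment line beginning with '# Sequence-Name' (as in NCBI assembly reports); the function names are hypothetical:

```python
from collections import Counter


def parse_assembly_report(lines):
    """Parse an NCBI assembly report into a list of dicts keyed by column name.

    Comment lines start with '#'; the column header is assumed to be the
    comment line beginning with '# Sequence-Name'.
    """
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            text = line.lstrip("# ")
            if text.startswith("Sequence-Name"):
                header = text.split("\t")
            continue
        if not line or header is None:
            # Skip blank lines and any data seen before the header.
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def count_roles(rows):
    """Tally sequences per Sequence-Role (e.g. 'unplaced-scaffold')."""
    return Counter(row["Sequence-Role"] for row in rows)
```

After downloading the report, `count_roles(parse_assembly_report(open("GCA_000786075.2_hs38d1_assembly_report.txt")))` shows how many sequences fall under each role.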

5.3 years ago
m_two

https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

Reference files used by the GDC data harmonization and generation pipelines are provided below. MD5 checksums are provided for verifying file integrity after download. Additional files are also included to allow for reproduction of GDC pipeline analyses.
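Checking a download against the published MD5 takes only the standard library. A minimal sketch (the script and function names are hypothetical) that hashes the file in chunks so a multi-gigabyte archive does not have to fit in memory:

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_md5.py GRCh38.d1.vd1.fa.tar.gz 3ffbcfe2d05d43206f57f81ebb251dc9
    ok = verify_md5(sys.argv[1], sys.argv[2])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```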

GRCh38.d1.vd1 Reference Sequence: GRCh38.d1.vd1.fa.tar.gz

md5: 3ffbcfe2d05d43206f57f81ebb251dc9

This reference genome is used by the GDC for all sequencing and array-based analyses. This file is composed of the following sequences:

- GCA_000001405.15_GRCh38_no_alt_analysis_set
- Sequence Decoys (GenBank Accession GCA_000786075)
- Virus Sequences
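Since the primary component is the "no_alt" analysis set, the KI\d{6} contigs it carries should be unlocalized ("_random") or unplaced ("chrUn_") scaffolds rather than alt loci. To see what a downloaded copy actually contains, one can tally its FASTA headers per component. The following is a minimal sketch under assumed naming conventions (listed in the docstring); the function names are hypothetical and the rules should be checked against the real headers.

```python
from collections import Counter


def classify_contig(name):
    """Assign a GRCh38.d1.vd1 contig to one of the three components listed by the GDC.

    Assumed naming conventions (check against the actual FASTA headers):
      * decoys from GCA_000786075 end in "_decoy", e.g. "chrUn_JTFH01000001v1_decoy";
      * analysis-set contigs start with "chr" (chr1..chr22, chrX, chrY, chrM,
        "*_random", "chrUn_*", chrEBV);
      * everything else is treated as a virus sequence.
    """
    if name.endswith("_alt"):
        # The no_alt analysis set should not contain alt-locus contigs at all.
        return "unexpected_alt"
    if name.endswith("_decoy"):
        return "decoy"
    if name.startswith("chr"):
        return "analysis_set"
    return "virus"


def fasta_headers(lines):
    """Yield sequence names from FASTA header lines ('>name optional description')."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def summarize(names):
    """Count contigs per component."""
    return Counter(classify_contig(name) for name in names)
```

Running `summarize(fasta_headers(open("GRCh38.d1.vd1.fa")))` on the extracted FASTA (hypothetical path) gives a quick count per component; a non-zero `unexpected_alt` count would mean either the naming assumptions are wrong or the file is not the no_alt build.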
