Question

News: 1000 Genomes data on GRCh38



9.2 years ago

fairley ▴ 100

The low coverage 1000 Genomes sequence data has been realigned to GRCh38. Reads were aligned to the full assembly, including the GRC-maintained alternate loci sequences, the decoy, and additional HLA sequences from IMGT. The FASTA file can be found in the reference directory. The alignment was carried out using a new alt-aware version of BWA-MEM.
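When working with alignments against the full assembly, it can help to separate alternate loci, decoy, and HLA contigs from the rest. The sketch below is only illustrative. It assumes that alternate loci carry an `_alt` suffix, decoy sequences a `_decoy` suffix, and HLA sequences an `HLA-` prefix. That naming convention is an assumption here and should be checked against the headers of the FASTA file in the reference directory. The function name `classify_contig` and the example contig names are hypothetical.

```python
def classify_contig(name: str) -> str:
    """Classify a reference contig name by its naming pattern.

    ASSUMPTION: '_alt' suffix = alternate locus, '_decoy' suffix = decoy,
    'HLA-' prefix = HLA sequence. Verify against the actual FASTA headers.
    """
    if name.startswith("HLA-"):
        return "hla"
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_decoy"):
        return "decoy"
    # Everything else: primary chromosomes, unplaced contigs, and so on.
    return "other"


# Hypothetical contig names, used only to illustrate the patterns.
examples = ["chr1", "chr1_KI270762v1_alt", "chrUn_JTFH01000001v1_decoy", "HLA-A*01:01:01:01"]
labels = {name: classify_contig(name) for name in examples}
for name, label in labels.items():
    print(f"{name}\t{label}")
```

A classification like this could be used, for example, to restrict downstream analysis to non-alternate contigs.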

The alignment files themselves can be found in the data_collections/1000_genomes_project/data directory.

The alignment index and sequence.index can be found in the data_collections/1000_genomes_project directory.

Please note, these files are now being distributed in CRAM format, rather than BAM format. You can find more details about CRAM in this README.
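For tools that still expect BAM input, a CRAM file can be converted with `samtools view`, which needs the reference FASTA used for the alignment in order to decode the reads. The following is a minimal sketch that only builds the command line. The helper name `cram_to_bam_command` and all file names are placeholders, and it assumes `samtools` is installed if the command is actually run.

```python
import shlex


def cram_to_bam_command(cram_path: str, reference_fasta: str, bam_path: str) -> list[str]:
    """Build a samtools command that converts a CRAM file to BAM.

    -b  write BAM output
    -T  reference FASTA that the CRAM was aligned against
    -o  output file
    """
    return ["samtools", "view", "-b", "-T", reference_fasta, "-o", bam_path, cram_path]


# Placeholder file names; substitute the real CRAM and reference FASTA paths.
cmd = cram_to_bam_command("sample.cram", "reference.fa", "sample.bam")
print(shlex.join(cmd))

# To run the conversion (requires samtools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.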

Full details of the alignment pipeline can be found in the alignment pipeline README.

In addition, calls that have been mapped to GRCh38 can be found with the release data. Please note, however, that this data set is not yet complete: data for the X and Y chromosomes are still pending.

If you have any questions please email info@1000genomes.org.

1000Genomes GRCh38 alts alignment • 3.5k views

updated 2.3 years ago by Ram 44k • written 9.2 years ago by fairley ▴ 100