Question

Training dataset for NGS HLA typing (reads >200bp from PCR amplicons)

10.5 years ago

Alvaro Sebastian ▴ 70

I'm looking for a training set for human HLA typing with long reads (454, Ion Torrent or MiSeq 300 bp) obtained by PCR (amplicon sequencing). I don't mind which MHC loci are covered or whether the data is genomic or transcriptomic, but I need a dataset that contains:

- NGS reads
- Sequences of used primers
- Sequences of barcodes used to tag samples
- Reference genotypes of the samples to validate predictions (by Sanger sequencing or another well-established method)
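
To make it clearer why I need all four pieces together, here is a minimal sketch of how they would be used. It assumes each read starts with the sample barcode followed directly by the forward primer, and that matches are exact. The barcodes, primer, reads, sample names and allele calls are all invented for illustration, and `demultiplex` and `genotype_matches` are hypothetical helper names, not part of any existing tool.

```python
# Hypothetical inputs: every sequence, sample name and genotype here is made up.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}  # barcode -> sample
FORWARD_PRIMER = "GGTCC"


def demultiplex(reads, barcodes, primer):
    """Group reads by exact 5' barcode match and strip barcode plus primer."""
    by_sample = {}
    for read in reads:
        for barcode, sample in barcodes.items():
            prefix = barcode + primer
            if read.startswith(prefix):
                # Keep only the amplicon insert, without barcode and primer.
                by_sample.setdefault(sample, []).append(read[len(prefix):])
                break
    return by_sample


def genotype_matches(predicted, reference):
    """Compare two allele calls for one sample, ignoring allele order."""
    return sorted(predicted) == sorted(reference)


reads = [
    "ACGTGGTCCAAATTT",  # sample_1
    "TGCAGGTCCCCCGGG",  # sample_2
    "ACGTGGTCCAAATTA",  # sample_1
    "NNNNGGTCCAAATTT",  # unknown barcode, discarded
]
amplicons = demultiplex(reads, BARCODES, FORWARD_PRIMER)

# Validation step: a predicted genotype is checked against the reference one.
ok = genotype_matches(["A*02:01", "A*01:01"], ["A*01:01", "A*02:01"])
```

Without the barcodes and primers the reads cannot be assigned to samples or trimmed, and without the reference genotypes the last step has nothing to compare against.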

It's very hard to find any public data in the literature. There are a lot of papers on the topic, but most of them are from companies (e.g. Roche) and they don't publish the data.

Thanks in advance.

PS: HapMap and 1000 Genomes reads are not suitable: they are not from PCR and they are too short ;)

NGS HLA Typing Amplicon • 3.1k views

score 1 · Answer 1 · 2018-03-08

7.3 years ago

bounlu ▴ 270

The references for HLA typing tools might help, as the authors usually train their software on public datasets:

https://www.nature.com/articles/jhg2015102/tables/2
