I'm looking for a training dataset for human HLA typing based on long-read amplicon sequencing, i.e. reads obtained by PCR (454, Ion Torrent or MiSeq 300 bp). I don't mind which MHC loci are covered or whether the data are genomic or transcriptomic, but I need a dataset that contains:
- NGS reads
- Sequences of the primers used
- Sequences of barcodes used to tag samples
- Reference genotypes of the samples to validate predictions (by Sanger sequencing or another well-established method)
It's very hard to find any public data in the literature. There are a lot of papers on the topic, but most of them are from companies (e.g. Roche) and they don't publish the data.
Thanks in advance.
PS: HapMap and 1000 Genomes reads are not suitable: they are not from PCR amplicons and they are too short ;)