Hi all,
Since this is a new question about HLA typing, I am posting it as a new thread.
After trying several public HLA typing tools, such as HLAminer and ATHLATES, it seems that both were developed for whole-exome sequencing of the HLA genes. If we sequence only some specific exons, they will not work.
For example, we use MiSeq to sequence ONLY exons 2 and 3 of the HLA-A gene. So far, my idea is to match the reads directly to the exon reference sequences: our read length is 300 bp while each exon is ~270 bp, so a single read covers an entire exon.
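A minimal sketch of that direct-matching idea, assuming exact (mismatch-free) matching and that the exon references are already available as plain strings. The function names and the exon labels are hypothetical, and real reads would need quality trimming and some tolerance for sequencing errors:

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def match_read_to_exons(read, exon_refs):
    """Return the sorted labels of reference exons found verbatim in the read.

    exon_refs maps a label (hypothetical, e.g. "A*01:01_exon2") to the
    exon sequence. Both orientations of the read are checked.
    """
    read = read.upper()
    candidates = (read, reverse_complement(read))
    return sorted(
        name
        for name, exon in exon_refs.items()
        if any(exon.upper() in candidate for candidate in candidates)
    )
```

Because many alleles share identical exon 2 or exon 3 sequences, one read can legitimately return several labels; the result is a list of compatible alleles rather than a single call.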
However, I am confused by the HLA database. For the HLA-A gene, for example, it provides 2,579 alleles. I can get the full FASTA file for each allele, but I cannot directly get the exon FASTA data for each allele.
The only similar file is the alignment file, A_nuc.txt, but I did not find any documentation for it. The HLA-A gene is supposed to have 8 exons; does the file separate these exons with "|"? If anyone knows how to get the exon FASTA data for the alleles, please let me know.
HLA Class I
Gene        A       B       C       E    F    G
Alleles     2,579   3,285   2,133   15   22   50
Proteins    1,833   2,459   1,507   6    4    16
Nulls       121     109     63      0    0    2
Thanks
You can try looking at the EMBL-format IMGT/HLA data file, which includes the exon data (see ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla.dat).
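As a rough illustration, here is a minimal Python sketch for pulling exon sequences out of such a file. It assumes the usual EMBL flat-file layout: "FT   exon   start..end" feature lines followed by a /number qualifier, an /allele qualifier somewhere in the feature table, an "SQ" line before the sequence, and "//" ending each record. I have not checked every hla.dat release against this, and the function name iter_exons is my own:

```python
import re

FEATURE_KEY = re.compile(r"^FT {3}(\S+)\s+(\S+)")       # e.g. "FT   exon   504..773"
QUALIFIER = re.compile(r'^FT\s+/(\w+)="?([^"]*)"?')     # e.g. 'FT      /number="2"'
SPAN = re.compile(r"<?(\d+)\.\.>?(\d+)")                # simple start..end locations


def iter_exons(lines):
    """Yield (allele, exon_number, sequence) for every exon feature."""
    allele, exons, seq_parts = None, [], []
    pending, in_seq = None, False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):                       # end of record
            sequence = "".join(seq_parts).upper()
            for number, start, end in exons:
                yield allele, number, sequence[start - 1:end]
            allele, exons, seq_parts = None, [], []
            pending, in_seq = None, False
        elif in_seq:                                    # drop spaces and counters
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
        elif line.startswith("SQ"):
            in_seq = True
        elif line.startswith("FT"):
            key = FEATURE_KEY.match(line)
            if key:                                     # a new feature starts here
                span = SPAN.fullmatch(key.group(2))
                is_exon = key.group(1) == "exon" and span
                pending = (int(span.group(1)), int(span.group(2))) if is_exon else None
                continue
            qual = QUALIFIER.match(line)
            if qual:
                name, value = qual.groups()
                if name == "allele" and allele is None:
                    allele = value
                elif name == "number" and pending:
                    exons.append((value, *pending))
                    pending = None
```

Typical use would be something like `for allele, number, seq in iter_exons(open("hla.dat")):` and then writing the exon 2 and exon 3 entries out as FASTA.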
For more information about the data provided by IPD (including IMGT/HLA) I suggest you contact the IPD folks via their contact forms:
Thanks. I actually built a script to extract all the exon sequences from the IMGT alignment file.
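For anyone following the same route, here is a minimal sketch of the core step. This is not the script mentioned above. It assumes the alignment conventions as I understand them: "|" marks an exon boundary, "-" means identical to the reference row, "." is an alignment gap and "*" is unsequenced. It also assumes each allele's row has already been joined across alignment blocks with the spaces removed. Both function names are hypothetical:

```python
def resolve_alignment(reference, allele):
    """Replace '-' (same as reference) in an aligned allele row with the
    corresponding reference character. Rows must have equal length."""
    if len(reference) != len(allele):
        raise ValueError("aligned rows must have the same length")
    return "".join(r if a == "-" else a for r, a in zip(reference, allele))


def split_exons(aligned):
    """Split an aligned row on '|' and drop '.' gap characters.

    '*' (unsequenced) positions are kept so that incomplete exons
    remain visible to the caller.
    """
    return [segment.replace(".", "") for segment in aligned.split("|")]
```

With these, `split_exons(resolve_alignment(ref_row, allele_row))` gives one string per exon, so index 1 and index 2 would be exons 2 and 3 if the row starts at exon 1.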
Hi jing, which tool did you finally use for the data analysis?
This is useful! For others who want to extract HLA exon information, I suggest using hla.dat to build your database.
Hi, did you get the answer?