We sequenced the human MHC (HLA) region on chromosome 6 for a few hundred human samples. The MHC region shows extremely high sequence diversity: a single 100 bp read can contain more than 10 SNPs, so regular alignment tools such as BWA cannot handle it because there are too many mismatches per read. However, the MHC region has been well characterized by traditional Sanger sequencing, so the majority of SNPs, deletions and insertions are already known. Therefore, if an alignment tool could use degenerate (IUPAC) nucleotide codes in the reference sequence [that is, in addition to A, T, C and G, the reference would contain codes such as R (A or G), Y (C or T), K (G or T), M (A or C) and N (A, T, C or G)], then reads from the MHC region should align well while allowing only two additional mismatches. Do you know of an alignment tool that accepts degenerate nucleotide codes in the reference sequence?
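To make the matching rule concrete, here is a minimal Python sketch of what "aligning against a degenerate reference" would mean. It is not an existing tool and not a practical aligner: it is a brute-force, ungapped scan, and the names `IUPAC`, `count_mismatches` and `find_hits` are made up for illustration. It assumes uppercase sequences, reads containing only A, C, G and T, and a reference written with standard IUPAC codes.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def count_mismatches(read, ref_window):
    """Count positions where the read base is not allowed by the reference code."""
    return sum(
        1 for base, code in zip(read, ref_window) if base not in IUPAC[code]
    )


def find_hits(read, reference, max_mismatches=2):
    """Return (start, mismatches) for every ungapped placement within the limit.

    Brute force over all start positions; illustration only, far too slow
    for real sequencing data and with no support for insertions or deletions.
    """
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = count_mismatches(read, window)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    reference = "ACGRYTTKAC"  # toy reference with known variant sites encoded
    print(find_hits("GACT", reference, max_mismatches=0))
```

With a reference encoded this way, a read carrying a known variant at an R or Y site costs no mismatch, so the mismatch budget is spent only on novel variants and sequencing errors.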
Thank you, Ding