I am analyzing full-length sequencing of TCRs or, more precisely, their V(D)J regions. Although I have worked in the NGS field for some time, "immunoinformatics" is new to me. How do I do V(D)J typing, given that the rearranged V(D)J is not present in the genome as one contiguous region? So far I have found the IMGT database, but their FASTA files contain dots that I can't explain. I tried BLASTing the sequences against the genome, but the dots do not correspond to gaps there. In the genome the sequence is continuous, so should I ignore the dots? Or do they separate regions?
>AE000658|TRAV1-1*01|Homo sapiens|F|V-REGION|128090..128364|275 nt|1| | | | |275+48=323| | |
ggacaaagccttgagcag...ccctctgaagtgacagctgtggaaggagccattgtccag
ataaactgcacgtaccagacatctggg..................ttttatgggctgtcc
tggtaccagcaacatgatggcggagcacccacatttctttcttacaatgctctg......
......gatggtttggaggagaca...............ggtcgtttttcttcattcctt
agtcgctctgatagttatggttacctccttctacaggagctccagatgaaagactctgcc
tcttacttctgcgctgtgagaga
>X04939|TRAV1-1*02|Homo sapiens|(F)|V-REGION|52..320|269 nt|1| | | | |269+48=317| | |
ggacaaagccttgagcag...ccctctgaagtgacagctgtggaaggagccattgtccag
ataaactgcacgtaccagacatctggg..................ttttatgggctgtcc
tggtaccagcaacatgatggcggagcacccacatttctttcttacaatggtctg......
......gatggtttggaggagaca...............ggtcgtttttcttcattcctt
agtcgctctgatagttatggttacctccttctacaggagctccagatgaaagactctgcc
Is my approach correct: map the reads to these reference FASTA files and assign each cell a V(D)J type such as TRAV1-1*02? And what do these dots mean?
One option is TRUST4: https://github.com/liulab-dfci/TRUST4
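
On the dots: as far as I know, they are alignment gaps that IMGT inserts so that all V genes line up under the IMGT unique numbering. They are not bases and have no counterpart in the genome, which would fit the header field `275+48=323` (nucleotides plus gap positions). If you want to map or BLAST against these references yourself, here is a minimal Python sketch, assuming that reading of the dots, that strips them first. The function names `ungap_imgt_fasta` and `allele_name` and the toy record are made up for illustration; they are not part of IMGT or TRUST4.

```python
def ungap_imgt_fasta(text):
    """Parse FASTA text; return {header: sequence} with the '.' gap characters removed.

    Assumption: every '.' in a sequence line is an IMGT alignment gap, not a base.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Drop the gap dots and keep only the nucleotides.
            records[header].append(line.replace(".", ""))
    return {h: "".join(parts) for h, parts in records.items()}


def allele_name(header):
    """Return the second '|'-separated field of the header, e.g. 'TRAV1-1*01'."""
    return header.split("|")[1]


if __name__ == "__main__":
    # Toy record in the same layout as the IMGT headers shown in the question.
    example = ">AE000658|TRAV1-1*01|Homo sapiens|F|V-REGION\nggac...aaag\ncc..tt\n"
    for header, seq in ungap_imgt_fasta(example).items():
        print(allele_name(header), len(seq), seq)
```

The ungapped sequences can then be written back out as an ordinary FASTA file and used as a reference. Keep the gapped originals if you later need IMGT-numbered positions.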