Hello everyone, this question is related to my previous post. I have aligned a set of whole-genome shotgun reads against pangenome proteins using the DIAMOND protein aligner. After filtering by e-value, percent identity, and bit score, I still get some reads that map perfectly to multiple genes. Below is one example:
HWI-ST913:300:C5W5DACXX:7:1101:1649:2180 Ha12_00033467 100.0 33 0 0 100 2 91 123 7.8e-12 65.9
HWI-ST913:300:C5W5DACXX:7:1101:1649:2180 Ha10_00000535 100.0 33 0 0 100 2 116 148 7.8e-12 65.9
HWI-ST913:300:C5W5DACXX:7:1101:1649:2180 Ha7_00045828 100.0 33 0 0 100 2 557 589 7.8e-12 65.9
HWI-ST913:300:C5W5DACXX:7:1101:1649:2180 Ha17_00008591 100.0 33 0 0 100 2 118 150 7.8e-12 65.9
HWI-ST913:300:C5W5DACXX:7:1101:1649:2180 Ha15_00038715 100.0 33 0 0 100 2 173 205 7.8e-12 65.9
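To make the ambiguity concrete, reads like the one above can be flagged with a short script. The following is only a sketch: it assumes DIAMOND's default 12-column tabular output (read ID, gene ID, ..., e-value, bit score, as in the lines above), and the function name `find_ambiguous_reads` is hypothetical. It detects reads whose best bit score is shared by several genes; it does not decide which gene the read came from.

```python
from collections import defaultdict

def find_ambiguous_reads(lines):
    """Return {read_id: [gene, ...]} for reads whose best bit score
    is tied across more than one gene.

    Assumes whitespace-separated, 12-column tabular hits:
    column 1 = read ID, column 2 = gene ID, column 12 = bit score.
    """
    hits = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        read_id, gene = fields[0], fields[1]
        bitscore = float(fields[11])
        hits[read_id].append((gene, bitscore))

    ambiguous = {}
    for read_id, read_hits in hits.items():
        best = max(score for _, score in read_hits)
        # Collect every distinct gene that reaches the best score.
        top_genes = sorted({gene for gene, score in read_hits if score == best})
        if len(top_genes) > 1:
            ambiguous[read_id] = top_genes
    return ambiguous
```

Passing the five lines above to this function would report the read together with all five genes, since they share the same bit score.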
The purpose of my study is to find which genes from the pangenome are present in my samples and which are absent. Is there any way to identify which of these genes a read originated from? Thanks so much for any suggestions and help.