Hello,
I'm encountering some difficulties in initiating my analysis, which involves creating an immune-compatible stem cell line by knocking out selected HLA genes. The primary task is to remap whole genome sequences to identify the HLA sequences.
Initially, I attempted to map the whole-genome sequencing data (paired-end, approximately 45 GB per fastq.gz file) to the human chromosome 6 reference sequence. However, I quickly realized that this approach was not fruitful.
Subsequently, I downloaded all known HLA sequences from the following database: https://www.ebi.ac.uk/ipd/imgt/hla/. However, with approximately 29,000 unique HLA sequences available, it became evident that managing this volume of data would be challenging without a tool for visualization.
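For anyone facing the same volume of sequences, one rough way to get an overview without a viewer is to tally the records per HLA gene from the FASTA headers. The snippet below is only a minimal sketch: it assumes headers of the form ">HLA:HLA00001 A*01:01:01:01 3503 bp" (accession, allele name, length), and the three records in the example are made-up placeholders rather than real database entries. The function name is hypothetical.

```python
from collections import Counter
import io


def count_alleles_per_gene(fasta_handle):
    """Count FASTA records per HLA gene, reading only the header lines.

    Assumes the second whitespace-separated field of each header is an
    allele name such as 'A*01:01:01:01' (this layout is an assumption).
    """
    counts = Counter()
    for line in fasta_handle:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if len(fields) < 2 or "*" not in fields[1]:
            continue  # header does not match the assumed layout
        gene = fields[1].split("*", 1)[0]  # text before '*' is the gene
        counts[gene] += 1
    return counts


# Placeholder records for illustration only (sequences are not real).
example = io.StringIO(
    ">HLA:HLA00001 A*01:01:01:01 3503 bp\nACGT\n"
    ">HLA:HLA00002 A*01:01:02 546 bp\nACGT\n"
    ">HLA:HLA00132 B*07:02:01:01 3323 bp\nACGT\n"
)
counts = count_alleles_per_gene(example)
for gene, n in counts.most_common():
    print(gene, n)
```

To run it on a downloaded file, the in-memory example can be swapped for an opened file handle, e.g. `with open(path) as fh: count_alleles_per_gene(fh)`, where `path` points to whichever FASTA file was downloaded.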
Currently, I find myself at an impasse. I experimented with an approach mentioned in a paper (https://www.sciencedirect.com/science/article/pii/S1934590919300475?via%3Dihub), which unfortunately relies on Python 2, and I could not make it work on the HPC that I have access to.
Consequently, I am reaching out for assistance. Does anyone know of a tool supported by Bioconductor that could aid in identifying HLA sequences within my whole genome sequencing data?
Thank you for any comments, recommendations, and/or solutions.
Have you already looked at arcasHLA (https://github.com/RabadanLab/arcasHLA)?
Here is a benchmarking study for HLA typing tools using different data types.