HLA genotyping of whole genome sequencing data
9 months ago
Biomed-jeh ▴ 70

Hello,

I'm encountering some difficulties in initiating my analysis, which involves creating an immune-compatible stem cell line by knocking out selected HLA genes. The primary task is to remap whole genome sequences to identify the HLA sequences.

Initially, I attempted to map the whole genome sequencing data (paired-end, approximately 45 GB per fastq.gz file) to the reference sequence of human chromosome 6. However, I quickly realized that this approach was not getting me anywhere.

Subsequently, I downloaded all known HLA sequences from the following database: https://www.ebi.ac.uk/ipd/imgt/hla/. However, with approximately 29,000 unique HLA sequences available, it became evident that managing this volume of data would be challenging without a tool for visualization.
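
One way to get an overview of a large allele FASTA file without a viewer is to count the records per locus. The following is a minimal Python sketch, not a tested pipeline. It assumes that each header looks like `>HLA:HLA00001 A*01:01:01:01 1098 bp` (accession, allele name, length); the function name `count_alleles_per_locus` and the file path passed on the command line are hypothetical, so check the header layout of the actual download first.

```python
import sys
from collections import Counter


def count_alleles_per_locus(fasta_lines):
    """Count FASTA records per HLA locus from an iterable of text lines."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        # Assumed header layout: ">HLA:HLA00001 A*01:01:01:01 1098 bp"
        if len(fields) < 2 or "*" not in fields[1]:
            continue  # header does not match the assumed layout
        locus = fields[1].split("*", 1)[0]  # "A*01:01:01:01" -> "A"
        counts[locus] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_loci.py path/to/alleles.fasta
    with open(sys.argv[1]) as handle:
        for locus, n in count_alleles_per_locus(handle).most_common():
            print(f"{locus}\t{n}")
```

A per-locus table like this makes it easier to decide which subset of alleles (for example only the genes targeted for knockout) is worth keeping as a reference.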

Currently, I find myself at an impasse. I experimented with an approach mentioned in a paper (https://www.sciencedirect.com/science/article/pii/S1934590919300475?via%3Dihub), which unfortunately relies on Python 2, and I could not make it work on the HPC that I have access to.

Consequently, I am reaching out for assistance. Does anyone know of a tool supported by Bioconductor that could aid in identifying HLA sequences within my whole genome sequencing data?

Thank you for any comments, recommendations and/or solutions.

HLA genotyping WGS • 713 views

Have you already looked at arcasHLA (https://github.com/RabadanLab/arcasHLA)?
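
For orientation, a typical arcasHLA run has two steps: `extract` pulls the relevant reads out of an aligned BAM file, and `genotype` calls alleles from the extracted FASTQ files. The sketch below only builds the two command lines in Python so they can be inspected before submitting a job. The flags (`-o`, `-t`, `-g`, `-v`) and the `<sample>.extracted.1.fq.gz` naming follow the project README as I remember it, so treat them as assumptions and confirm with `arcasHLA extract --help` and `arcasHLA genotype --help`. The helper name `arcashla_commands` and the file names are hypothetical. Note also that arcasHLA was developed for RNA-seq data, so its behaviour on WGS input should be checked.

```python
import shlex


def arcashla_commands(bam, outdir, threads=8,
                      genes=("A", "B", "C", "DPB1", "DQB1", "DQA1", "DRB1")):
    """Return the extract and genotype command lines as argument lists."""
    # Assumption: extract writes <sample>.extracted.{1,2}.fq.gz into outdir.
    sample = bam.rsplit("/", 1)[-1]
    if sample.endswith(".bam"):
        sample = sample[:-4]
    fq1 = f"{outdir}/{sample}.extracted.1.fq.gz"
    fq2 = f"{outdir}/{sample}.extracted.2.fq.gz"

    extract = ["arcasHLA", "extract", bam,
               "-o", outdir, "-t", str(threads), "-v"]
    genotype = ["arcasHLA", "genotype", fq1, fq2,
                "-g", ",".join(genes),
                "-o", outdir, "-t", str(threads), "-v"]
    return extract, genotype


# Print the commands instead of running them, so nothing is executed by accident.
for cmd in arcashla_commands("sample.bam", "output"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```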


Here is a benchmarking study for HLA typing tools using different data types.

9 months ago
Biomed-jeh ▴ 70

Hi dthorbur and DBScan,

Thank you very much for your replies. I took a look at the benchmarking study and found that arcasHLA, linked by DBScan, ranks quite well. I have been working with it and ran into several issues, which I would like to share in case someone reads this post in the future.

  1. I usually align reads to a reference genome using HISAT2 or STAR (depending on whether it is bulk or single-cell sequencing data), but for whole genome sequencing data those tools do not work (or maybe I am not setting some parameters correctly). I am currently trying kallisto, as I saw it mentioned in the arcasHLA manual.
  2. Indexing with kallisto requires a lot of memory (I had to request 64 GB to index the hg38 reference genome), and kallisto also requires a processor that supports AVX instructions.
  3. I use Anaconda to create environments. Make sure you install kallisto version > 0.50, because for some reason kallisto v0.44 is installed by default. The same goes for Python: make sure you install a version > 3.6, otherwise numpy will cause issues.
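
The checks in points 2 and 3 can be scripted before submitting a job. This is a rough Python sketch under a few assumptions: the node runs Linux (so CPU flags are readable from `/proc/cpuinfo`), `kallisto version` prints a line such as `kallisto, version 0.50.1`, and the thresholds from this post are read as "at least 0.50" and "at least 3.6". The function names are hypothetical.

```python
import os
import re
import shutil
import subprocess
import sys

MIN_KALLISTO = (0, 50)  # threshold taken from this post, read as "at least"
MIN_PYTHON = (3, 6)     # threshold taken from this post, read as "at least"


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def cpu_has_avx(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists AVX."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[-1].split()
            if "avx" in flags or "avx2" in flags:
                return True
    return False


def check_environment():
    """Print a short report on Python, kallisto and AVX support."""
    print("python ok:", sys.version_info[:2] >= MIN_PYTHON)

    if shutil.which("kallisto") is None:
        print("kallisto: not found on PATH")
    else:
        out = subprocess.run(["kallisto", "version"],
                             capture_output=True, text=True).stdout
        print("kallisto ok:", parse_version(out) >= MIN_KALLISTO)

    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as handle:
            print("avx ok:", cpu_has_avx(handle.read()))
    else:
        print("avx: /proc/cpuinfo not available on this system")


if __name__ == "__main__":
    check_environment()
```

Running this on the compute node itself (not the login node) matters, because the two can have different CPUs and different environments on the PATH.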

If you have any recommendations for HLA aligner tools, please feel free to share :)
