Beginner question here:
I know that ExpansionHunter can estimate the sizes of short tandem repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, or are fully contained in each repeat. Here is the link: https://github.com/Illumina/ExpansionHunter
These repeat regions can expand to sizes much larger than the read length and thereby cause disease. Fragile X syndrome, ALS, and Huntington's disease are well-known examples.
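To make the three read categories concrete, here is a minimal coordinate-based sketch. It assumes half-open intervals and a 150 bp read length (both assumptions, not from this thread), and `classify_read` / `repeat_length_bp` are hypothetical helpers. As I understand it, ExpansionHunter itself aligns reads to a sequence graph of each locus rather than comparing coordinates, so this only illustrates the idea: once the repeat tract is longer than a read, no read can span it, and the size has to be inferred from flanking and in-repeat reads.

```python
from enum import Enum


class ReadClass(Enum):
    SPANNING = "spanning"    # covers the whole repeat plus flank on both sides
    FLANKING = "flanking"    # overlaps one end of the repeat and its flank
    IN_REPEAT = "in-repeat"  # lies entirely inside the repeat
    OTHER = "other"          # does not touch the repeat


def classify_read(read_start: int, read_end: int,
                  repeat_start: int, repeat_end: int) -> ReadClass:
    """Classify a read by coordinates; all intervals are half-open [start, end)."""
    if read_start < repeat_start and read_end > repeat_end:
        return ReadClass.SPANNING
    if repeat_start <= read_start and read_end <= repeat_end:
        return ReadClass.IN_REPEAT
    if read_start < repeat_end and read_end > repeat_start:
        return ReadClass.FLANKING
    return ReadClass.OTHER


def repeat_length_bp(copies: int, motif: str) -> int:
    """Length of a pure repeat tract in base pairs."""
    return copies * len(motif)


if __name__ == "__main__":
    READ_LEN = 150  # assumed typical Illumina read length
    # 30 copies: within the usual normal FMR1 range; >200 copies: full mutation
    for copies in (30, 200):
        length = repeat_length_bp(copies, "CGG")
        start, end = 1_000, 1_000 + length
        # Slide a read across the locus and record every class it can fall into.
        seen = {classify_read(s, s + READ_LEN, start, end)
                for s in range(start - READ_LEN, end + 1)}
        print(copies, length, sorted(c.value for c in seen))
```

With these assumptions, the 30-copy (90 bp) allele yields spanning reads, while the 200-copy (600 bp) allele yields only flanking and in-repeat reads.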
However, I have run into some difficulties with ExpansionHunter.
Is it possible to search for repeat expansions with other software, for example IGV (Integrative Genomics Viewer) or an equivalent tool?
Tell me if I'm on the right track. I have the CRAM file and its CRAI index.
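Whatever viewer is used, counting repeat copies in individual reads comes down to something like the hypothetical helper below (`longest_motif_run`, a sketch not tied to IGV or any other tool). It counts only exact, uninterrupted copies of one motif in a single read sequence, so it cannot size an expansion longer than the read itself; as I understand it, that gap is what ExpansionHunter's use of flanking and in-repeat reads is meant to address.

```python
def longest_motif_run(seq: str, motif: str) -> int:
    """Longest uninterrupted run of `motif` in `seq`, counted in motif copies."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    # A run can start at any position, so scan each of the k reading phases.
    for phase in range(k):
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption (e.g. AGG) ends the current run
    return best


if __name__ == "__main__":
    read = "ATCGGCGGCGGAGGCGGTA"  # toy read, not real data
    print(longest_motif_run(read, "CGG"))
```

Note that rotated forms of the motif (GGC, GCG) are not counted as matches here; a fuller version would also check rotations and the reverse complement.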
So I ran samtools like this to extract my region of interest:
Then I indexed the .bam file:
My reference file is this:
My variants file looks like this:
When trying to run ExpansionHunter...:
I get the following error:
You didn't read my comment above ("Estimate sizes of repeats in a especific Gene"). At this point I'm leaving the thread. Sorry.
If you could elaborate on that...
I'm going to start from the beginning to make sure I don't miss any steps.
I already have ExpansionHunter properly installed and running.
I have the patient's 30x WGS .CRAM file, which is approximately 60 GB in size.
I think I first need to extract the region containing the FMR1 repeat expansion, correct?
First, probably using samtools, I need to extract the specific region of FMR1 where the fragile X repeats occur: chrX:147912050-147912110
Something like that?
The command above produces a SAM file without a header, and it is missing the reference genome needed to decode the CRAM.
You want:
samtools view -T /path/to/ref.fa -O CRAM -o output.cram input.cram "chrX:147912050-147912110"
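If this step is going to be scripted, here is a minimal sketch that mirrors the command above and then indexes the result with `samtools index`. The paths are the same placeholders, the helper names (`extract_region_cmds`, `run_cmds`) are hypothetical, and it only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def extract_region_cmds(input_cram, output_cram, reference, region):
    """Build the samtools commands: extract `region` to a new CRAM, then index it."""
    view = ["samtools", "view", "-T", reference, "-O", "CRAM",
            "-o", output_cram, input_cram, region]
    index = ["samtools", "index", output_cram]  # writes output_cram + ".crai"
    return [view, index]


def run_cmds(cmds, dry_run=True):
    for cmd in cmds:
        print(shlex.join(cmd))  # show the exact shell command
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing command


if __name__ == "__main__":
    cmds = extract_region_cmds("input.cram", "output.cram",
                               "/path/to/ref.fa", "chrX:147912050-147912110")
    run_cmds(cmds)  # pass dry_run=False to actually execute
```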
And if you want to use ExpansionHunter on a defined region, just create a variant catalog containing that region and run ExpansionHunter with
--analysis-mode seeking
https://github.com/Illumina/ExpansionHunter/blob/master/docs/03_Usage.md
Cool, I am going to try.
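A one-locus variant catalog for FMR1 could be generated like this. This is a sketch: the field names (`LocusId`, `LocusStructure`, `ReferenceRegion`, `VariantType`) reflect my understanding of ExpansionHunter's variant catalog format and should be checked against the catalog documentation and the example catalogs shipped in the repository; `make_variant_catalog` and `fmr1_catalog.json` are hypothetical names.

```python
import json


def make_variant_catalog(locus_id, motif, region):
    """Build a one-locus ExpansionHunter-style variant catalog (a JSON list)."""
    return [{
        "LocusId": locus_id,
        "LocusStructure": f"({motif})*",  # one repeat unit, any number of copies
        "ReferenceRegion": region,        # repeat coordinates on the reference
        "VariantType": "Repeat",
    }]


if __name__ == "__main__":
    # Contig naming ("chrX" vs "X") must match both the reference FASTA and the CRAM.
    catalog = make_variant_catalog("FMR1", "CGG", "chrX:147912050-147912110")
    with open("fmr1_catalog.json", "w") as fh:
        json.dump(catalog, fh, indent=4)
```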
Question: is it mandatory to extract the desired region with samtools, or can I run ExpansionHunter directly on the original .CRAM file?
I already answered in my comment above.
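Following the earlier suggestion, a run on the full indexed CRAM in seeking mode might be assembled as below. My understanding is that seeking mode uses the index to fetch reads near each catalog locus instead of streaming the whole file, which is why a separate samtools extraction should not be needed; treat that as an assumption and confirm it in the usage docs linked above. `expansionhunter_cmd` and the file names are hypothetical; `--sex` is included because FMR1 is on chrX.

```python
import shlex


def expansionhunter_cmd(reads, reference, catalog, output_prefix, sex=None):
    """Build an ExpansionHunter command that analyses only the catalog loci."""
    cmd = [
        "ExpansionHunter",
        "--reads", reads,               # full CRAM, with its .crai index alongside
        "--reference", reference,       # FASTA the CRAM was encoded against
        "--variant-catalog", catalog,
        "--output-prefix", output_prefix,
        "--analysis-mode", "seeking",   # random access via the index
    ]
    if sex is not None:
        cmd += ["--sex", sex]           # matters for chrX loci such as FMR1
    return cmd


if __name__ == "__main__":
    print(shlex.join(expansionhunter_cmd(
        "patient.cram", "/path/to/ref.fa", "fmr1_catalog.json", "patient_fmr1",
        sex="male")))
```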
Perfect. Is there a human reference genome FASTA you would recommend, one you use frequently?
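Whichever FASTA is chosen, a CRAM can only be decoded with the reference it was compressed against, so the contig names and lengths should match. A rough sanity check, sketched with hypothetical helper names, compares the @SQ lines from `samtools view -H input.cram` with the `.fai` index produced by `samtools faidx ref.fa`. It does not compare sequences; the @SQ `M5` (MD5) tags, when present, are the stricter check.

```python
def contigs_from_header(header_text):
    """Map contig name -> length from @SQ lines (e.g. `samtools view -H` output)."""
    contigs = {}
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def contigs_from_fai(fai_text):
    """Map contig name -> length from a .fai index (columns: name, length, ...)."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def reference_mismatches(header_contigs, fai_contigs):
    """List CRAM contigs that are missing from, or differ in length in, the FASTA."""
    problems = []
    for name, length in header_contigs.items():
        if name not in fai_contigs:
            problems.append(f"{name}: missing from reference")
        elif fai_contigs[name] != length:
            problems.append(f"{name}: {length} bp in CRAM, {fai_contigs[name]} bp in FASTA")
    return problems


if __name__ == "__main__":
    # Toy values only: a "chrX"-style CRAM against an "X"-style FASTA.
    header = "@HD\tVN:1.6\n@SQ\tSN:chrX\tLN:1000\n@SQ\tSN:chrY\tLN:500\n"
    fai = "X\t1000\t6\t60\t61\n"
    for problem in reference_mismatches(contigs_from_header(header),
                                        contigs_from_fai(fai)):
        print(problem)
```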