Estimate sizes of repeats in a specific gene
18 months ago
Rafael ▴ 10

Amateur problem here:

We know that it is possible to use the ExpansionHunter tool to estimate sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, and are fully contained in each repeat. Here is the reference link: https://github.com/Illumina/ExpansionHunter

Such repeat regions can expand to a size much larger than the read length and thereby cause a disease. Fragile X Syndrome, ALS, and Huntington's Disease are well known examples.

But I have encountered some difficulties dealing with ExpansionHunter.

My question: is it possible to search for repeats using other software, such as IGV (Integrative Genomics Viewer) or any equivalent tool?

CGG Fragile-X • 1.2k views

Tell me if I'm on the right track. I have the CRAM file and its CRAI index.

So I ran samtools like this, extracting my region of interest:

$ samtools view -b NG1PSZ7BE9.mm2.sortdup.bqsr.cram "chrX:147912050-147912110" > result.bam

Then I indexed the .bam file:

$ samtools index result.bam

My reference file is this:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrX.fa.gz

My variants file looks like this:

[
  {
    "LocusId": "FMR1",
    "LocusStructure": "(CGG)*",
    "ReferenceRegion": "chrX:147912050-147912110",
    "VariantType": "Repeat"
  }
]
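
A minimal sketch of writing this catalog to disk and checking that it parses as JSON before handing it to ExpansionHunter. The file name variant_catalog.json matches the command used in this post; the python3 check is an assumption about what is installed, and it validates JSON syntax only.

```shell
# Write the single-locus catalog from this post to variant_catalog.json.
cat > variant_catalog.json <<'EOF'
[
  {
    "LocusId": "FMR1",
    "LocusStructure": "(CGG)*",
    "ReferenceRegion": "chrX:147912050-147912110",
    "VariantType": "Repeat"
  }
]
EOF

# Syntax check only (assumes python3 is available). It does not verify
# that the coordinates belong to the same genome build as the reference.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool variant_catalog.json >/dev/null && echo "catalog: valid JSON"
else
  echo "python3 not found; skipping JSON check"
fi
```

Whether the ReferenceRegion coordinates match the build of the reference FASTA still has to be checked by hand.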

When trying to run ExpansionHunter...:

$ ./Hunter/bin/ExpansionHunter --reads result.bam --reference chrX.fa --variant-catalog variant_catalog.json --output-prefix output

I get the following error:

2023-05-17T17:23:01,[Starting ExpansionHunter v5.0.0]
2023-05-17T17:23:01,[Analyzing sample result]
2023-05-17T17:23:01,[Initializing reference chrX.fa]
2023-05-17T17:23:01,[Loading variant catalog from disk variant_catalog.json]
2023-05-17T17:23:01,[Running sample analysis in seeking mode]
2023-05-17T17:23:01,[Analyzing FMR1]
2023-05-17T17:23:01,[Could not recover the mate of NG1PSZ7BE9_2023_04_13_192298_ProPhase_BP_FP200009316L1C044R06203103192/1]
2023-05-17T17:23:01,[Writing output to disk]
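
The only entry in that log that is not a routine status message is the "Could not recover the mate" one. One plausible cause is that result.bam holds only reads overlapping the narrow extracted region, so a mate aligned elsewhere is simply not in the file. A small sketch for pulling such entries out of a saved log follows. eh.log is a hypothetical file name, and this assumes the messages were captured to a file, for example with `> eh.log 2>&1`. The sample lines are copied from the log above.

```shell
# Hypothetical log file; in practice it would be captured from the
# ExpansionHunter run, e.g. by appending "> eh.log 2>&1" to the command.
cat > eh.log <<'EOF'
2023-05-17T17:23:01,[Analyzing FMR1]
2023-05-17T17:23:01,[Could not recover the mate of NG1PSZ7BE9_2023_04_13_192298_ProPhase_BP_FP200009316L1C044R06203103192/1]
2023-05-17T17:23:01,[Writing output to disk]
EOF

# Count the reads whose mate was not found in the input file.
missing=$(grep -c 'Could not recover the mate' eh.log)
echo "reads with unrecovered mates: $missing"
```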


You didn't read my comment above. At this point I'm leaving the thread. Sorry.


"But I have encountered some difficulties dealing with ExpansionHunter."

If you could elaborate on that...


I'm going to start from the beginning to make sure I don't miss any steps.

I already have ExpansionHunter properly installed and running.

I have the patient's 30x WGS .CRAM file, which is approximately 60 gigabytes in size.

I think I first need to extract the region of interest covering the FMR1 expansion, correct?

So, probably using samtools, I need to extract the specific region of FMR1 where the Fragile-X repeats occur: chrX:147912050-147912110

Something like this?

samtools view input.cram "chrX:147912050-147912110" > output.cram

The command above produces a SAM file without a header, and it does not supply the reference genome needed to decode the CRAM.

You want: samtools view -T /path/to/ref.fa -O CRAM -o output.cram input.cram "chrX:147912050-147912110"
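
The corrected extraction, followed by indexing, can be wrapped in a small script. This is only a sketch: /path/to/ref.fa, input.cram and output.cram are placeholder paths, and the DRY_RUN switch and the run helper are hypothetical conveniences that print the commands instead of executing them.

```shell
REF=/path/to/ref.fa            # placeholder: the reference the CRAM was aligned to
IN=input.cram
OUT=output.cram
REGION="chrX:147912050-147912110"
DRY_RUN=${DRY_RUN:-1}          # hypothetical switch: 1 = only print the commands

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# -T supplies the reference, -O CRAM sets the output format, -o names the file.
run samtools view -T "$REF" -O CRAM -o "$OUT" "$IN" "$REGION"
# Index the subset so that tools can do region lookups on it.
run samtools index "$OUT"
```

Setting DRY_RUN=0 in the environment makes the script actually run both commands.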


And if you want to use ExpansionHunter on a defined region, just create a variant catalog containing that region and run ExpansionHunter with --analysis-mode seeking: https://github.com/Illumina/ExpansionHunter/blob/master/docs/03_Usage.md
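
A sketch of what that could look like when run directly on the original, full CRAM, with no samtools extraction step. The paths, the fmr1 output prefix and the EH variable are placeholders or assumptions. The flags are the ones already used in this thread plus --analysis-mode seeking.

```shell
EH=${EH:-ExpansionHunter}      # assumes the binary is on PATH; adjust as needed
READS=input.cram               # the original, full CRAM (with its .crai next to it)
REF=/path/to/ref.fa            # placeholder reference FASTA
CATALOG=variant_catalog.json   # catalog containing only the loci of interest
PREFIX=fmr1                    # hypothetical output prefix

# Assemble the command as positional parameters so that paths stay quoted.
set -- "$EH" --reads "$READS" --reference "$REF" \
       --variant-catalog "$CATALOG" --output-prefix "$PREFIX" \
       --analysis-mode seeking
echo "command: $*"

# Run only when the binary and the input file are actually present.
if command -v "$EH" >/dev/null 2>&1 && [ -f "$READS" ]; then
  "$@"
else
  echo "skipped: ExpansionHunter or $READS not found"
fi
```

In seeking mode only the catalog regions are fetched through the index, so the size of the full CRAM should not be a problem. Going by the ExpansionHunter documentation, the results should end up in files named after the prefix (here fmr1.json and fmr1.vcf).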


Cool, I am going to try.

Question: is it mandatory to extract the desired region using samtools, or can I run ExpansionHunter directly on the original .CRAM file?


I already answered in my comment above.


Perfect, do you suggest any reference FASTA files for the human genome that you tend to use frequently?
