Haplotype Caller GATK Query
0
0
Entering edit mode
2.8 years ago
ramshahaya ▴ 10

In my case, We had used the Target enrichment sequencing approach for European Bison. We had sequenced a specific chromosome, for example, Chr1, Chr 2, Chr 15, Chr 9 for a specific gene panel.

I had used given below command to call Variant using GATK Haplotype Caller,

/home/anaconda3/share/gatk4-4.2.4.0-0/gatk HaplotypeCaller --java-options -Xmx100g -R genome.fa -I fixed.bam -O NmMdAndUqTag_fixed.g.vcf.gz -ERC GVCF --minimum-mapping-quality 20 --min-base-quality-score 20

But When I check the results obtained using Haplotype Caller, I am getting variants (SNPs) for all chromosomes. I am very confused. Shouldn't We have Variant only on the specific Chromosome that I had mentioned above? I was mainly interested to get Variant on specific sequenced (Mentioned above) chromosomes.

Do we have any specific program or parameter in GATK to call variants within targeted regions?

I had tried to use -L Haplotype caller parameter also. I had created a file named "interval_list", where I had mentioned the regions that were used to sequence. 

head -n 1 interval_list

Chr "\t" Start Position "\t" End Position

But It didn't work for me. 

Your suggestions would be extremely helpful.

Thank you so much in advance.

Sequencing enrichment Target GATK-Haplotype-Caller • 2.0k views
ADD COMMENT
0
Entering edit mode

I am getting variants (SNPs) for all chromosomes. I am very confused

what is the output of

samtools  idxstats  fixed.bam

I was mainly interested to get Variant on specific sequenced

add -L chr1 to your command to only get gvcf ofr chrom chr1

ADD REPLY
0
Entering edit mode

Thank you so much for Reply,

I have a file like this (Chr , Start, End). This is the file that we had used to sequence specific region of few chromosomes.

Is there any possibility to provide this information in the command?

chr25:5371120-6637687
chr15:45220566-45250906
chr26:25203124-26182512
chr29:41013305-41015955
chr9:9975342-10070046

Or Should I provide -L chr 25 chr 9 in the command given above?

ADD REPLY
0
Entering edit mode

show me the output of

samtools  idxstats  fixed.bam

please..

ADD REPLY
0
Entering edit mode

First column is chromosome Number.

samtools idxstats fixed.bam

10 104305016 141994 399

11 107310763 158019 587

12 91163125 615931 1624

13 84240350 313695 875

14 84648390 130584 382

15 85296676 558964 1515

16 81724687 123275 429

17 75158596 133060 376

18 66004023 138564 465

19 64057457 100898 318

1 158337067 260952 780

20 72042655 88197 283

21 71599096 116152 339

22 61435874 82316 259

23 52530062 106692 308

24 62714930 89024 247

25 42904170 2469549 6367

26 51681464 1484592 4001

27 45407902 64325 217

28 46312546 88778 229

29 51505224 91403 326

2 137060424 194638 582

3 121430405 178447 518

4 120829699 181654 572

5 121191424 190725 574

6 119458736 179055 510

7 112638659 155326 474

8 113384836 176518 498

9 105708250 244368 665

MT 16338 6234 43

X 148823899 845334 2294

  • 0 0 24002
ADD REPLY
0
Entering edit mode

so, although you said

When I check the results obtained using Haplotype Caller, I am getting variants (SNPs) for all chromosomes. I am very confused.

you can see that your mapper found some hits one the whole genome. So GATK HaplotypeCaller did his job the right way and found some SNPs on all chromosomes.

ADD REPLY
0
Entering edit mode

It means We have a problem with our Target enrichment Protocol? Should I consider these variants for downstream pipeline such as GWAS?

ADD REPLY
0
Entering edit mode

you can convert your file as a BED file.

chr25  5371119   6637687
chr15  45220565  45250906
chr26  25203123  26182512
chr29  41013304  41015955
chr9   99753421  10070046

and use the the option -L file.bed

ADD REPLY
0
Entering edit mode

I would like to update regarding previous Query,

I would like to mention once again that for target enrichment sequencing, we sequenced multiple specific regions of specific chromosomes. In my case, I had focused on chr12, 13, 15, 25. We had sequenced different specific regions related to specific genes for each chromosome. So Should I include these regions information also in this -L interval_list. As I am mainly interested to see variants on these specific regions.

I have tried to perform like this, Is It the correct way?

/home/anaconda3/share/gatk4-4.2.4.0-0/gatk HaplotypeCaller --java-options -Xmx100g -R genome.fa -I fixed.bam -O fixed_25_1.g.vcf.gz -ERC GVCF --minimum-mapping-quality 20 --min-base-quality-score 20 --include-non-variant-sites -L interval_list.bed

head -n 3 interval_list.bed

25 4088138 4103061

25 5371120 6637687

15 45220566 45250906

First Column is the Chromosome number.

Interval_list.bed -> shows the specific regions on the specific chromosomes in which we are interested to see variants.

Your suggestions would be really helpful. Thank you so much in advance. Have a nice day :)

ADD REPLY
0
Entering edit mode

head -n 1 interval_list Chr "\t" Start Position "\t" End Position

this is not an interval list. A header is missing. https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

Picard-style interval files have a SAM-like header that includes a sequence dictionary.

ADD REPLY

Login before adding your answer.

Traffic: 1806 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6