Question about ROH analysis by Plink 1.9
2
5
Entering edit mode
5.1 years ago
60343011s ▴ 50

Hi all,

I have recently tried to estimate runs of homozygosity (ROH) from my vcf file by using plink 1.9.

I ran following code to generate binary files that plink required:

plink --vcf myfile.vcf --make-bed --out out_name --no-sex --no-parents --no-fid --no-pheno --allow-extra-chr

This vcf file only contains one individual and about 3 million SNPs.

I used --allow-extra-chr here because I mapped my sequences to a drift genome.

Then, I used following code (with default parameters), trying to estimate ROH of my sample:

plink -bfile out_name --homozyg --allow-extra-chr

The result gave me 0 ROH, and only header produced in .hom file.

I also tried different parameters with different SNP windows and criterions, such like:

plink -bfile out_name --homozyg --homozyg-window-snp 50 --homozyg-snp 50 --homozyg-window-missing 3 --homozyg-kb 100 --homozyg-density 1000 --allow-extra-chr

However, all the results were the same, that showed :

PLINK v1.90b6.12 64-bit (28 Oct 2019)          www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:

  --allow-extra-chr
  --bfile out_name
  --homozyg
  --homozyg-density 1000
  --homozyg-kb 100
  --homozyg-snp 50
  --homozyg-window-missing 3
  --homozyg-window-snp 50

515905 MB RAM detected; reserving 257952 MB for main workspace.
3708761 variants loaded from .bim file.
1 person (0 males, 0 females, 1 ambiguous) loaded from .fam.
Ambiguous sex ID written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies... done.
3708761 variants and 1 person pass filters and QC.
Note: No phenotypes present.
--homozyg: Scan complete, found 0 ROH.

Results saved to plink.hom + plink.hom.indiv + plink.hom.summary .

Does anyone have idea why this happened to my files?

Will be grateful for any suggestions.

snp next-gen plink • 8.8k views
ADD COMMENT
0
Entering edit mode

Heys, I'm struggling to having results from plink when I try to calculate the roh in my data. I did the snp calling both with gatk and with the code you left here (with --gvcf) but when I run plink, I keep getting the same message: --homozyg: Scan complete, found 0 ROH.

Could you upload how your final vcf looks like? I'm not sure how the homozygous positions should look like.

Any help would be appreciated!!

ADD REPLY
0
Entering edit mode

Hi Gubrins...Have you solved your problem. I am facing the same issue with plink v1.9. Could you please explain, how to resolve this issue.

ADD REPLY
0
Entering edit mode

Heys Adarsh, I'm sorry but I did not manage to solve it. I changed to bcftools and I'm calculating the ROHs there (https://samtools.github.io/bcftools/howtos/roh-calling.html). Good luck!

ADD REPLY
0
Entering edit mode

Thank you for good suggestion.

ADD REPLY
7
Entering edit mode
5.1 years ago

This is unsurprising if your VCF contains only positions where your sample differs from the reference genome; practically all the ROHs will span regions which are excluded by your VCF. You may need to reconstruct your VCF in a way that includes homozygous-REF calls in an unbiased manner.

ADD COMMENT
1
Entering edit mode

Hi Chang, I used Haplotypecaller, CombinedGVCF, GenotypeGVCF, selectvariant and hard filtration to generate vcf files. My filtrated vcf files contains only 0/1 genotypes. How can I add 0/0 and 1/1 sites in it. Thanks

ADD REPLY
0
Entering edit mode

Hi, I think the answer could be here: Calculating ROHs and Heterozygosity from GATK

If I have well understood, the following flags can output all positions in your data set rather than just variants from the reference genome, and therefore enable the acquisition of ROH.

Haplotypecaller  -ERC BP_RESOLUTION 
GenotypeGVCFs --included-non-variant-sites true
ADD REPLY
0
Entering edit mode

Yes, my VCF file contain only genotype 0/1 sites. I called those sites by

bcftools mpileup -a AD,DP -q 20 -f ref.fa input.bam | bcftools call -V indels -m -v -O v -o myfile.vcf

So I need to recall all the sites (0/0,0/1,1/1) by function like --gvcf under bcftools, right?

ADD REPLY
0
Entering edit mode

Just to update the results, I've successfully used a VCF file that contain all sites for ROH analysis. Thanks!

ADD REPLY
0
Entering edit mode

Hey, I have recently faced a similar question with you, When I tried to test the ROH analysis by using PLINK 1.9, the log file shows "Note: No phenotypes present", because my VCF files have a lot of "./.", I called the sites by samtools mpileup and bcftools, so could you please tell me how could I solve it? Adding "--gvcf" function? Thanks!

ADD REPLY
0
Entering edit mode

Hi .. Could you please explain me about this. How could you do that with gVCF file. I used Haplotypecaller, CombinedGVCF, GenotypeGVCF, selectvariant and hard filtration. My filtrated vcf files contains only 0/1 genotypes. How can I add 0/0 and 1/1 sites in it.

ADD REPLY
1
Entering edit mode
2.5 years ago
yzliu01 ▴ 10

Just in case it's still a common issue for others, I got the same issue and it worked if you down optimize the flag "--homozyg-kb 500", for example. It seems not to be necessary to reconstruct vcf files with something like 'Haplotypecaller -ERC BP_RESOLUTION '. Hope it's helpful.

ADD COMMENT
0
Entering edit mode

Thanks Honey_man, I have used the same setting to lower the length threshold for which Plink consider a Run of Homozygosity.

Thanks to mention the 'Haplotypecaller -ERC BP_RESOLUTION ' or 'GVCF' detail, it has been a puzzle for me recently. I was wondering if Plink needs the non-variant reference positions or blocks.

ADD REPLY

Login before adding your answer.

Traffic: 1517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6