Question

Problem with bcftools roh

0

Entering edit mode

6.2 years ago

noushin.farnoud ▴ 130

Dear All,

I have a problem with bcftools roh and hope you can guide me to what may be causing this:

I run Example-1 of the Github: http://samtools.github.io/bcftools/howtos/roh-calling.html and see the output consists of a state (0:HW, 1:AZ) , and quality for each position that was in the original vcf file (test.vcf.gz).

I ham a bam file and like to get the exact same-format output for my list of positions. Here is how I generate the original .vcf (just in case that the way I am generating this is causing the issue that I explain later).

samtools mpileup -uf $FASTA ./myfile.bam -l ./dbSNP_chr9.bed | bcftools call -m -O z > roh.test.vcf.gz

This will provide a .vcf for 309 positions that I queried, and the header/rest of the vcf looks fine:

    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ...
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
    ##bcftools_callVersion=1.3.1+htslib-1.3.1
    ##bcftools_callCommand=call -m -O z
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  mysample
    9       421981  .       G       .       0       .       DP=470;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AN=2;DP4=208,196,0,1;MQ=60 GT      0/0
    9       422026  .       T       .       0       .       DP=405;MQSB=1;MQ0F=0;AN=2;DP4=148,216,0,0;MQ=60 GT      0/0
    9       3206113 .       A       .       0       .       DP=338;MQSB=1;MQ0F=0;AN=2;DP4=219,79,0,0;MQ=60  GT      0/0
...

But when I try the command in the example, I only receive two lines (positions) in the output file as also noted in the stdout!

bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz 
# This file was produced by: bcftools roh(1.3.1+htslib-1.3.1)
# The command line was: bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz
#
# [1]Chromosome [2]Position [3]State (0:HW, 1:AZ)   [4]Quality
9   5009541 1   3.0
9   5125250 1   3.0
Number of lines: total/processed: 309/2

My .vcf has PL but I also tried skipping the PL scores with -G30 (as is in the example), just in case it had something to do with this but the answer was exactly the same.

I greatly appreciate if you could point me to what may be causing this.

Thank You,

nfarnoud

bcftools roh homozygosity • 3.7k views

ADD COMMENT • link 6.2 years ago by noushin.farnoud ▴ 130

score 0 · Answer 1 · 2019-02-22

0

Entering edit mode

6.2 years ago

Kevin Blighe 89k

Is there an error? The program appeared to run successfully.

ADD COMMENT • link 6.2 years ago by Kevin Blighe 89k

0

Entering edit mode

Yes, it's puzzling. No error. But as noted in the stdout (last line of the block above) it says 2 (out of 309) lines are processed with no further explanation:

Number of lines: total/processed: 309/2

ADD REPLY • link 6.2 years ago by noushin.farnoud ▴ 130

0

Entering edit mode

The genotypes that you have posted are each 0/0, i.e., homozygous reference. How are the remaining genotypes in your input file? Why are there only 309 variants? Are they scattered genome-wide? Note that, for ROH to do its work, the variants have to be a certain distance apart. For example, one cannot detect ROH with accuracy by processing 3 variants that span a region of 1 megabase.

ADD REPLY • link 6.2 years ago by Kevin Blighe 89k

0

Entering edit mode

The distance between variants is something I did not take into consideration! The reason that there are 309 variants is that I wanted to query LOH only on a specific chromosome (chr 9) and also I am working with a targeted assay. So what I did was finding all dbSNP snps that fall within my assay's .bed file and queried them using the 1st command (samtools + bcftools), and then used the resultant .vcf file as the input to roh command. About genotypes: most of the SNPs look homozygous in chr9 if we look only at GT field, however the percentage contribution of REF/ALT alleles for homozygous-called SNPs is still quite different (DP4) and that is why I thought the roh command may model those per-allele-frequencies and generate a different call. I may have not used the right approach to answer my question.

ADD REPLY • link 6.2 years ago by noushin.farnoud ▴ 130

1

Entering edit mode

Yes, basically, you require a high density of genotypes in order to accurately make a ROH call. I have used exome-seq data in the past for ROH analysis. You may have to play around with the parameters.

Another option is to use PLINK's ROH implementation.

ADD REPLY • link 6.2 years ago by Kevin Blighe 89k

1

Entering edit mode

@kevin Thanks a lot. I came across that in searches, will give it try. Thanks again for your help and input!

ADD REPLY • link 6.2 years ago by noushin.farnoud ▴ 130