Hello all,
I understand this is a very basic question, but I am unable to find an explanation for some of the columns in the bcftools ROH output. I went through the manual at https://samtools.github.io/bcftools/howtos/roh-calling.html but could not find an explanation of the output format.
The header looks like this:
# RG [2]Sample [3]Chromosome [4]Start [5]End [6]Length (bp) [7]Number of markers [8]Quality (average fwd-bwd phred score)
What do RG, Number of markers and Quality indicate?
I presumed the number of markers is the number of SNPs in the region, but the line below from my output shows more markers (46939) than the length of the homozygous region (7773 bp).
RG sample1 1 66881536 66889308 7773 46939 84.5
I have several positions like this. Any help would be appreciated.
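Rows like this can be picked out with a small script. Below is a minimal Python sketch, assuming the column order given in the header above and whitespace-separated fields. The function names are made up for illustration and are not part of bcftools. It also checks that the reported length equals End - Start + 1, which holds for the line quoted above.

```python
import io

def parse_rg_lines(handle):
    """Yield one dict per RG record; skip header lines and other record types."""
    for line in handle:
        parts = line.split()
        # Assumption: an RG record has exactly the 8 columns listed in the header.
        if len(parts) != 8 or parts[0] != "RG":
            continue
        sample, chrom, start, end, length_bp, n_markers, quality = parts[1:]
        yield {
            "sample": sample,
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "length_bp": int(length_bp),
            "n_markers": int(n_markers),
            "quality": float(quality),
        }

def more_markers_than_bases(records):
    """Return the records that report more markers than base pairs."""
    return [r for r in records if r["n_markers"] > r["length_bp"]]

# The header and the data line are the ones quoted in this post.
example = io.StringIO(
    "# RG [2]Sample [3]Chromosome [4]Start [5]End [6]Length (bp) "
    "[7]Number of markers [8]Quality (average fwd-bwd phred score)\n"
    "RG sample1 1 66881536 66889308 7773 46939 84.5\n"
)

flagged = more_markers_than_bases(parse_rg_lines(example))
for r in flagged:
    # Length is inclusive of both ends for this record.
    length_matches = (r["end"] - r["start"] + 1) == r["length_bp"]
    print(r["sample"], r["chrom"], r["start"], r["end"],
          r["length_bp"], r["n_markers"], length_matches)
```

To run it on a real file, replace the `io.StringIO` example with `open("roh_output.txt")`, where the file name is a placeholder.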
I am not sure if this is the correct answer, but I think I figured out what RG and ST might be. When you use a multi-sample VCF file, it calculates ROH over a region (RG), but when you use a single-sample VCF file, it gives you the state (ST; either HW or AZ).
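One way to check this guess on your own output is to count the data lines by their first column. Here is a minimal Python sketch, assuming every data line starts with a record tag such as RG or ST and that header lines start with "#". The function name is made up, and the ST line in the demo is a hypothetical placeholder, not real bcftools output.

```python
from collections import Counter

def count_record_types(lines):
    """Count data lines by their first whitespace-separated field."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        counts[line.split()[0]] += 1
    return counts

demo = [
    "# RG [2]Sample [3]Chromosome [4]Start [5]End [6]Length (bp) "
    "[7]Number of markers [8]Quality (average fwd-bwd phred score)",
    # Real line from the question above.
    "RG sample1 1 66881536 66889308 7773 46939 84.5",
    # Hypothetical ST line: only the leading tag matters here,
    # the remaining columns are placeholders.
    "ST sample1 placeholder columns",
]

counts = count_record_types(demo)
print(dict(counts))
```

If both tags show up in the output of a single run, then the RG/ST split does not depend on how many samples are in the VCF, and my guess above would be wrong.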
Thank you so much for making it clear. But I have a question: ROH is an intrapopulation approach, so what is the logic of using a multi-sample VCF file for the analysis? Kindly elaborate on that.
Thank you in advance.