Question

PopGenome: there are missing regions when calculating Tajima's D per gene

15 months ago

Bing • 0

Hello all,

I am new to PopGenome and have a question about something that has confused me.

I am trying to calculate Tajima's D per gene for my whole-genome data. I imported the GFF file and split the data by "gene" (see my code below). If I use the whole GFF file and set tid="1", it reads not only chromosome 1 but also chr11 and chr12. Therefore, I subset the annotation to chr1.gff. However, when I checked the region names, some genes were missing.

Has anyone encountered this problem before? How did you solve it?

My code:

GENOME.class <- readVCF("indica.vcf.gz", numcols = 70000, tid = "1", from = 1, to = 45000000, gffpath = "chr1.gff")
GENOME.class <- set.populations(GENOME.class, list(c("C019", "C135", "C139", "C151", "ZS97"), c("C148", "W161", "W169", "MH63")), diploid = TRUE)
# Split the data into gene regions
GENOME.class.slide <- splitting.data(GENOME.class, subsites = "gene")
GENOME.class.slide@region.names
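
The expected gene count can be checked independently of PopGenome. Below is a minimal base-R sketch, assuming a standard tab-delimited GFF with the sequence name in column 1 and the feature type in column 3. The helper `count_gff_features` and the toy file are hypothetical and only there for illustration; they are not part of PopGenome. The sequence name is compared for exact equality, so "1" cannot also pick up "11" or "12".

```r
# Hypothetical helper (not a PopGenome function): count features of one type
# on one sequence, matching the seqid column exactly.
count_gff_features <- function(gff_path, seqid, feature = "gene") {
  gff <- read.delim(gff_path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE,
                    colClasses = "character")
  # Column 1 = sequence name, column 3 = feature type (GFF convention).
  sum(gff[[1]] == seqid & gff[[3]] == feature)
}

# Toy GFF written to a temporary file, only to make the sketch runnable.
toy <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  "1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1",
  "1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1",
  "1\tsrc\tgene\t2000\t3000\t.\t-\t.\tID=g2",
  "11\tsrc\tgene\t100\t900\t.\t+\t.\tID=g3",
  "12\tsrc\tgene\t100\t900\t.\t+\t.\tID=g4"
), toy)

n_genes <- count_gff_features(toy, seqid = "1")
print(n_genes)
```

With the real annotation, the call would take the path to the GFF file and whatever sequence name that file actually uses in column 1 (for example "1" or "Chr1"), and the result could then be compared with `length(GENOME.class.slide@region.names)`.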

The number of genes on chr 1 should be 5,271.

However, there were only 2,189 region names when I checked.

PopGenome • 504 views
