Hello all,
I am new to PopGenome and have a question about something that has greatly confused me.
I am trying to calculate Tajima's D per gene for my whole-genome data. I imported the GFF file and split the data by "gene" (see my code below). If I use the whole GFF file and set tid="1", it reads not only chromosome 1 but also chr11 and chr12. I therefore subset the GFF file to chromosome 1 only (chr1.gff). However, when I checked the region names, some genes were missing.
Has anyone encountered this problem before? How did you solve it?
My code:
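library(PopGenome)
# Read chromosome 1 from the VCF together with the chromosome 1 annotation
GENOME.class <- readVCF("indica.vcf.gz", numcols = 70000, tid = "1", from = 1, to = 45000000, gffpath = "chr1.gff")
# Define the two populations (diploid samples)
GENOME.class <- set.populations(GENOME.class, list(c("C019","C135","C139","C151","ZS97"), c("C148","W161","W169","MH63")), diploid = TRUE)
# Split the data into gene subsites
GENOME.class.slide <- splitting.data(GENOME.class, subsites = "gene")
# List the regions (genes) that were created
GENOME.class.slide@region.names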
The number of genes on chromosome 1 should be 5,271.
However, there were only 2,189 when I checked.
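For reference, the expected gene count can be checked directly against the GFF file, independently of PopGenome. The following is only a minimal base-R sketch: it assumes a standard 9-column, tab-separated GFF, and the helper name count_gff_features and the tiny demo file are made up for illustration. The sequence name passed as seqid has to match column 1 of the real file exactly (it may be "1", "chr1", "Chr1", and so on).

```r
# Count features of one type on one sequence in a GFF file.
# Assumption: standard 9-column, tab-separated GFF with "#" header/comment lines.
# count_gff_features is a hypothetical helper, not part of PopGenome.
count_gff_features <- function(gff_path, seqid, feature = "gene") {
  gff <- read.delim(gff_path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE,
                    colClasses = "character")
  # Column 1 = sequence name, column 3 = feature type; both compared exactly
  sum(gff[[1]] == seqid & gff[[3]] == feature)
}

# Tiny made-up GFF, only to show the call; this is not real annotation data.
demo <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  "1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1",
  "1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1",
  "11\tsrc\tgene\t50\t400\t.\t-\t.\tID=g2",
  "1\tsrc\tgene\t2000\t2500\t.\t+\t.\tID=g3"
), demo)

# Only "gene" rows whose sequence name is exactly "1" are counted
n_chr1_genes <- count_gff_features(demo, seqid = "1")
```

On the real data, the same call with "chr1.gff" in place of the demo file gives a number that can be compared with length(GENOME.class.slide@region.names).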