PopGenome: there are missing regions when calculating Tajima's D per gene
15 months ago
Bing • 0

Hello all,

I am new to PopGenome and have a question about something that has greatly confused me.

I am trying to calculate Tajima's D per gene for my whole-genome data. I imported the GFF file and split the data by "gene" (see my code below). If I use the whole GFF file and set tid="1", it reads not only chromosome 1 but also chr11 and chr12. I therefore subset the GFF to chromosome 1 only (chr1.gff). However, when I checked the region names, some genes were missing.

Has anyone encountered this problem before? How did you solve it?

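My code:

library(PopGenome)

# Read chromosome 1 from the VCF together with the chromosome 1 GFF annotation
GENOME.class <- readVCF("indica.vcf.gz", numcols = 70000, tid = "1", from = 1, to = 45000000, gffpath = "chr1.gff")
# Define the two populations (diploid samples)
GENOME.class <- set.populations(GENOME.class, list(c("C019", "C135", "C139", "C151", "ZS97"), c("C148", "W161", "W169", "MH63")), diploid = TRUE)
# Split the data into gene subsites
GENOME.class.slide <- splitting.data(GENOME.class, subsites = "gene")
# List the resulting region names (one per gene)
GENOME.class.slide@region.names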

The number of genes on chr 1 should be 5,271.

However, there were only 2,189 when I checked.
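
For reference, the expected count can be cross-checked directly against the GFF with base R. This is only a minimal sketch: the helper name count_gff_features is hypothetical, and it assumes a standard tab-separated GFF in which column 1 is the sequence ID and column 3 is the feature type, with the chromosome labelled exactly "1" (it may be "chr1" or "Chr1" in other annotations). It uses exact string matching, so "1" does not also match "11" or "12".

```r
# Hypothetical helper: count features of one type on one sequence in a GFF file.
# Assumes a tab-separated GFF with '#' comment/directive lines.
count_gff_features <- function(gff_path, seqid, feature = "gene") {
  gff <- read.delim(gff_path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE,
                    colClasses = "character")
  # GFF columns: 1 = sequence ID, 3 = feature type.
  # '==' is an exact comparison, so seqid "1" never matches "11" or "12".
  sum(gff[[1]] == seqid & gff[[3]] == feature)
}
```

Calling count_gff_features("chr1.gff", "1") and comparing the result with length(GENOME.class.slide@region.names) would show whether the shortfall is already present in the GFF or only appears after splitting.data.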

PopGenome • 504 views