how to calculate the proportion of SNPs with DP more than 5
7.1 years ago
Ana ▴ 200

Hi all, I have extracted the depth of coverage (DP) for some of my populations from a VCF file. Each population has 11 individuals (columns) and 11 million SNPs (rows). I have converted each one into a data.frame and replaced missing values with NA. The first few rows of my data.frame look like this:

   > head(pop1)
      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
    1  7  3 NA NA 10 NA NA NA NA  NA  NA
    2 14 11  7 NA 12  3  4  5 14   3   6
    3 13 11  7 NA 11  4 NA  4 13   3   4
    4  3 NA  4  5  4 NA NA  6 17  NA   7
    5  3 NA  5  5  4 NA NA  7 20  NA   8
    6  6 NA  3  6 NA NA NA  5 16  NA  10

For each column (i.e. each individual), I want to calculate the proportion of SNPs that have DP greater than 5. I am a bit confused about how to do this in R. I know there are many R professionals here; can someone show me how to do it?

r depth of coverage • 1.5k views
7.1 years ago

Dear Ana,

This code will do it for your entire data-frame:

sum(pop1 > 5, na.rm = TRUE) / (nrow(pop1) * ncol(pop1))

For each individual:

apply(pop1, 2, function(x) sum(x>5, na.rm=TRUE)) / apply(pop1, 2, function(x) length(x))

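Note that both denominators count the NA entries, so these are proportions of all SNPs, whether or not a DP value was recorded. To get the proportion among SNPs that actually have a DP value, divide by the number of non-NA entries instead, e.g. sum(!is.na(x)) in place of length(x).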
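
As a minimal sketch of how the choice of denominator changes the answer, here are both versions computed on only the six rows printed in the question. The results are purely illustrative and not the real proportions for the full data. The names above, prop_all and prop_called are made up for this example.

```r
# First six rows of pop1, copied from the head() output in the question
# (illustrative only; the real data has ~11 million rows)
pop1 <- data.frame(
  V1  = c(7, 14, 13, 3, 3, 6),
  V2  = c(3, 11, 11, NA, NA, NA),
  V3  = c(NA, 7, 7, 4, 5, 3),
  V4  = c(NA, NA, NA, 5, 5, 6),
  V5  = c(10, 12, 11, 4, 4, NA),
  V6  = c(NA, 3, 4, NA, NA, NA),
  V7  = c(NA, 4, NA, NA, NA, NA),
  V8  = c(NA, 5, 4, 6, 7, 5),
  V9  = c(NA, 14, 13, 17, 20, 16),
  V10 = c(NA, 3, 3, NA, NA, NA),
  V11 = c(NA, 6, 4, 7, 8, 10)
)

# Logical matrix: TRUE where DP > 5, NA where DP is missing
above <- pop1 > 5

# Denominator = all SNPs (NA entries are counted as "not above 5")
prop_all <- colSums(above, na.rm = TRUE) / nrow(pop1)

# Denominator = SNPs with a DP value only (NA entries are dropped);
# a column that is entirely NA would give NaN here
prop_called <- colMeans(above, na.rm = TRUE)

print(round(rbind(prop_all, prop_called), 3))
```

prop_all is the same quantity as the apply() line above, written with colSums(); prop_called will be equal to or larger than it for every individual, because the missing entries no longer count in the denominator.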

NB: I edited this a few times. There are undoubtedly other solutions.

Thanks @Kevin Blighe, I used something like this, which worked:

prop_x<-sapply(ind_1, function(x) sum(x > 5, na.rm = TRUE)/length(x))