Filter VCF based on INFO column values in R
5 months ago
Stavroula

Hello all,

I was wondering if anyone can help me. I have a table with the following format:

FORMAT (column V9):  GT:DP:HF:CILOW:CIUP:SDP
Info (column V10):   0/1:4282:0.001:0.0:0.003:5;0.

I want to filter these rows in R, keeping those with HF < 0.1 or HF > 0.99, regardless of the other fields.

Is there a way to do that?

I have been trying with this command:

Control1MTL1_Filtered2<-filter(Control1MTL1_Filtered, V10= c(::<=0.1::))

but it doesn't recognise the format.

Any ideas would be more than appreciated.

Best, Stavroula


Why do you want to use R for something that is better addressed by purpose-built utilities such as bcftools?

On second thought, it doesn't look like you have a VCF, just a tab-delimited file with VCF columns. You're going to need to do some wrangling.

First off, V10 is not INFO. INFO is a completely separate column, probably V8; call V10 "sample" or something similar. Split V9 and V10 using : as the delimiter, then create key-value pairs with the split V9 fields as the keys and the split V10 fields as the values. It's going to take some serious dplyr/tidyr gymnastics to do this, so rpolicastro is probably the person who can help you there.
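A minimal base-R sketch of the key-value approach described above, written without dplyr/tidyr. The helper name `format_field` and the example rows are hypothetical; it assumes V9 holds the ':'-delimited FORMAT keys and V10 the matching ':'-delimited sample values, as in the question. Because keys are paired with values row by row, rows with a different FORMAT layout are handled too.

```r
# Hypothetical example rows: the last one uses a shorter FORMAT without HF
d <- data.frame(
  V9  = c("GT:DP:HF:CILOW:CIUP:SDP",
          "GT:DP:HF:CILOW:CIUP:SDP",
          "GT:DP:HF:CILOW:CIUP:SDP",
          "GT:DP"),
  V10 = c("0/1:4282:0.001:0.0:0.003:5",
          "0/1:50:0.4:0.3:0.5:5",
          "1/1:300:0.995:0.98:1.0:5",
          "1/1:12"),
  stringsAsFactors = FALSE
)

# Split V9 (keys) and V10 (values) on ":", pair them per row, and return
# the value stored under `key` (NA when the row's FORMAT lacks that key)
format_field <- function(format, sample, key) {
  keys <- strsplit(format, ":", fixed = TRUE)
  vals <- strsplit(sample, ":", fixed = TRUE)
  vapply(seq_along(keys), function(i) {
    # Pad (with NA) or trim the values to the number of keys so names line up
    kv <- setNames(vals[[i]][seq_along(keys[[i]])], keys[[i]])
    if (key %in% names(kv)) kv[[key]] else NA_character_
  }, character(1))
}

d$HF <- as.numeric(format_field(d$V9, d$V10, "HF"))

# Keep rows with HF < 0.1 or HF > 0.99; rows without an HF value are dropped
kept <- d[!is.na(d$HF) & (d$HF < 0.1 | d$HF > 0.99), ]
```

The same helper can pull out any other field (for example `"DP"`) by changing the `key` argument.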


Indeed! That looks like a genotype column.


I agree with the others on using bcftools and defining a proper filter, especially if you want to export the VCF file and use it later on.

ADD REPLY
5 months ago
zx8754

Use a dedicated tool for the job: bcftools.

But if you must use R, re-read the delimited column with : as the separator, then subset as usual. For example:

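# example data
d <- data.frame(
  V9 = c("GT:DP:HF:CILOW:CIUP:SDP"),
  V10 =  c("0/1:1:0.001:0.0:0.003:5", 
           "1/1:2:1:0.0:0.003:5", 
           "1/0:3:1:0.0:0.003:5", 
           "0/0:4:0.005:0.0:0.003:5"), 
  V11 = 1:4,
  V12 = 5:8,
  # keep V10 as character: read.table(text = ...) rejects factors (the default before R 4.0)
  stringsAsFactors = FALSE)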
#                        V9                     V10 V11 V12
# 1 GT:DP:HF:CILOW:CIUP:SDP 0/1:1:0.001:0.0:0.003:5   1   5
# 2 GT:DP:HF:CILOW:CIUP:SDP     1/1:2:1:0.0:0.003:5   2   6
# 3 GT:DP:HF:CILOW:CIUP:SDP     1/0:3:1:0.0:0.003:5   3   7
# 4 GT:DP:HF:CILOW:CIUP:SDP 0/0:4:0.005:0.0:0.003:5   4   8

# split V10 with read.table using ":" as the delimiter, name the new columns
# after the FORMAT keys in V9, then cbind the result back to the other columns
x <- cbind(
  read.table(text = d$V10, sep = ":", 
             col.names = strsplit(d$V9[1], ":")[[1]]),
  d[, c("V11", "V12")])

# then subset as usual
x[ x$HF > 0.1,  ]
#    GT DP HF CILOW  CIUP SDP V11 V12
# 2 1/1  2  1     0 0.003   5   2   6
# 3 1/0  3  1     0 0.003   5   3   7
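The example above subsets on HF > 0.1 only. A minimal sketch of the two-tailed filter from the question (HF < 0.1 or HF > 0.99) built on the same read.table idea, assuming every row shares one FORMAT string in V9 (the code stops if not) and that all columns other than V9 and V10 should be carried along. The example values are hypothetical. Reading every field as text with `colClasses = "character"` and converting only HF to numeric keeps unexpected content in other fields (such as the `5;0.` value in the question) from affecting HF.

```r
# Hypothetical example data in the same shape as above
d <- data.frame(
  V9  = "GT:DP:HF:CILOW:CIUP:SDP",
  V10 = c("0/1:1:0.001:0.0:0.003:5",
          "1/1:2:1:0.0:0.003:5",
          "0/1:3:0.5:0.0:0.003:5",
          "0/0:4:0.995:0.0:0.003:5"),
  V11 = 1:4,
  stringsAsFactors = FALSE
)

# read.table() below assumes a single shared FORMAT layout, so check that first
stopifnot(length(unique(d$V9)) == 1)
keys <- strsplit(d$V9[1], ":", fixed = TRUE)[[1]]

# Split the sample column on ":" and name the new columns after the FORMAT keys;
# every field is read as text, then HF alone is converted to numeric
fields <- read.table(text = d$V10, sep = ":", col.names = keys,
                     colClasses = "character")
fields$HF <- as.numeric(fields$HF)

# Bind the remaining original columns back on
x <- cbind(fields, d[, setdiff(names(d), c("V9", "V10")), drop = FALSE])

# Keep both tails: HF < 0.1 or HF > 0.99
x_tails <- x[x$HF < 0.1 | x$HF > 0.99, ]
```

With a data frame already loaded from the file, only the example `d` needs to be replaced; the rest works as long as V9 is identical on every row.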