Making a comparison in an uneven CompressedList object
Kyle, 2.8 years ago

So I have the following VCF file, with this INFO metadata:

library(VariantAnnotation)

test_vcf <- readVcf(open(VcfFile(file = "C3H_HeH.mgp.v5.snps.dbSNP142.vcf.gz",
                                 index = "mouse-snps-all.annots.vcf.gz.tbi")))
str(info(test_vcf))

Formal class 'DFrame' [package "S4Vectors"] with 6 slots
..@ rownames       : NULL
..@ nrows          : int 1678126
..@ listData       :List of 4
.. ..$ INDEL: logi [1:1678126] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. ..$ DP   : int [1:1678126] 15 10 15 8 11 6 18 11 15 23 ...
.. ..$ DP4  :Formal class 'CompressedIntegerList' [package "IRanges"] with 5 slots
.. .. .. ..@ elementType    : chr "integer"
.. .. .. ..@ elementMetadata: NULL
.. .. .. ..@ metadata       : list()
.. .. .. ..@ unlistData     : int [1:6712504] 0 0 10 5 0 0 8 2 0 0 ...
.. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
.. .. .. .. .. ..@ end            : int [1:1678126] 4 8 12 16 20 24 28 32 36 40 ...
.. .. .. .. .. ..@ NAMES          : chr [1:1678126] "8" "9" "13" "16" ...
.. .. .. .. .. ..@ elementType    : chr "ANY"
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata       : list()
.. ..$ CSQ  :Formal class 'CompressedCharacterList' [package "IRanges"] with 5 slots
.. .. .. ..@ elementType    : chr "character"
.. .. .. ..@ elementMetadata: NULL
.. .. .. ..@ metadata       : list()
.. .. .. ..@ unlistData     : chr [1:3365412] "A||||intergenic_variant||||||||" "G||||intergenic_variant||||||||" "A||||intergenic_variant||||||||" "C||||intergenic_variant||||||||" ...
.. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
.. .. .. .. .. ..@ end            : int [1:1678126] 1 2 3 4 5 6 7 8 9 10 ...
.. .. .. .. .. ..@ NAMES          : NULL
.. .. .. .. .. ..@ elementType    : chr "ANY"
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata       : list()
..@ elementType    : chr "ANY"
..@ elementMetadata: NULL
..@ metadata       : list()

I want to filter the VCF using values in the CSQ column. However, CSQ is an uneven CompressedCharacterList: each element holds either one string or several (a single string or a character vector, respectively). This is biologically sensible, since the same site can have multiple predictions, but it breaks every function I've tried to use so far (such as str_detect).
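To make the "uneven" structure concrete, here is a small sketch built with the IRanges CharacterList() constructor. The three annotation strings are hypothetical toy data shaped like the CSQ values above, not taken from the real file. It shows that a CompressedCharacterList is stored as one flat character vector (what str() prints as @unlistData) plus the end position of each variant's slice (the @partitioning@end slot).

```r
library(IRanges)  # attaches S4Vectors too; provides CharacterList()

# Hypothetical toy stand-in for info(test_vcf)$CSQ: three variants, the
# second carrying two annotation strings instead of one
csq <- CharacterList(
  "A||||intergenic_variant||||||||",
  c("G||||missense_variant||||||||", "T||||synonymous_variant||||||||"),
  "C||||intergenic_variant||||||||"
)

class(csq)                         # "CompressedCharacterList"
elementNROWS(csq)                  # 1 2 1: number of annotations per variant
flat <- unlist(csq)                # all 4 strings in one character vector (@unlistData)
ends <- cumsum(elementNROWS(csq))  # 1 3 4: same values as the partitioning's @end slot

# [i] keeps the List wrapper; [[i]] drops it and returns plain character data
csq[2]                             # a CompressedCharacterList of length 1
csq[[2]]                           # a character vector of length 2
```

This is also why csq[i] inside a loop still needs @unlistData to reach the strings, whereas csq[[i]] returns them directly.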

The only approach I can get to work is looping over the variants and unlisting each element:

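csq <- info(test_vcf)$CSQ
bad_list <- logical(length(csq))

# TODO: fix this dumb loop
for (i in seq_along(csq)) {
  # [[i]] gives the plain character vector of annotations for variant i;
  # any() reduces one-or-many matches to a single TRUE/FALSE per variant
  bad_list[i] <- any(stringr::str_detect(csq[[i]], "intergenic_variant"))
}

funct_SNPs <- test_vcf[!bad_list]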

But this is dumb. Is there a function that can handle this without making my eyes bleed?
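A loop-free sketch of the same filter, assuming only the IRanges List utilities (CharacterList(), relist(), and the per-element any()/all() methods for LogicalList). The toy csq object is hypothetical; on the real data it would be info(test_vcf)$CSQ. The idea is to run the pattern match once over the flat vector of all annotation strings, then use the list's own partitioning to fold the matches back into one value per variant.

```r
library(IRanges)  # CharacterList, relist() and per-element any()/all() for List objects

# Hypothetical toy stand-in for info(test_vcf)$CSQ (same shape as the real column)
csq <- CharacterList(
  "A||||intergenic_variant||||||||",
  c("G||||missense_variant||||||||", "T||||synonymous_variant||||||||"),
  "C||||intergenic_variant||||||||"
)

# 1. Match once over every annotation string (the flat unlistData)
hits_flat <- grepl("intergenic_variant", unlist(csq), fixed = TRUE)

# 2. Re-split the flat logical vector using csq's own partitioning,
#    giving a LogicalList with one element per variant
hits <- relist(hits_flat, csq)

# 3. Collapse to one TRUE/FALSE per variant
is_intergenic  <- any(hits)  # TRUE if any annotation at the site matches
all_intergenic <- all(hits)  # TRUE only if every annotation at the site matches

is_intergenic                # TRUE FALSE TRUE

# Applied to the real object (not run here; requires the VCF from the post):
# csq <- info(test_vcf)$CSQ
# hits <- relist(grepl("intergenic_variant", unlist(csq), fixed = TRUE), csq)
# funct_SNPs <- test_vcf[!any(hits)]
```

The choice between any() and all() matters when a site carries several annotations: any() flags a site if even one prediction is intergenic, all() only if every prediction is. Sites with zero annotations give FALSE for any() and TRUE for all().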

Tags: VariantAnnotation, CompressedCharacterList