VariantAnnotation Problem
12.5 years ago
e.karasmani ▴ 140

Dear All,

I have installed VariantAnnotation and I am doing the following to locate variants:

library(VariantAnnotation)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene
head(seqlevels(txdb))

peak.file <- read.table("D:\\lena\\TOTAL.peak.target.genes.list.txt", header=TRUE, sep="\t")
pr <- RangedData(IRanges(start=peak.file$start, end=peak.file$end), space=peak.file$chromosome, idx=1:nrow(peak.file))

loc <- locateVariants(pr, txdb, CodingVariants())

and I get the following error:

Error in function (classes, fdef, mtable)  : 
 unable to find an inherited method for function "locateVariants", for signature "RangedData", "TranscriptDb"

Do you have any idea what it means?

I thought the installation might have failed, but when I check my libraries I can see the VariantAnnotation package.

What should I do?

Thank you

Best regards Lena

chip-seq • 5.9k views
ADD COMMENT
12.5 years ago

RangedData is probably not an accepted argument type; it looks like you can use GRanges, though. You might be able to coerce your RangedData object to a GRanges:

pr_GR <- as(pr, "GRanges")

If that doesn't work, just create one from scratch:

pr_GR <- GRanges(seqnames=Rle(peak.file$chromosome), ranges=IRanges(start=peak.file$start, end=peak.file$end), strand="*")
ADD COMMENT

Thanks for the answer.

I tried both ways and I get the following error:

Error in function (classes, fdef, mtable)  : 
 unable to find an inherited method for function "locateVariants", for signature "RangedData", "TranscriptDb"

I don't understand what is going on.

ADD REPLY

Did you remember to use pr_GR instead of pr?

ADD REPLY

Yes, I did.

First I created the RangedData, then I made the GRanges from it, and I still get this error.

What do you think?

ADD REPLY

Paste in your new code.

ADD REPLY

Here it is:

library(VariantAnnotation)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene
head(seqlevels(txdb))

peak.file <- read.table("/data/lena/.chip=20.txt", header=TRUE, sep="\t")
pr <- RangedData(IRanges(start=peak.file$start, end=peak.file$end), space=peak.file$chromosome, idx=peak.file$target.gene.name)
pr

pr_GR <- as(pr, "GRanges")
pr_GR <- GRanges(seqnames=Rle(peak.file$chromosome), ranges=IRanges(start=peak.file$start, end=peak.file$end), strand="*")
pr_GR

loc <- locateVariants(pr_GR, txdb, CodingVariants())

and I get the following:

 Error in function (classes, fdef, mtable)  : 
 unable to find an inherited method for function "locateVariants", for signature "RangedData", "TranscriptDb"

What should I do?

Please help me....

ADD REPLY

loc <- locateVariants(pr_GR, txdb, CodingVariants())

ADD REPLY

Sorry, I made a typo. I had:

loc <- locateVariants(pr_GR, txdb, CodingVariants())

ADD REPLY

Hi e.karasmani,

Are you still having problems with this? If yes, please show the first few lines of the GRanges you have created. The error message you are getting implies that the RangedData has not been successfully coerced to a GRanges. You can also confirm this with

class(pr_GR)

Jeremy is correct that RangedData is not an allowed input. To see all possible inputs, use the showMethods() function:

showMethods("locateVariants")
Function: locateVariants (package VariantAnnotation)
query="GRanges", subject="GRangesList", region="CodingVariants"
query="GRanges", subject="GRangesList", region="FiveUTRVariants"
query="GRanges", subject="GRangesList", region="IntergenicVariants"
query="GRanges", subject="GRangesList", region="IntronVariants"
query="GRanges", subject="GRangesList", region="SpliceSiteVariants"
query="GRanges", subject="GRangesList", region="ThreeUTRVariants"
query="GRanges", subject="TranscriptDb", region="AllVariants"
query="GRanges", subject="TranscriptDb", region="CodingVariants"
query="GRanges", subject="TranscriptDb", region="FiveUTRVariants"
query="GRanges", subject="TranscriptDb", region="IntergenicVariants"
query="GRanges", subject="TranscriptDb", region="IntronVariants"
query="GRanges", subject="TranscriptDb", region="SpliceSiteVariants"
query="GRanges", subject="TranscriptDb", region="ThreeUTRVariants"
query="Ranges", subject="GRangesList", region="ANY"
query="Ranges", subject="TranscriptDb", region="ANY"
query="VCF", subject="GRangesList", region="ANY"
query="VCF", subject="TranscriptDb", region="ANY"

Any time you have problems with a function in a Bioconductor package, feel free to contact the maintainer with questions; contact emails can be found on the man page for the function. A post to the Bioconductor mailing list can also be helpful.

http://bioconductor.org/help/mailing-list/

Valerie
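
To illustrate what the "unable to find an inherited method" error means, here is a minimal base-R sketch of S4 dispatch. The classes ToyRangedData and ToyGRanges and the generic toyLocate are hypothetical stand-ins, not part of VariantAnnotation; the only point is that a generic fails this way when no method is defined for the class of the object passed in.

```r
library(methods)

# Hypothetical toy classes standing in for RangedData and GRanges
setClass("ToyRangedData", slots = c(start = "numeric"))
setClass("ToyGRanges", slots = c(start = "numeric"))

# A toy generic that only has a method for ToyGRanges
setGeneric("toyLocate", function(query, ...) standardGeneric("toyLocate"))
setMethod("toyLocate", "ToyGRanges", function(query, ...) "dispatched")

# Works: a method exists for this class
ok <- toyLocate(new("ToyGRanges", start = 1))

# Fails during dispatch: no method for ToyRangedData; capture the message
msg <- tryCatch(
  toyLocate(new("ToyRangedData", start = 1)),
  error = function(e) conditionMessage(e)
)

# Checking for a method up front avoids the failed call
has_ranged_method <- existsMethod("toyLocate", "ToyRangedData")
has_granges_method <- existsMethod("toyLocate", "ToyGRanges")
```

So if the error still names "RangedData" in the signature, the object reaching the function is still a RangedData, whatever the variable is called.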

ADD REPLY

That error does not make sense unless you are passing it a RangedData. Try this code:

library(VariantAnnotation)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene

pr_GR <- GRanges(seqnames = Rle(c("chr1", "chr1", "chr1", "chr1")),
                 ranges = IRanges(c(90097186, 89700255, 183583510, 34609177), width = 1),
                 strand = "*",
                 refAllele = rep(DNAStringSet("A"), 4),
                 varAllele = rep(DNAStringSet("T"), 4))
loc <- locateVariants(pr_GR, txdb, CodingVariants())
ADD REPLY

This is what I am doing and what I get:

library(VariantAnnotation)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene
head(seqlevels(txdb))

peak.file <- read.table("/data/lena/.chip=20.txt", header=TRUE, sep="\t")
pr <- RangedData(IRanges(start=peak.file$start, end=peak.file$end), space=peak.file$chromosome, idx=peak.file$target.gene.name)
pr

pr_GR <- as(pr, "GRanges")
pr_GR <- GRanges(seqnames=Rle(peak.file$chromosome), ranges=IRanges(start=peak.file$start, end=peak.file$end), strand="*")
pr_GR

Here I get this warning, but I don't think it is important:

Warning message:
In newGRanges("GRanges", seqnames = seqnames, ranges = ranges, strand = strand,  :
missing values in strand converted to "*"

Then I check the class:

 class(pr_GR)
 [1] "GRanges"
attr(,"package")
[1] "GenomicRanges"

Then I run

loc <- locateVariants(pr_GR, txdb, CodingVariants())

and I get this error:

Error in genes[followIdx] : 
 subscript contains NAs or out of bounds indices

I don't understand what is going on.

Sorry about that; I am a rookie in this field.

Do you have any idea?

Thank you in advance.

Best regards, Lena

ADD REPLY

Show us a sample of your pr_GR: head(pr_GR)

ADD REPLY

Here you are:

head(pr_GR)
GRanges with 6 ranges and 0 elementMetadata values:
      seqnames                 ranges strand
         <Rle>              <IRanges>  <Rle>
  [1]     chr1 [  9933699,   9934385]      *
  [2]     chr1 [ 88255056,  88257357]      *
  [3]     chr1 [ 88421225,  88425605]      *
  [4]     chr1 [ 95341332,  95342552]      *
  [5]     chr1 [133806728, 133807938]      *
  [6]     chr1 [133903801, 133905113]      *
---
seqlengths:
  chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

What do you think?

ADD REPLY

Those work fine. Keep taking bigger chunks of your pr_GR (head(pr_GR, 100), head(pr_GR, 500), head(pr_GR, 2000), ...) and run locateVariants on each until you nail down the culprit.

> pr_GR<-GRanges(seqnames =Rle(c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1")),
+                ranges = IRanges(start=c(9933699,88255056,88421225,95341332,133806728,133903801), end=c(9934385,88257357,88425605,95342552,133807938,133905113)),
+                strand = "*")
> locateVariants(pr_GR, txdb, CodingVariants())
DataFrame with 6 rows and 4 columns
  queryHits        txID                    geneID   Location
  <integer> <character> <CompressedCharacterList>   <factor>
1         1          NA               70675,73824 intergenic
2         2          NA               15559,17975 intergenic
3         3          NA               71863,19231 intergenic
4         4        2646                    110611     intron
5         5          NA               98415,98415 intergenic
6         6          NA               13714,13714 intergenic
ADD REPLY

I converted pr_GR to a data frame, saved it, and checked it: there are no NAs in the file and it looks OK.

Here is an example:

head(pr_GR, 2000)
GRanges with 2000 ranges and 0 elementMetadata values:
        seqnames                 ranges strand
           <Rle>              <IRanges>  <Rle>
    [1]     chr1 [  9933699,   9934385]      *
    [2]     chr1 [ 88255056,  88257357]      *
    [3]     chr1 [ 88421225,  88425605]      *
    [4]     chr1 [ 95341332,  95342552]      *
    [5]     chr1 [133806728, 133807938]      *
    [6]     chr1 [133903801, 133905113]      *
    [7]     chr1 [137042981, 137044514]      *
    [8]     chr1 [171447596, 171448584]      *
    [9]     chr1 [172993767, 172997056]      *
    ...      ...                    ...    ...
 [1992]     chr1   [31787477, 31787891]      *
 [1993]     chr1   [32972614, 32973160]      *
 [1994]     chr1   [35847483, 35847803]      *
 [1995]     chr1   [36588633, 36589271]      *
 [1996]     chr1   [37835024, 37835447]      *
 [1997]     chr1   [38238071, 38238419]      *
 [1998]     chr1   [44352200, 44352632]      *
 [1999]     chr1   [44953809, 44954194]      *
 [2000]     chr1   [59402035, 59402338]      *
---
seqlengths:
  chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

and

head(pr_GR, 500000)
GRanges with 52445 ranges and 0 elementMetadata values:
         seqnames                 ranges strand
            <Rle>              <IRanges>  <Rle>
     [1]     chr1 [  9933699,   9934385]      *
     [2]     chr1 [ 88255056,  88257357]      *
     [3]     chr1 [ 88421225,  88425605]      *
     [4]     chr1 [ 95341332,  95342552]      *
     [5]     chr1 [133806728, 133807938]      *
     [6]     chr1 [133903801, 133905113]      *
     [7]     chr1 [137042981, 137044514]      *
     [8]     chr1 [171447596, 171448584]      *
     [9]     chr1 [172993767, 172997056]      *
     ...      ...                    ...    ...
 [52437]     chrY     [1664480, 1664889]      *
 [52438]     chrY     [ 500256,  500631]      *
 [52439]     chrY     [1663851, 1664427]      *
 [52440]     chrY     [1398029, 1398387]      *
 [52441]     chrY     [ 346885,  347200]      *
 [52442]     chrY     [2773727, 2774095]      *
 [52443]     chrY     [2853026, 2853651]      *
 [52444]     chrY     [2765621, 2765989]      *
 [52445]     chrY     [2868909, 2871139]      *
---
seqlengths:
  chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

I don't know what I should do.

ADD REPLY

You need to actually try these chunks with locateVariants. This is debugging 101.

pr_GR_sample<-head(pr_GR,500000)
locateVariants(pr_GR_sample, txdb, CodingVariants())
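
The chunking idea above can be automated with a bisection over prefix lengths. The following is a minimal base-R sketch, not code from the thread: find_first_failure and runs_ok are hypothetical names, and it assumes that a prefix fails exactly when it contains the offending element (which may not hold if the error depends on combinations of ranges). The toy vector and process function stand in for pr_GR and locateVariants so the block runs without Bioconductor.

```r
# Find the smallest k such that processing the first k items fails.
# runs_ok(k) must return TRUE when the first k items process without error.
# Assumes failures are monotonic: once a prefix fails, longer ones fail too.
find_first_failure <- function(n, runs_ok) {
  n <- as.integer(n)
  if (runs_ok(n)) return(NA_integer_)  # nothing fails
  lo <- 0L  # invariant: the prefix of length lo works (empty prefix assumed OK)
  hi <- n   # invariant: the prefix of length hi fails
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (runs_ok(mid)) lo <- mid else hi <- mid
  }
  hi
}

# Toy stand-ins: a vector with one bad element and a function that chokes on it
toy <- c(5, 3, 8, NA, 2, 7)
process <- function(x) {
  if (anyNA(x)) stop("NA found")
  sum(x)
}
runs_ok <- function(k) {
  !inherits(try(process(head(toy, k)), silent = TRUE), "try-error")
}

culprit <- find_first_failure(length(toy), runs_ok)

# With the objects from this thread, runs_ok could presumably be written as:
# runs_ok <- function(k) {
#   res <- try(locateVariants(head(pr_GR, k), txdb, CodingVariants()), silent = TRUE)
#   !inherits(res, "try-error")
# }
```

Once the index is known, pr_GR[culprit] shows the range that triggers the error.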
ADD REPLY

Thanks,

I have tried that, and here is what I did:

pr3 <- head(pr_GR, 52000)
loc2 <- locateVariants(pr3, txdb, AllVariants())


DataFrame with 80324 rows and 4 columns
      queryHits        txID                    geneID   Location
      <integer> <character> <CompressedCharacterList>   <factor>
1             1          NA               70675,73824 intergenic
2             2          NA               15559,17975 intergenic
3             3          NA               71863,19231 intergenic
4             4        2646                    110611     intron
5             5          NA               98415,98415 intergenic
6             6          NA               13714,13714 intergenic
7             7          NA              320139,68724 intergenic
8             8        3339                     66977     intron
9             8        3340                     66977     intron
...         ...         ...                       ...        ...
80316     51996          NA                     59026     intron
80317     51996          NA                     59026     intron
80318     51996          NA                     59026     intron
80319     51996          NA                     59026     intron
80320     51996          NA                     59026     intron
80321     51997          NA              24061,245666 intergenic
80322     51998          NA                     12229     intron
80323     51999          NA               74279,14735 intergenic
80324     52000          NA               78755,78755 intergenic

The output doesn't look right.

What do you think is going on?

None of the columns look like what they are supposed to be according to the VariantAnnotation manual. Don't you agree?

Do you have any idea?

Thanks, Lena

ADD REPLY

Hi Lena,

What part of this output looks wrong? It looks OK to me. A variant in an intron will have a single geneID, whereas an intergenic variant (one that falls between genes) will have geneIDs for both the preceding and following genes.

Valerie
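
One rough way to sanity-check output like this is to tabulate the Location column and count rows per query. The sketch below uses a plain data.frame (loc_toy, a hypothetical name) holding the first nine rows shown in the thread, so it runs without Bioconductor; with a real result you would presumably apply the same calls to the columns of the returned object. Query 8 appears twice because it matches two transcripts (3339 and 3340), which would explain why 52000 query ranges can produce 80324 rows.

```r
# First nine rows of the locateVariants output posted above, as a plain data.frame
loc_toy <- data.frame(
  queryHits = c(1, 2, 3, 4, 5, 6, 7, 8, 8),
  txID      = c(NA, NA, NA, "2646", NA, NA, NA, "3339", "3340"),
  Location  = c("intergenic", "intergenic", "intergenic", "intron",
                "intergenic", "intergenic", "intergenic", "intron", "intron"),
  stringsAsFactors = FALSE
)

# How many rows fall in each location category
counts <- table(loc_toy$Location)

# Queries that produced more than one row (one row per matching transcript)
rows_per_query <- table(loc_toy$queryHits)
multi <- names(rows_per_query)[rows_per_query > 1]
```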

ADD REPLY
12.5 years ago

I am not familiar with this tool, but snpEff is another tool you can use if you can't resolve this problem.
