Finding important predispositions using R
10.4 years ago

Hi,

I have a list of variants called from an individual genome, and I'm trying to filter out the important predispositions from it. My approach was to download the variant_summary.txt.gz file from the ClinVar website, in which most of the variants related to human health are recorded, so that I can intersect my variants with it.

I loaded variant_summary.txt into R, and it reports that the dataset has 154358 rows and 25 columns. But when I check with wc -l on Linux, the number of records is 198661. I double-checked the number of rows by opening the data in Excel, which also showed 198,661. My questions are:

  1. Why does R not load all the records of my file?
  2. Given that I'm still a novice in bioinformatics, do you think my approach is feasible for finding predispositions once I fix the R issue?

Thank you very much.

Most likely there is a problem with line 154358 in your file. It may contain a different number of columns or may not use the same delimiter as the other lines. Unfortunately, read.delim does not give a warning in these cases; try reading with read.csv instead.
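
One way to check for a malformed line is to count the fields on every line before parsing. The sketch below uses count.fields() from base R on a small made-up file; the temporary file and its contents are illustrative assumptions, not the ClinVar data.

```r
# Hypothetical toy file: line 3 is missing one tab-separated field
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc",
             "1\t2\t3",
             "4\t5",
             "6\t7\t8"), tmp)

# Count raw tab-separated fields per line; quoting and comment handling
# are switched off so that nothing is merged or truncated while counting
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

# Tabulate the counts, then list the lines that differ from the header line
print(table(n_fields))
odd_lines <- which(n_fields != n_fields[1])
print(odd_lines)
```

For the real file, the same call could be pointed at variant_summary.txt; any line numbers it reports are the ones worth inspecting by hand.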
10.4 years ago
Michael 55k

You can read in the file like so:

var.anno = read.delim("Downloads/variant_summary.txt", header=T)
dim(var.anno)
[1] 198660     25

That gave me the correct dimensions: the first line of the file contains the header, so 198660 data rows plus one header line match the 198661 lines counted by wc -l.

Regarding your second question, I think it is reasonable to try to annotate detected variants and their association with phenotypes from available databases. Depending on what you are after, you might also want to consult other variant databases, e.g. dbSNP. dbSNP also links to ClinVar if there is an entry there and to OMIM.

There are also R packages for this purpose, some here:


You can also add the arguments comment.char = "" and quote = "" to read.table(). The input file contains both "#" and single-quote characters, which are causing the truncated read issue.
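
A small made-up example of the effect, as a sketch only: the file contents below are hypothetical, and how many rows are lost or altered with the default settings depends on where the quote and "#" characters fall in the file.

```r
# Hypothetical tab-delimited file containing apostrophes and a '#'
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tname\tnote",
  "1\tgeneA\tplain",
  "2\tParkinson's disease\tfirst apostrophe",
  "3\tgeneB\tsee #123",
  "4\tAlzheimer's disease\tsecond apostrophe",
  "5\tgeneC\tplain"
), tmp)

# read.table() defaults are quote = "\"'" and comment.char = "#", so
# apostrophes can open a quoted string and '#' starts a comment.
# fill = TRUE pads short rows instead of stopping with an error.
bad <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE,
                  stringsAsFactors = FALSE)

# Disabling quoting and comments reads every line literally
good <- read.table(tmp, header = TRUE, sep = "\t",
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

# Compare the row counts of the two reads
print(nrow(bad))
print(nrow(good))
```

Comparing nrow() of the two results against the line count from wc -l is a quick way to confirm that nothing was swallowed.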


Thank you very much. I'm using ANNOVAR to annotate my variants, but unfortunately it does not provide annotations from phenotype association databases. That's why I tried merging the ClinVar text file with my variant list using location coordinates. Could you please mention any other alternatives you know of?

Anyway my questions are solved. Thanks again :)
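
The coordinate-based merge described above could look roughly like this in base R. Everything here is an illustrative assumption: the two data frames are toy data, and the column names (Chromosome, Start, Ref, Alt, ClinicalSignificance) are placeholders that would need to be mapped to the actual columns of the variant list and of variant_summary.txt. Matching on alleles as well as position is an addition in this sketch, not something stated in the post.

```r
# Hypothetical variant list from an individual genome
my_variants <- data.frame(
  Chromosome = c("1", "2", "7"),
  Start      = c(1000L, 2000L, 3000L),
  Ref        = c("A", "C", "G"),
  Alt        = c("G", "T", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical subset of an annotation table with clinical significance
clinvar <- data.frame(
  Chromosome = c("1", "7", "X"),
  Start      = c(1000L, 3000L, 500L),
  Ref        = c("A", "G", "T"),
  Alt        = c("G", "T", "C"),
  ClinicalSignificance = c("Pathogenic", "Benign", "Pathogenic"),
  stringsAsFactors = FALSE
)

# Inner join on position and alleles: the chr7 variant shares its
# position with an annotated variant but has a different Alt allele,
# so it is not reported as a hit
hits <- merge(my_variants, clinvar,
              by = c("Chromosome", "Start", "Ref", "Alt"))
print(hits)
```

Both tables need to use the same genome assembly and the same chromosome naming (for example "1" versus "chr1") for such a join to be meaningful.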


Some more tools:

  • Ensembl VEP
  • SNPedia, which links to many other databases for each SNP and also has a bulk API

Thanks :)

10.4 years ago

Dear nilakshafreezon,

a) How are you reading the file (it is in .txt format)? Are you using the correct delimiter? Is your text separated by commas, spaces, tabs, or semicolons?

IMP: For newcomers, R is mostly like a command-line version of Excel, so whatever you do in Excel to open a file, you do the same on the command line in R.

The above image shows how you import a text file in Excel; you need to import it the same way in R, e.g.:

> tree <- read.csv(file="trees91.csv",header=TRUE,sep=",")

Here the delimiter is defined with the "sep" argument. [http://www.cyclismo.org/tutorial/R/input.html#reading-a-csv-file]

Note: This might be of some help:

http://stackoverflow.com/questions/13706188/importing-csv-file-into-r-numeric-values-read-as-characters

b) For bioinformatics, one has to be equally proficient in both biology and (computer) languages. But it doesn't matter; you can learn the other in no time if you know the basics of one.

Regards,
Devashish Das


Dear Devashish,

  1. Of course, I used the correct 'sep' value. In this case it's "\t", as it's a tab-delimited file. Furthermore, the problem is not that the file fails to load into R; it loads, but only a portion of the records appears.
  2. I was asking more experienced users about their approaches to finding important predispositions: whether their approaches are similar, and what they do in addition.

And thank you for your kind consideration.
