Hello everyone,
I guess it will be a very silly mistake, but I am not able to make this work.
I am using the biomaRt package to obtain the chromosome length of different chromosomes of the human genome. The thing is, when retrieving other information such as the ensembl ID, it works well (using other filters). However, with this code, the programme never stops running. Why? What am I doing wrong?
library(biomaRt)
mart_h <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
test <- getBM(attributes = "chromosome_end", filters = "chromosome_name", values = 1:2, mart = mart_h)
test_2 <- getBM(attributes = "chromosome_end", filters = "chromosome_name", values = 1, mart = mart_h)
I know there are more efficient ways of obtaining these lengths, but I would prefer to use this package (this code is part of a pipeline).
Thank you very much, I am pretty sure it will be a very silly thing, but I am not able to solve this.
Have a nice day!
Oh, my God.
Then... what exactly am I retrieving? What is that object?
(Oh, and thank you very much!)
I haven't checked but, based on the values and the number of entries, I suspect it's the positions of the exon ends (STAT1 lies between 190 and 191 Mb on chr2 and is encoded by a variety of alternatively spliced transcripts).
Aha. Well, I guess I will have to think of another way of getting this information. Do you know if there is any way to query it, or will I have to use the typical chrom.sizes files?
In any case, thank you both for your kind help. It is very nice to deal with people like you! :)
The most efficient option would probably be to use the typical chromosome sizes file, since you don't expect those values to change from week to week. It's just a small data file that you save to disk and read when the script starts (or, less optimally, you hard-code the data in your tool). You can also browse Ensembl BioMart in your web browser to see which information is available, instead of relying on trial and error on the command line.
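To illustrate the file-based route, here is a minimal sketch in base R. It assumes a two-column, tab-separated file in the usual chrom.sizes layout (chromosome name, length). The file name in the comment is only an example, and the text is inlined here so the snippet runs on its own. The two values are the GRCh38 lengths of chr1 and chr2.

```r
# Two-column, tab-separated text in the usual chrom.sizes layout (name, length).
# In a real pipeline this would be a file on disk, read with something like
# read.table("hg38.chrom.sizes", ...); the file name is an assumption.
chrom_sizes_text <- "chr1\t248956422\nchr2\t242193529"

sizes <- read.table(
  text = chrom_sizes_text,
  sep = "\t",
  col.names = c("chrom", "length"),
  colClasses = c("character", "numeric")
)

# Named vector so that lengths can be looked up by chromosome name.
chrom_length <- setNames(sizes$length, sizes$chrom)
chrom_length[["chr2"]]
```

Keeping the result as a named vector means the rest of the pipeline can look up a length by chromosome name without touching biomaRt at all.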
Agree with this. For common organisms, the GenomeInfoDb package is an alternative to using a locally stored file.
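A rough sketch of the GenomeInfoDb route, assuming the package is installed from Bioconductor and that an internet connection is available when the function is called. The helper name `chrom_lengths_ucsc` and its defaults ("hg38", "chr1", "chr2") are made up for illustration; `getChromInfoFromUCSC()` is the package function doing the actual work, and it returns a data frame with `chrom` and `size` columns.

```r
# Hypothetical helper: look up chromosome lengths for a UCSC genome build.
chrom_lengths_ucsc <- function(genome = "hg38", chroms = c("chr1", "chr2")) {
  if (!requireNamespace("GenomeInfoDb", quietly = TRUE)) {
    stop("The GenomeInfoDb package (Bioconductor) is required.")
  }
  # Fetches the chromosome table for the requested build (needs network access).
  info <- GenomeInfoDb::getChromInfoFromUCSC(genome)
  # Return the sizes of the requested chromosomes, named by chromosome.
  setNames(info$size[match(chroms, info$chrom)], chroms)
}

# Example call (not run here because it needs the package and a connection):
# chrom_lengths_ucsc("hg38", c("chr1", "chr2"))
```

Note that UCSC-style names carry the "chr" prefix, whereas Ensembl uses plain "1", "2", and so on, so the names may need converting before joining with biomaRt results.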
You should be able to parse this information from the sequence-length field of the sequence_report:
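Assuming the report is a tab-separated table with a header row, parsing it in base R could look roughly like the sketch below. Only the `sequence-length` field comes from the post above. The other column names, the `assembled-molecule` role value, the placeholder accessions and the scaffold row are assumptions made up for illustration, and the text is inlined so the snippet runs on its own.

```r
# Illustrative report text; in practice this would be the downloaded file.
# Accessions are placeholders and the scaffold row is invented.
report_text <- paste(
  "accession\tsequence-name\tsequence-length\tsequence-role",
  "ACC_0001\t1\t248956422\tassembled-molecule",
  "ACC_0002\t2\t242193529\tassembled-molecule",
  "ACC_0003\tscaffold_x\t12345\tunplaced-scaffold",
  sep = "\n"
)

# check.names = FALSE keeps the hyphenated column names as they are.
report <- read.delim(
  text = report_text,
  check.names = FALSE,
  colClasses = "character"
)

# Keep only full chromosomes, then build a named vector of lengths.
chromosomes <- report[report[["sequence-role"]] == "assembled-molecule", ]
seq_lengths <- setNames(
  as.numeric(chromosomes[["sequence-length"]]),
  chromosomes[["sequence-name"]]
)
seq_lengths[["2"]]
```

Because the column names contain hyphens, they have to be accessed with `[["sequence-length"]]` rather than `$sequence-length`.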