Hi everyone,
I am trying to retrieve gene information from the Ensembl website so I can compare mouse (mm10) genes with repetitive DNA in specific genomic regions (UTRs, introns, and upstream regions). I obtained the gene files in two ways: first with the R code below, and second directly from the Ensembl website through the BioMart tab.
I have two issues. First, the total number of observations (rows) differs between the two approaches; that is, the two files have different row counts.
Second, when I try to find the genes that share the same positions as these repetitive DNA regions, I get an empty result, and I don't know what causes it. BTW, I downloaded the repetitive DNA files from the UCSC website, selecting Ensembl Genes in the track tab.
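R code to retrieve the gene info:
# biocLite() is deprecated; current Bioconductor installs through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")
library(biomaRt)
### Retrieving mouse (mm10/GRCm38) genes from Ensembl ###
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# transcript_start/transcript_end give one row per transcript, not one per gene
mm10_Gene <- getBM(attributes = c("ensembl_gene_id", "chromosome_name", "strand", "transcript_start", "transcript_end", "mgi_symbol"), mart = mouse)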
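On the row-count difference: because the query above asks for `transcript_start` and `transcript_end`, getBM() returns one row per transcript, and a gene can repeat further if it has more than one `mgi_symbol`. A BioMart web export with only gene-level attributes, or with "Unique results only" ticked, may instead return one row per gene, and a different Ensembl release between the two queries is another possible cause. A minimal sketch of how to check this, using a small hypothetical data frame shaped like the getBM() output (not real Ensembl data), compares the row count with the number of distinct genes and collapses the table to one row per gene:

```r
# Hypothetical rows shaped like the getBM() output above (NOT real Ensembl data):
# GENE_A has two transcripts, GENE_B has one.
mm10_Gene <- data.frame(
  ensembl_gene_id  = c("GENE_A", "GENE_A", "GENE_B"),
  chromosome_name  = c("1", "1", "2"),
  strand           = c(1L, 1L, -1L),
  transcript_start = c(100L, 150L, 500L),
  transcript_end   = c(900L, 950L, 700L),
  mgi_symbol       = c("SymA", "SymA", "SymB"),
  stringsAsFactors = FALSE
)

n_rows  <- nrow(mm10_Gene)                            # rows = transcripts (x symbols)
n_genes <- length(unique(mm10_Gene$ensembl_gene_id))  # distinct Ensembl genes
cat("rows:", n_rows, " distinct genes:", n_genes, "\n")

# Collapse to one row per gene: earliest transcript start, latest transcript end.
starts <- tapply(mm10_Gene$transcript_start, mm10_Gene$ensembl_gene_id, min)
ends   <- tapply(mm10_Gene$transcript_end,   mm10_Gene$ensembl_gene_id, max)
spans  <- data.frame(
  ensembl_gene_id  = names(starts),
  gene_start       = as.integer(starts),
  gene_end         = as.integer(ends[names(starts)]),
  stringsAsFactors = FALSE
)

# Re-attach chromosome and strand (a single value per gene).
gene_info  <- unique(mm10_Gene[, c("ensembl_gene_id", "chromosome_name", "strand")])
gene_level <- merge(gene_info, spans, by = "ensembl_gene_id")
print(gene_level)
```

Alternatively, requesting the gene-level attributes `start_position` and `end_position` instead of the transcript coordinates should give gene-level rows directly, which is closer to what a gene-only web export returns.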
Hi Ying,
I used the mouse (mm10) release, which is the latest one. Then I used the UCSC Table Browser to download the repetitive DNA: in the track tab I selected Ensembl Genes, and from the get output tab I got, for example, the "Introns plus" region. Since UCSC doesn't provide gene info for Ensembl genes, especially MGI symbols, I retrieved the gene info from the Ensembl website directly or with the R code above.
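On the empty overlap result: one possible cause (an assumption, since the files themselves aren't shown) is that UCSC and Ensembl use different conventions. UCSC tables name chromosomes `chr1`, `chrX`, `chrM` and write strand as `+`/`-`, while biomaRt returns `1`, `X`, `MT` and strand as `1`/`-1`. UCSC BED output is also 0-based and half-open, whereas Ensembl coordinates are 1-based and inclusive. If chromosome names are compared as-is, no interval can ever match. The base-R sketch below uses hypothetical toy intervals: it harmonises names and coordinates and then keeps overlapping pairs with a plain join-and-filter. That is fine for small inputs; for genome-wide data, `findOverlaps()` from GenomicRanges would be the usual choice.

```r
# Hypothetical toy inputs (NOT real data).
# Repeats as exported from the UCSC Table Browser in BED style:
# "chr"-prefixed names, 0-based half-open coordinates.
ucsc_repeats <- data.frame(
  chrom      = c("chr1", "chr1", "chr2"),
  chromStart = c(120L, 5000L, 600L),
  chromEnd   = c(200L, 5100L, 650L),
  name       = c("rep1", "rep2", "rep3"),
  stringsAsFactors = FALSE
)
# Genes in biomaRt style: no "chr" prefix, 1-based inclusive coordinates.
ens_genes <- data.frame(
  ensembl_gene_id = c("GENE_A", "GENE_B"),
  chromosome_name = c("1", "2"),
  start           = c(100L, 500L),
  end             = c(950L, 700L),
  stringsAsFactors = FALSE
)

# A naive join on the raw names finds nothing: "chr1" never equals "1".
naive <- merge(ucsc_repeats, ens_genes, by.x = "chrom", by.y = "chromosome_name")

# Harmonise: strip "chr", map chrM -> MT (Ensembl's mitochondrion name),
# and convert BED starts to 1-based inclusive coordinates.
ucsc_repeats$chromosome_name <- sub("^chr", "", ucsc_repeats$chrom)
ucsc_repeats$chromosome_name[ucsc_repeats$chromosome_name == "M"] <- "MT"
ucsc_repeats$start <- ucsc_repeats$chromStart + 1L
ucsc_repeats$end   <- ucsc_repeats$chromEnd

# Pair every repeat with every gene on the same chromosome, then keep overlaps.
pairs <- merge(ucsc_repeats[, c("name", "chromosome_name", "start", "end")],
               ens_genes, by = "chromosome_name", suffixes = c(".rep", ".gene"))
hits <- pairs[pairs$start.rep <= pairs$end.gene & pairs$end.rep >= pairs$start.gene, ]
print(hits[, c("name", "ensembl_gene_id")])
```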
Not the mouse reference, but the annotation release: if you look at the Ensembl website, it is currently on release 80. UCSC is probably using a different release as well, since annotations are updated more often than the reference is.
So is there any way to download the repetitive DNA directly from the Ensembl website, like on UCSC? For example, I want to download the intronic, CDS, 10 kb upstream, and 10 kb downstream regions for mouse (mm10) and human (hg19). I think this would keep the annotation data and the repetitive DNA consistent for this analysis, since both would come from the same source, Ensembl.
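The 10 kb upstream and downstream regions can also be derived from the Ensembl gene coordinates themselves, so they share the annotation release of the gene table. A minimal strand-aware sketch, assuming 1-based inclusive coordinates with strand coded as 1/-1 (as biomaRt returns it) and using hypothetical toy genes. Starts are clipped at position 1. Ends are not clipped at the chromosome end, because chromosome lengths are not in this table.

```r
# Hypothetical toy genes in biomaRt style (1-based inclusive, strand 1 / -1).
genes <- data.frame(
  ensembl_gene_id = c("GENE_PLUS", "GENE_MINUS"),
  chromosome_name = c("1", "1"),
  start           = c(50000L, 3000L),
  end             = c(60000L, 8000L),
  strand          = c(1L, -1L),
  stringsAsFactors = FALSE
)

# Region of `width` bp immediately upstream or downstream of each gene,
# taking strand into account. Starts are clipped at 1; ends are not clipped
# at the chromosome end because chromosome lengths are not known here.
flank <- function(g, width = 10000L, direction = c("upstream", "downstream")) {
  direction <- match.arg(direction)
  plus <- g$strand == 1L
  # On the + strand upstream lies before `start`; on the - strand after `end`.
  before <- if (direction == "upstream") plus else !plus
  fl_start <- ifelse(before, g$start - width, g$end + 1L)
  fl_end   <- ifelse(before, g$start - 1L,    g$end + width)
  data.frame(
    ensembl_gene_id  = g$ensembl_gene_id,
    chromosome_name  = g$chromosome_name,
    start            = pmax(fl_start, 1L),
    end              = fl_end,
    stringsAsFactors = FALSE
  )
}

up10k   <- flank(genes, 10000L, "upstream")
down10k <- flank(genes, 10000L, "downstream")
print(up10k)
print(down10k)
```

This only produces the flank coordinates. The repeat features themselves would still need to come from Ensembl for the comparison to use a single source.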
Have a look here: How To Get All Ensembl Repeatfeatures From Biomart Or The Ensembl Rest Api?