Hi everyone, sorry to be back again, but I do try to look for an answer before bothering you all. I am following a tutorial to learn how to work with R and RNA-Seq data, and from time to time I run into a new problem. So far I have been dealing with BioMart annotation and got some errors while trying to find the exact attributes and so on, but I sorted those out by searching on Google.
But in the last part of the annotation, I ended up with no annotated genes at all.
This is the head of the merged GTF file that I got with StringTie2:
Chr1 StringTie transcript 3631 5899 . + . transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; gene_name "AT1G01010.TAIR10"; xloc "XLOC_000001"; ref_gene_id "AT1G01010.TAIR10"; cmp_ref "AT1G01010.1.TAIR10"; class_code "="; tss_id "TSS1";
Chr1 StringTie exon 3631 3913 . + . transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "1";
Chr1 StringTie exon 3996 4276 . + . transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "2";
Chr1 StringTie exon 4486 4605 . + . transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "3";
Chr1 StringTie exon 4706 5095 . + . transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "4";
Chr1 StringTie exon 5174 5326 . + . transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "5";
Chr1 StringTie exon 5439 5899 . + . transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "6";
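To make it easier to see which identifiers are in play, here is a minimal sketch in base R that pulls individual attributes out of the ninth GTF column. The helper name get_attr is only for illustration (it is not part of StringTie or any package), and the input string is copied from the first transcript line above.

```r
# Attribute column copied from the first transcript line shown above.
attr_field <- 'transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; gene_name "AT1G01010.TAIR10"; xloc "XLOC_000001"; ref_gene_id "AT1G01010.TAIR10"; cmp_ref "AT1G01010.1.TAIR10"; class_code "="; tss_id "TSS1";'

# Hypothetical helper: return the quoted value of one key, or NA if the key is absent.
# The (?:^|; ) prefix stops "gene_id" from matching inside "ref_gene_id".
get_attr <- function(x, key) {
  pattern <- paste0('(?:^|; )', key, ' "([^"]*)"')
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(hits, function(h) if (length(h) >= 2) h[2] else NA_character_, character(1))
}

stringtie_gene_id <- get_attr(attr_field, "gene_id")      # StringTie's own locus ID
ref_gene_id       <- get_attr(attr_field, "ref_gene_id")  # reference ID, with suffix

# Remove the trailing ".TAIR10" to get the bare locus identifier.
bare_gene_id <- sub("\\.TAIR10$", "", ref_gene_id)

print(c(stringtie_gene_id, ref_gene_id, bare_gene_id))
```

The point of the sketch is that the reference gene names in this GTF carry a ".TAIR10" suffix, which matters later when they are compared against identifiers from another source.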
It looks very similar to the file from the people who published the tutorial. But in the end I got no annotated genes.
g_sign<-subset(results_genes_no_filter,pval<0.01 & abs(log2fc)>0.584)
head(g_sign)
> head(g_sign)
feature id fc pval qval log2fc
125 gene AT1G04610.TAIR10 5.611861e+01 0.0001284833 0.5662515 5.810407
128 gene AT1G04660.TAIR10 3.523963e+01 0.0021592230 0.5662515 5.139127
205 gene AT1G07450.TAIR10 2.100165e+00 0.0021455935 0.5662515 1.070503
220 gene AT1G07985.TAIR10 1.037975e+02 0.0002824740 0.5662515 6.697627
676 gene AT1G21830.TAIR10 4.241355e+00 0.0024521123 0.5662515 2.084525
1003 gene AT1G31360.TAIR10 4.043344e+05 0.0057775911 0.5662515 18.625189
I got this part working after looking up some adjustments.
#### ANNOTATION - BioMart ########
listMarts(host="plants.ensembl.org")
mart=useMart("plants_mart", host="plants.ensembl.org")
listDatasets(mart)[grep("thaliana",listDatasets(mart)[,1]),]
mart <- useMart("plants_mart", dataset="athaliana_eg_gene", host="plants.ensembl.org")
listFilters(mart)
head(listFilters(mart),20)
listFilters(mart)[grep("ensembl",listFilters(mart)[,1]),]
searchAttributes(mart = mart, pattern = "ensembl_gene_id")
listAttributes(mart = mart, page="feature_page")
thale_data_frame <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id","ensembl_peptide_id","ensembl_exon_id","description"),mart=mart)
head(thale_data_frame)
tail(thale_data_frame)
## Now match the genes from our list to this dataset
annotated_genes = subset(thale_data_frame, ensembl_gene_id %in% g_sign$id)
dim(annotated_genes )
[1] 0 3
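One possible explanation, sketched here with toy data rather than real BioMart output: %in% only matches identical strings, and the IDs in g_sign$id end in ".TAIR10". This assumes the ensembl_gene_id column holds bare locus identifiers such as AT1G04610, which I have not verified against my own thale_data_frame. The data frame toy_annotation, its descriptions, and the ID AT1G99999 are made up for illustration.

```r
# Three IDs copied from the head(g_sign) output above.
g_sign_ids <- c("AT1G04610.TAIR10", "AT1G04660.TAIR10", "AT1G07450.TAIR10")

# Toy stand-in for thale_data_frame (assumption: bare locus IDs, placeholder descriptions).
toy_annotation <- data.frame(
  ensembl_gene_id = c("AT1G04610", "AT1G04660", "AT1G99999"),
  description     = c("placeholder A", "placeholder B", "placeholder C"),
  stringsAsFactors = FALSE
)

# Exact string comparison: the suffixed IDs match nothing.
n_before <- sum(toy_annotation$ensembl_gene_id %in% g_sign_ids)

# Strip the trailing ".TAIR10" and compare again.
clean_ids <- sub("\\.TAIR10$", "", g_sign_ids)
annotated_toy <- subset(toy_annotation, ensembl_gene_id %in% clean_ids)

print(n_before)
print(dim(annotated_toy))
```

If that assumption holds for the real data, the same sub() call applied to g_sign$id before the subset() step would be the thing to try first.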
Is there any way to fix it?
I followed the steps and got similar results. The one thing I couldn't fix is that my biomaRt package (2.40.5) is old for my R version; I tried to update it, but without success.
> version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 6.1
year 2019
month 07
day 05
svn rev 76782
language R
version.string R version 3.6.1 (2019-07-05)
nickname Action of the Toes
Any suggestions or directions?
Thank you for your time! Paulo
No one can share any ideas, suggestions, ... 8(