Annotations with biomart for ballgown
5.0 years ago
schlogl ▴ 160

Hi guys, sorry to be here again, but I did try to find an answer before bothering you all. I am following a tutorial to learn how to work with R and RNA-Seq data, and from time to time I run into a different problem. Until now I have been dealing with the biomaRt annotation and had some errors finding the exact attributes, but I sorted those out by searching on Google.

But in the last part of the annotation I found that I didn't get any annotated genes.

This is the head of the merged gtf file that I got with stringtie2:

Chr1    StringTie   transcript  3631    5899    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; gene_name "AT1G01010.TAIR10"; xloc "XLOC_000001"; ref_gene_id "AT1G01010.TAIR10"; cmp_ref "AT1G01010.1.TAIR10"; class_code "="; tss_id "TSS1";
Chr1    StringTie   exon    3631    3913    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "1";
Chr1    StringTie   exon    3996    4276    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "2";
Chr1    StringTie   exon    4486    4605    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "3";
Chr1    StringTie   exon    4706    5095    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "4";
Chr1    StringTie   exon    5174    5326    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "5";
Chr1    StringTie   exon    5439    5899    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "6";

And it looks very similar to the file from the people who published the tutorial. But at the end I got no annotated genes.

g_sign<-subset(results_genes_no_filter,pval<0.01 & abs(log2fc)>0.584)
head(g_sign)

> head(g_sign)
     feature               id           fc         pval      qval    log2fc
125     gene AT1G04610.TAIR10 5.611861e+01 0.0001284833 0.5662515  5.810407
128     gene AT1G04660.TAIR10 3.523963e+01 0.0021592230 0.5662515  5.139127
205     gene AT1G07450.TAIR10 2.100165e+00 0.0021455935 0.5662515  1.070503
220     gene AT1G07985.TAIR10 1.037975e+02 0.0002824740 0.5662515  6.697627
676     gene AT1G21830.TAIR10 4.241355e+00 0.0024521123 0.5662515  2.084525
1003    gene AT1G31360.TAIR10 4.043344e+05 0.0057775911 0.5662515 18.625189

I got this far after looking up some adjustments:

    #### ANNOTATION - BioMart ########

    listMarts(host="plants.ensembl.org")

    mart=useMart("plants_mart", host="plants.ensembl.org")
    # Index the full dataset table (not just its head) with the grep hits
    listDatasets(mart)[grep("thaliana",listDatasets(mart)[,1]),]

    mart <- useMart("plants_mart", dataset="athaliana_eg_gene", host="plants.ensembl.org")
    listFilters(mart)
    head(listFilters(mart),20)
    listFilters(mart)[grep("ensembl",listFilters(mart)[,1]),]
    searchAttributes(mart = mart, pattern = "ensembl_gene_id")
    listAttributes(mart = mart, page="feature_page")
    # Straight quotes only; store the result and use the `mart` object defined above
    thale_data_frame <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id","ensembl_peptide_id","ensembl_exon_id","description"),mart=mart)
    head(thale_data_frame)
    tail(thale_data_frame)

    ## Now match the genes from our list to this dataset
    annotated_genes = subset(thale_data_frame, ensembl_gene_id %in% g_sign$id)
    dim(annotated_genes )
    [1] 0 3

Is there any way to fix it?

I followed the steps and got similar results; however, the last thing I couldn't fix is that my biomaRt package (2.40.5) is old for my R version, and I tried to update it without success.

> version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes

Any suggestions or directions?

Thank you for your time! Paulo

rna-seq R • 1.1k views

Can no one share any ideas or suggestions? 8(

4.9 years ago
schlogl ▴ 160

For anyone who needs an answer to the same question in the future:

https://github.com/wolf-adam-eily/rnaseq_for_model_plant#Fourth_Point_Header
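Separately from the linked tutorial, one likely cause is visible in the output above: the `id` column of `g_sign` carries a `.TAIR10` suffix (e.g. `AT1G04610.TAIR10`), while Ensembl Plants gene IDs for Arabidopsis are plain AGI locus codes (e.g. `AT1G04610`), so `%in%` never finds a match. Below is a minimal sketch of stripping the suffix before matching. The two data frames are toy stand-ins for the real `g_sign` and `thale_data_frame`, the descriptions are placeholders, and the `gene_id_clean` column name is hypothetical.

```r
# Toy stand-ins for the real objects (illustrative values only)
g_sign <- data.frame(
  id = c("AT1G04610.TAIR10", "AT1G04660.TAIR10"),
  stringsAsFactors = FALSE
)
thale_data_frame <- data.frame(
  ensembl_gene_id = c("AT1G04610", "AT1G04660", "AT1G01010"),
  description     = c("placeholder A", "placeholder B", "placeholder C"),
  stringsAsFactors = FALSE
)

# Remove the trailing ".TAIR10" so the IDs become plain AGI locus codes
g_sign$gene_id_clean <- sub("\\.TAIR10$", "", g_sign$id)

# Match on the cleaned IDs instead of the suffixed ones
annotated_genes <- subset(thale_data_frame,
                          ensembl_gene_id %in% g_sign$gene_id_clean)
dim(annotated_genes)
```

With the real data, the same `sub()` call would be applied to `g_sign$id` before the `subset()` step. Genes that StringTie could not assign to a reference gene keep `MSTRG.*` identifiers and would still go unannotated.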
