Dataset's name in BioMart for S. pombe
10.5 years ago
Parham ★ 1.6k

Can anybody help me find the dataset name for S. pombe on BioMart? I would also appreciate some help on how to use makeTranscriptDbFromBiomart() to create a TranscriptDb.

cheers,

10.5 years ago
Malcolm.Cook ★ 1.5k

Looks like you figured out another way of getting what you needed, but, for the record, here is the answer to your question:

S. pombe is in Ensembl Fungi: http://fungi.ensembl.org/index.html

The BioMart is here: http://fungi.ensembl.org/biomart/martview/248a3d2deec76fa7be1e94e32b3972df

Access it using Bioconductor's GenomicFeatures as follows. Note the warnings about missing CDS information and chromosome lengths.

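library(GenomicFeatures)
library(biomaRt)

# Build a TranscriptDb from the Ensembl Fungi BioMart.
# (Newer GenomicFeatures releases name this function makeTxDbFromBiomart().)
txdb <- makeTranscriptDbFromBiomart(
    biomart = "fungi_mart_22",
    dataset = "spombe_eg_gene",
    host    = "fungi.ensembl.org"
)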

 Download and preprocess the 'transcripts' data frame ... OK
 Download and preprocess the 'splicings' data frame ... OK
 Download and preprocess the 'genes' data frame ... OK
 Prepare the 'metadata' data frame ... OK
 Make the TranscriptDb object ... OK
 Warning messages:
 1: In .normargSplicings(splicings, transcripts_tx_id) :
   no CDS information for this TranscriptDb object
 2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
   chromosome lengths and circularity flags are not available for this TranscriptDb object

> transcriptsBy(txdb)
 GRangesList of length 7017:
 $SPAC1002.01
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand |     tx_id       tx_name
          <Rle>          <IRanges>  <Rle> | <integer>   <character>
   [1]        I [1798347, 1799015]      + |       510 SPAC1002.01.1

 $SPAC1002.02
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand | tx_id       tx_name
   [1]        I [1799061, 1800053]      + |   511 SPAC1002.02.1

 $SPAC1002.03c
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand | tx_id        tx_name
   [1]        I [1799915, 1803141]      - |  2075 SPAC1002.03c.1

 ...
 <7014 more elements>
 ---
 seqlengths:
         I       II      III       MT      MTR AB325691
        NA       NA       NA       NA       NA       NA

It's strange: yesterday I tried these commands and they built the TranscriptDb, but today I am getting an error. Do you see any problem?

> txdb<-makeTranscriptDbFromBiomart(biomart="fungi_mart_22", dataset="spombe_eg_gene", host="fungi.ensembl.org")
Error in useDataset(mart = mart, dataset = dataset, verbose = verbose) : 
  The given dataset:  spombe_eg_gene , is not valid.  Correct dataset names can be obtained with the listDatasets function.

Try specifying the mart as:

biomart="fungal_mart"

Works with useMart(biomart="fungi_mart", dataset="spombe_eg_gene", host="https://fungi.ensembl.org")
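Since the mart name has changed between releases in this thread (fungi_mart_22 above, fungi_mart here), it may be easier to look the current names up than to guess them, as the error message suggests. A minimal sketch, assuming network access and that the host still serves BioMart; match_datasets() and lookup_datasets() are hypothetical helper names, and the "^fungi_mart" naming pattern is an assumption based on the names seen above:

```r
# Pure helper: keep rows of a listDatasets()-style data frame whose
# `dataset` column matches a pattern.
match_datasets <- function(datasets, pattern) {
  datasets[grepl(pattern, datasets$dataset, ignore.case = TRUE), , drop = FALSE]
}

# Network-dependent lookup; needs the biomaRt package and a reachable host.
lookup_datasets <- function(host = "https://fungi.ensembl.org", pattern = "pombe") {
  marts <- biomaRt::listMarts(host = host)   # columns: biomart, version
  print(marts)
  # Assumption: the gene mart name starts with "fungi_mart".
  mart_name <- grep("^fungi_mart", marts$biomart, value = TRUE)[1]
  stopifnot(!is.na(mart_name))
  mart <- biomaRt::useMart(biomart = mart_name, host = host)
  match_datasets(biomaRt::listDatasets(mart), pattern)
}

# Usage, given network access:
# lookup_datasets()
```

The value in the dataset column of the result is what goes into the dataset= argument, and the mart name printed by listMarts() goes into biomart=.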

10.5 years ago

I don't know that it's in BioMart, given that it's not in Ensembl. Just download the GTF or GFF file from PomBase and then use makeTranscriptDbFromGFF() from GenomicFeatures.

Edit: I take that back, it is in Ensembl. Here's an example BioMart query.


Yes, I saw that, thanks! But does it require setting a lot of parameters? I am new to this field, and the parameter list looks very complex to me at this point. Is there a straightforward script for it, or should I go through all the arguments and choose carefully?


Do you mean parameters for makeTranscriptDbFromGFF()? It only needs the file name.


Yes, because ?makeTranscriptDbFromGFF lists a lot of options; that's why I asked. However, when I try with the file name only, I end up with errors for both the GFF3 and GTF formats.

> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gff3")
extracting transcript information
Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : 
  No Transcript information found in gff file
> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.21.gtf")
Error in .parse_attrCol(attrCol, file, colnames) : 
  Some attributes do not conform to 'tag=value' format

txdb <- makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gtf", format="gtf") works. I'd have to look into why it doesn't like the gff3 file.


Ah, the error with the GFF3 file is due to it not having any mRNA features.
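One way to check this for any GFF3 file is to tabulate the feature types in its third column. A minimal sketch in base R, using a tiny in-memory excerpt; the records below are made up for illustration and are not copied from the PomBase file, and count_feature_types() is a hypothetical helper name:

```r
# Hypothetical GFF3 records, for illustration only.
gff_lines <- c(
  "##gff-version 3",
  "I\tPomBase\tgene\t1798347\t1799015\t.\t+\t.\tID=gene:SPAC1002.01",
  "I\tPomBase\ttranscript\t1798347\t1799015\t.\t+\t.\tID=transcript:SPAC1002.01.1;Parent=gene:SPAC1002.01",
  "I\tPomBase\texon\t1798347\t1799015\t.\t+\t.\tParent=transcript:SPAC1002.01.1"
)

# Count how often each feature type (column 3) occurs,
# skipping comment/directive lines and empty lines.
count_feature_types <- function(lines) {
  records <- lines[!grepl("^#", lines) & nzchar(lines)]
  types <- vapply(strsplit(records, "\t", fixed = TRUE), `[`, character(1), 3)
  table(types)
}

counts <- count_feature_types(gff_lines)
print(counts)

# TRUE only if the file contains at least one mRNA feature.
has_mrna <- "mRNA" %in% names(counts)
print(has_mrna)
```

For a real file, the same function can be applied to readLines("Schizosaccharomyces_pombe.ASM294v2.22.gff3"); if no mRNA type shows up in the table, that is consistent with the "No Transcript information found" error above.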
