Question

Dataset's name in BioMart for S. pombe

11.4 years ago

Parham ★ 1.6k

Can anybody help me find the dataset for S. pombe on BioMart? I would also appreciate some help on how to use makeTranscriptDbFromBiomart to create a TranscriptDb.

cheers,

S.pombe BioMart dataset • 5.4k views

Updated 2.1 years ago by YUYANG.OON • written 11.4 years ago by Parham ★ 1.6k

11.4 years ago

Devon Ryan 105k

I don't know that it's in BioMart, given that it's not in Ensembl. Just download the GTF or GFF file from PomBase and then use makeTranscriptDbFromGFF() from GenomicFeatures.

Edit: I take that back, it is in Ensembl. Here's an example BioMart query.

Yes, I saw that, thanks! But do I need to set a lot of parameters? I am new to this field, and the parameters look very complex to me at this point. Is there a straightforward script for it, or should I go through all the arguments and choose carefully?

Reply updated 5.9 years ago by Ram 45k • written 11.4 years ago by Parham ★ 1.6k

Do you mean parameters for makeTranscriptDbFromGFF()? It only needs the file name.

Reply written 11.4 years ago by Devon Ryan 105k

Yes, because ?makeTranscriptDbFromGFF lists a lot of options. That's why I asked! However, when I try with the file name only, I end up with errors for both the GFF3 and GTF formats.

> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gff3")
extracting transcript information
Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : 
  No Transcript information found in gff file
> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.21.gtf")
Error in .parse_attrCol(attrCol, file, colnames) : 
  Some attributes do not conform to 'tag=value' format

Reply updated 5.9 years ago by Ram 45k • written 11.4 years ago by Parham ★ 1.6k

txdb <- makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gtf", format="gtf") works. I'd have to look into why it doesn't like the gff3 file.

Reply updated 5.9 years ago by Ram 45k • written 11.4 years ago by Devon Ryan 105k

Ah, the error with the GFF3 file is due to it not having any mRNA features.

Reply written 11.4 years ago by Devon Ryan 105k
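For readers hitting the same two errors, here is a minimal base-R sketch of what is probably going on. It assumes the GTF call failed because the file was parsed with GFF3 rules when no format was given, which matches the 'tag=value' message, and that the GFF3 call failed because column 3 never contains an mRNA feature. The two sample lines are invented for illustration, and the helper names split_fields and is_tag_value are hypothetical, not part of GenomicFeatures.

```r
# Invented sample records; real annotation files contain many more lines.
gtf_line <- paste("I", "protein_coding", "exon", "1798347", "1799015",
                  ".", "+", ".",
                  'gene_id "SPAC1002.01"; transcript_id "SPAC1002.01.1";',
                  sep = "\t")
gff3_line <- paste("I", "PomBase", "gene", "1798347", "1799015",
                   ".", "+", ".",
                   "ID=gene:SPAC1002.01;biotype=protein_coding",
                   sep = "\t")

# Split one tab-separated annotation line into its nine standard columns.
split_fields <- function(line) strsplit(line, "\t", fixed = TRUE)[[1]]

# TRUE only if every ';'-separated attribute has the GFF3 'tag=value' shape.
is_tag_value <- function(attr_col) {
  parts <- trimws(strsplit(attr_col, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  all(grepl("^[^=[:space:]]+=", parts))
}

gtf_fields  <- split_fields(gtf_line)
gff3_fields <- split_fields(gff3_line)

# GTF attributes look like: tag "value"; so they fail the GFF3-style check.
print(is_tag_value(gtf_fields[9]))

# GFF3 attributes look like: tag=value, so they pass it.
print(is_tag_value(gff3_fields[9]))

# Column 3 is the feature type; this sample record is a gene, not an mRNA.
print(gff3_fields[3] == "mRNA")
```

On a real file, tabulating column 3 (for example with table() on the result of read.delim(..., comment.char = "#", header = FALSE)) is a quick way to see whether any mRNA or transcript features are present before building the database.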

Ram · Accepted Answer · 2014-06-12

Looks like you figured out another way of getting what you needed, but, for the record, here is the answer to your question:

S. pombe is at http://fungi.ensembl.org/index.html

The biomart is here: http://fungi.ensembl.org/biomart/martview/248a3d2deec76fa7be1e94e32b3972df

Access it using Bioconductor's GenomicFeatures as follows. Note the warnings in the output.

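library(GenomicFeatures)
library(biomaRt)

# Build a TranscriptDb from the S. pombe dataset on the Ensembl Fungi BioMart.
txdb <- makeTranscriptDbFromBiomart(
    biomart = "fungi_mart_22",    # mart name on the Ensembl Fungi host
    dataset = "spombe_eg_gene",   # the S. pombe gene dataset
    host    = "fungi.ensembl.org"
)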

 Download and preprocess the 'transcripts' data frame ... OK
 Download and preprocess the 'splicings' data frame ... OK
 Download and preprocess the 'genes' data frame ... OK
 Prepare the 'metadata' data frame ... OK
 Make the TranscriptDb object ... OK
 Warning messages:
 1: In .normargSplicings(splicings, transcripts_tx_id) :
   no CDS information for this TranscriptDb object
 2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
   chromosome lengths and circularity flags are not available for this TranscriptDb object

> transcriptsBy(txdb)
 GRangesList of length 7017:
 $SPAC1002.01
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand |     tx_id       tx_name
          <Rle>          <IRanges>  <Rle> | <integer>   <character>
   [1]        I [1798347, 1799015]      + |       510 SPAC1002.01.1

 $SPAC1002.02
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand | tx_id       tx_name
   [1]        I [1799061, 1800053]      + |   511 SPAC1002.02.1

 $SPAC1002.03c
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand | tx_id        tx_name
   [1]        I [1799915, 1803141]      - |  2075 SPAC1002.03c.1

 ...
 <7014 more elements>
 ---
 seqlengths:
         I       II      III       MT      MTR AB325691
        NA       NA       NA       NA       NA       NA
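Since the original question was how to find the dataset name, here is a hedged sketch of looking it up with biomaRt rather than through the web interface. It assumes biomaRt is installed, that the Ensembl Fungi server is reachable, and that the gene mart is currently named "fungi_mart" (the answer above used the versioned name "fungi_mart_22", so check the output of listMarts() if this fails). The function name find_pombe_dataset is hypothetical.

```r
# Look up S. pombe datasets on the Ensembl Fungi BioMart.
# Assumptions: biomaRt is installed, the host is reachable, and the mart name
# passed in `mart_name` exists on that host (see biomaRt::listMarts(host = host)).
find_pombe_dataset <- function(host = "https://fungi.ensembl.org",
                               mart_name = "fungi_mart") {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("The biomaRt package is required for this lookup.")
  }
  # Connect to the mart without selecting a dataset yet.
  mart <- biomaRt::useMart(mart_name, host = host)
  # listDatasets() returns a data frame with a 'dataset' column.
  datasets <- biomaRt::listDatasets(mart)
  # Keep only the rows whose dataset name mentions pombe.
  datasets[grepl("pombe", datasets$dataset, ignore.case = TRUE), ]
}

# Usage (needs network access):
# find_pombe_dataset()
```

The value in the dataset column of the result is what goes into the dataset argument of the call shown in the answer.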