Can anybody help me find the dataset for S. pombe on BioMart? I would also appreciate some help on how to use makeTranscriptDbFromBiomart
to create a TranscriptDb.
cheers,
Looks like you figured out another way of getting what you needed, but, for the record, here is the answer to your question:
S. pombe is at http://fungi.ensembl.org/index.html
The biomart is here: http://fungi.ensembl.org/biomart/martview/248a3d2deec76fa7be1e94e32b3972df
Access it using Bioconductor's GenomicFeatures as follows. Note the warnings.
library(GenomicFeatures)
library(biomaRt)
txdb <- makeTranscriptDbFromBiomart(
  biomart = "fungi_mart_22",
  dataset = "spombe_eg_gene",
  host = "fungi.ensembl.org"
)
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning messages:
1: In .normargSplicings(splicings, transcripts_tx_id) :
no CDS information for this TranscriptDb object
2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
chromosome lengths and circularity flags are not available for this TranscriptDb object
> transcriptsBy(txdb)
GRangesList of length 7017:
$SPAC1002.01
GRanges with 1 range and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] I [1798347, 1799015] + | 510 SPAC1002.01.1
$SPAC1002.02
GRanges with 1 range and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
[1] I [1799061, 1800053] + | 511 SPAC1002.02.1
$SPAC1002.03c
GRanges with 1 range and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
[1] I [1799915, 1803141] - | 2075 SPAC1002.03c.1
...
<7014 more elements>
---
seqlengths:
I II III MT MTR AB325691
NA NA NA NA NA NA
I don't know that it's in BioMart, given that it's not in Ensembl. Just download the GTF or GFF file from PomBase and then use makeTranscriptDbFromGFF()
from GenomicFeatures.
Edit: I take that back, it is in Ensembl. Here's an example biomart query.
Yes, I saw that, thanks! But do I need to set a lot of parameters? I am new to this field, and the parameters look very complex to me at this point. Is there a straightforward script for it, or should I go through all the arguments and choose carefully?
Yes, because when I checked ?makeTranscriptDbFromGFF
it listed a lot of options. That's why I asked! However, when I try with the file name only, I end up with errors for both the GFF3 and GTF formats.
> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gff3")
extracting transcript information
Error in .prepareGFF3TXS(data, useGenesAsTranscripts) :
No Transcript information found in gff file
> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.21.gtf")
Error in .parse_attrCol(attrCol, file, colnames) :
Some attributes do not conform to 'tag=value' format
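The second error is what one would expect if the GTF file is being parsed as GFF3: GTF attributes are written as tag "value"; rather than tag=value. Here is a minimal sketch, assuming your GenomicFeatures version has a format argument on makeTranscriptDbFromGFF (check ?makeTranscriptDbFromGFF; newer releases name the function makeTxDbFromGFF). guess_gff_format is a hypothetical helper written for this example, not part of any package. The first error (no transcript information in the GFF3 file) is a separate issue that this sketch does not address.

```r
# Hypothetical helper (not part of GenomicFeatures): guess the annotation
# format from the file extension so it can be passed explicitly.
guess_gff_format <- function(path) {
  ext <- tolower(tools::file_ext(sub("\\.gz$", "", path, ignore.case = TRUE)))
  if (ext == "gtf") "gtf" else "gff3"
}

gtf_file <- "Schizosaccharomyces_pombe.ASM294v2.21.gtf"

# Only attempt the build when the file and the package are both available.
if (file.exists(gtf_file) &&
    requireNamespace("GenomicFeatures", quietly = TRUE)) {
  txdb <- GenomicFeatures::makeTranscriptDbFromGFF(
    gtf_file,
    format = guess_gff_format(gtf_file)  # "gtf" here, instead of the GFF3 default
  )
}
```

Writing format = "gtf" by hand does the same thing; the helper only makes the choice visible.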
It's strange: yesterday I tried these commands and they built the TranscriptDb, but today I am receiving an error! Do you see any problem?
Try specifying the mart as:
Works with
useMart(biomart="fungi_mart", dataset="spombe_eg_gene", host="https://fungi.ensembl.org")
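One possible reason the earlier call stopped working is that the mart name carried a release number ("fungi_mart_22"), which may change when the server is updated. That is a guess, not something confirmed in this thread. Below is a rough sketch that asks the host which marts and datasets it currently offers instead of hard-coding them, using biomaRt's listMarts and listDatasets. It needs network access; find_dataset is a hypothetical helper written for this example.

```r
# Hypothetical helper: keep the rows of a listDatasets()-style data frame
# whose description mentions the organism of interest.
find_dataset <- function(datasets, pattern) {
  datasets[grepl(pattern, datasets$description, ignore.case = TRUE), ,
           drop = FALSE]
}

host <- "https://fungi.ensembl.org"

# Needs the biomaRt package and a working connection; try() keeps a
# network failure from stopping the rest of the script.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  try({
    marts <- biomaRt::listMarts(host = host)  # mart names currently on offer
    print(marts)
    mart <- biomaRt::useMart(biomart = "fungi_mart", host = host)
    print(find_dataset(biomaRt::listDatasets(mart), "pombe"))
  })
}
```

Whatever names this prints are the ones to pass as biomart and dataset.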