Dear All,
I am not an expert and would like your help with a problem.
I used TopHat to align a FASTQ file from an RNA-seq experiment, and I now want to build a count table to use in DESeq. I tried easyRNASeq for that purpose, but I ran into an error.
Here is my code:
library(easyRNASeq)
library(BSgenome.Mmusculus.UCSC.mm9)
dataDir <- "/data/lena/m"
file.exists(dataDir)
stopifnot(file.exists(dataDir))
list.files(dataDir)
gtf <- "/home/lena/mm9_IlluminaAnnotation_genes.gtf"
rna <- "accepted_hits.bam"
c_ers <- easyRNASeq(organism = "Mmusculus",
                    chr.sizes = as.list(seqlengths(Mmusculus)),
                    annotationMethod = "gtf", annotationFile = gtf,
                    format = "bam", count = "genes",
                    summarization = "geneModels", outputFormat = "DESeq",
                    filenames = rna, filesDirectory = dataDir)
And here is what I get:
Checking arguments...
Fetching annotations...
Read 566539 records
Error in .getGtfRange(organismName(obj), filename = filename, ignoreWarnings = ignoreWarnings, :
Your gtf file: /home/lena/mm9_IlluminaAnnotation_genes.gtf does not contain all the required fields: gene_id, transcript_id, exon_number, gene_name .
In addition: Warning messages:
1: The use of the list for providing chromosome sizes has been deprecated. Use a named numeric vector instead.
2: In .Method(..., deparse.level = deparse.level) :
number of columns of result is not a multiple of vector length (arg 17)
What am I doing wrong, and what should I do? Could you please advise me?
This is what my GTF file looks like:
chr1 unknown exon 3204563 3207049 . - . gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
chr1 unknown stop_codon 3206103 3206105 . - . gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
chr1 unknown CDS 3206106 3207049 . - 2 gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
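As a quick check of which required field is absent, the attribute column of these lines can be compared against the four names listed in the error message. The following is only a minimal base R sketch: the helper name `missing_gtf_fields` is hypothetical, and the lines are assumed to be tab-separated with the attributes in the ninth column, as is usual for GTF.

```r
# Two lines from the GTF excerpt above, rebuilt as tab-separated records
# (assumption: the real file uses tabs between the nine GTF columns).
attrs <- 'gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";'
gtf_lines <- c(
  paste("chr1", "unknown", "exon", "3204563", "3207049", ".", "-", ".", attrs, sep = "\t"),
  paste("chr1", "unknown", "CDS", "3206106", "3207049", ".", "-", "2", attrs, sep = "\t")
)

# Field names taken from the easyRNASeq error message.
required <- c("gene_id", "transcript_id", "exon_number", "gene_name")

# Hypothetical helper: return the required attribute keys absent from one GTF line.
missing_gtf_fields <- function(line, required) {
  attr_col <- strsplit(line, "\t", fixed = TRUE)[[1]][9]
  pairs <- trimws(strsplit(attr_col, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  keys <- sub(" .*$", "", pairs)  # keep the text before the first space
  setdiff(required, keys)
}

absent <- lapply(gtf_lines, missing_gtf_fields, required = required)
print(unique(unlist(absent)))
```

On the lines shown here, this reports `exon_number` as the only required attribute that is not present, which may be what the error refers to.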
I also tried the biomaRt approach.
This is what I did:
ensembl <- useMart(host = 'may2009.archive.ensembl.org',
                   biomart = 'ENSEMBL_MART_ENSEMBL',
                   dataset = 'mmusculus_gene_ensembl')
ensembl.genes <- getBM(attributes = c('chromosome_name', 'start_position',
                                      'end_position', 'ensembl_gene_id',
                                      'external_gene_id', 'strand'),
                       mart = ensembl)
c_ers <- easyRNASeq(organism = "Mmusculus",
                    chr.sizes = as.list(seqlengths(Mmusculus)),
                    annotationMethod = "biomaRt", annotationFile = ensembl.genes,
                    format = "bam", count = "genes",
                    summarization = "geneModels", outputFormat = "DESeq",
                    filenames = rna, filesDirectory = dataDir)
Checking arguments...
Fetching annotations...
Error in easyRNASeq(organism = "Mmusculus", chr.sizes = as.list(seqlengths(Mmusculus)), :
The number of conditions: 0 did not correspond to the number of samples: 1
In addition: Warning message:
The use of the list for providing chromosome sizes has been deprecated. Use a named numeric vector instead.
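The deprecation warning in both runs seems to point at the `as.list()` wrapper around the chromosome sizes. Since `seqlengths()` normally returns a named vector already, passing it without `as.list()` may be enough, although I have not confirmed that. The base R sketch below only illustrates the difference between the two forms; the chromosome lengths are made-up placeholders, not real mm9 values.

```r
# Placeholder lengths standing in for seqlengths(Mmusculus); not real mm9 sizes.
chr_sizes_vec <- c(chr1 = 1000, chr2 = 800)

# The form used in the calls above: a list, which the warning calls deprecated.
chr_sizes_list <- as.list(chr_sizes_vec)

# Going back from the list to a named numeric vector.
chr_sizes_back <- unlist(chr_sizes_list)

print(is.list(chr_sizes_list))     # the deprecated form is a list
print(is.numeric(chr_sizes_back))  # the requested form is a numeric vector
print(names(chr_sizes_back))       # chromosome names are kept
```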
Could you please help me? I am stuck and don't know what to do.
Thank you in advance!
I would appreciate your answers.
Best regards, Lena