Dear All,
I have been using the mouse sequences and annotations from the TopHat website here, specifically the UCSC mm10 build.
I see that the sequence file (genome.fa) contains chr1-chr19, chrX, chrY and chrM, while the corresponding GTF file also contains features on random/unplaced sequences such as chr4_JH584294_random.
I was trying to calculate RPKM values for my alignments (TopHat BAM files aligned against this genome sequence) using the GTF.
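For reference, here is a minimal base-R sketch of the usual RPKM definition (reads per kilobase of gene length per million mapped reads). The function name `rpkm` and all numbers are hypothetical and only for illustration. Whether the denominator should be all mapped reads or only the reads that were counted is a choice this sketch does not settle.

```r
# RPKM = reads in gene / (gene length in kb * total mapped reads in millions)
rpkm <- function(counts, gene_length_bp, total_mapped_reads) {
  counts / ((gene_length_bp / 1e3) * (total_mapped_reads / 1e6))
}

# Hypothetical per-gene read counts and summed exon lengths (illustration only)
counts <- c(geneA = 500, geneB = 1200)
gene_length_bp <- c(geneA = 2000, geneB = 6000)
total_mapped_reads <- 20e6

rpkm_values <- rpkm(counts, gene_length_bp, total_mapped_reads)
print(rpkm_values)
```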
library(IRanges)
library(Rsamtools)
library(GenomicFeatures)
library(GenomicRanges)
aligns <- readBamGappedAlignments("filepath/accepted_hits.bam")
txdb <- makeTranscriptDbFromGFF("genes.gtf", format="gtf", species="Mus musculus", dataSource="http://tophat.cbcb.umd.edu/igenomes.shtml")
exonRanges.gene <- exonsBy(txdb, "gene")
seqlevels(aligns)
[1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6"
[17] "chr7" "chr8" "chr9" "chrM" "chrX" "chrY"
seqlevels(exonRanges.gene)
[1] "chr13" "chr9" "chr6" "chrX" "chr17"
[6] "chr2" "chr7" "chr18" "chr8" "chr4"
[11] "chr19" "chr5" "chr16" "chr11" "chr10"
[16] "chr14" "chr1" "chr3" "chr15" "chr12"
[21] "chrY" "chrX_GL456233_random" "chr5_JH584299_random" "chr5_JH584298_random" "chr4_GL456216_random"
[26] "chr4_GL456350_random" "chr4_JH584294_random" "chr4_JH584293_random" "chr5_GL456354_random" "chr7_GL456219_random"
[31] "chr5_JH584296_random" "chr5_JH584297_random" "chr4_JH584292_random" "chr1_GL456221_random" "chrUn_JH584304"
Because the two objects have different chromosomes (the extra chrN_XXXXX_random sequences in the GTF), I get the following warnings.
1: In .deduceExonRankings(exs, format = "gtf") :
Infering Exon Rankings. If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
2: In matchCircularity(chroms, circ_seqs) :
None of the strings in your circ_seqs argument match your seqnames.
3: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrX_GL456233_random, chr5_JH584299_random, chr5_JH584298_random, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584294_random, chr4_JH584293_random, chr5_GL456354_random, chr7_GL456219_random, chr5_JH584296_random, chr5_JH584297_random, chr4_JH584292_random, chr1_GL456221_random, chrUn_JH584304
- in 'y': chrM
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
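The mismatch reported in the third warning can be reproduced with plain set operations on the two sets of sequence names. This is only a sketch: the vectors below are typed in by hand from the seqlevels() output above (the GTF list is abbreviated to two of the extra sequences), and the variable names are made up for illustration.

```r
# Sequence names from the BAM file (seqlevels(aligns) above)
bam_levels <- c(paste0("chr", 1:19), "chrX", "chrY", "chrM")

# Sequence names from the GTF (abbreviated: only two of the extra ones shown)
gtf_levels <- c(paste0("chr", 1:19), "chrX", "chrY",
                "chr4_JH584294_random", "chrUn_JH584304")

only_in_gtf <- setdiff(gtf_levels, bam_levels)  # the 'x' side of the warning
only_in_bam <- setdiff(bam_levels, gtf_levels)  # the 'y' side of the warning
shared      <- intersect(gtf_levels, bam_levels)

print(only_in_gtf)
print(only_in_bam)
```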
I understand that this is due to the mismatch in chromosomes, and I can handle it either by deleting the chrN_XXXXXX_random features from the .gtf file or by aligning to a genome that includes these sequences. Are there any advantages or disadvantages (right/wrong) to the former or the latter?
Is there any other workaround for this? Additionally, I would like to know if there is any genome/GTF build available for download that is coherent in these respects for the mouse, rat and human genomes. Thanks in advance!
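To make the first option concrete, here is a rough base-R sketch of dropping the chrN_XXXXX_random features from a GTF. It assumes a standard 9-column, tab-separated GTF. The three-line table and the output file are made up so the example runs on its own; with the real file one would read "genes.gtf" instead. Bioconductor also has helpers for restricting seqlevels on range objects (keepSeqlevels(), for example), which I have not used here.

```r
# Tiny GTF-like table for illustration; a real file could be read with
# read.delim("genes.gtf", header = FALSE, quote = "", comment.char = "#")
gtf_text <- c(
  "chr1\tunknown\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
  "chr4_JH584294_random\tunknown\texon\t50\t150\t.\t-\t.\tgene_id \"g2\";",
  "chrX\tunknown\texon\t300\t400\t.\t+\t.\tgene_id \"g3\";"
)
gtf <- read.delim(text = gtf_text, header = FALSE, quote = "",
                  stringsAsFactors = FALSE)

# Sequence names the reads were aligned to (seqlevels(aligns) above)
bam_levels <- c(paste0("chr", 1:19), "chrX", "chrY", "chrM")

# Keep only features on sequences that are present in the alignments
gtf_main <- gtf[gtf$V1 %in% bam_levels, ]

# Write the filtered annotation back out: tab-separated, no header, no quoting
out_file <- tempfile(fileext = ".gtf")
write.table(gtf_main, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The filtered file could then be passed to the GTF import step in place of the original genes.gtf.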
Thanks! That sounds promising. I should nevertheless try the Ensembl annotations!