Hi All,
Is there an R package that I can use to get biological annotations?
I am using the NOISeq package to analyze RNA-seq data, and I am interested in adding biological annotations such as:
- feature length
- GC content
- biological classification features
- chromosome position
Question: Could you point me to how to get this information? Which packages can I use for it?
Thank you very much.
Thank you. I was able to use the biomaRt package to grab the required sequences using:
Which gave me:
But the gene start and end positions, as well as the gene lengths, are far from what they should be, as shown below (I am following the NOISeq tutorial):
Are you using GRCh38 while the NOISeq tutorial uses GRCh37?
It is possible, but NOISeq does not say which assembly it uses. Even so, the difference is huge, so I am obviously missing something in how I calculate the gene length.
I also tried calculating the mean, median, and sum of the transcript lengths, and all of them were far from the numbers provided in NOISeq.
I think your tutorial is very old. Add "version=75" to your useMart command, and you'll see gene coordinates close to the tutorial's.
As a side note, you are calculating the total length of the gene (exons and introns). I guess NOISeq uses only the sum of the exon lengths.
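To illustrate the difference between the two definitions, here is a minimal sketch in base R. The exon coordinates are made up for the example (they are not from Ensembl or the NOISeq data), and whether NOISeq really uses exonic length is only a guess, as said above.

```r
# Hypothetical exon coordinates (1-based, inclusive) for one made-up gene.
# Exons 2 and 3 overlap, as exons from different transcripts often do.
exons <- data.frame(
  start = c(100, 300, 350, 900),
  end   = c(199, 400, 450, 1000)
)

# Genomic span: from the first exon start to the last exon end,
# so intronic bases are included.
gene_span <- max(exons$end) - min(exons$start) + 1

# Exonic length: count every base covered by at least one exon exactly once.
# Enumerating positions is fine for a single gene; for whole genomes an
# interval-based approach would be more efficient.
exonic_length <- length(unique(unlist(Map(seq, exons$start, exons$end))))

print(c(gene_span = gene_span, exonic_length = exonic_length))
```

With these made-up exons the span is 901 bases while the exonic length is only 352, which is the kind of gap you would expect when one number includes introns and the other does not.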
Thanks @swbarnes2 and @b.nota
I was able to replicate the NOISeq length data by using an older version of the mart and by using the transcript start and end locations to calculate the length.
So my final code would be:
which gave me:
So I can take this as an answer.
As a side note, my lengths are offset by 1 in each case compared to the NOISeq data set (why?):
The offset of one is probably because of the one-based coordinate system used by Ensembl. Add + 1 to your equation.
What are the three most common sources of programming errors? Poor naming, and off-by-one errors.
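A small sketch of why the + 1 is needed, assuming one-based coordinates where both start and end are part of the feature. The IDs and positions below are invented for illustration, not real Ensembl records.

```r
# Hypothetical transcripts with 1-based, inclusive coordinates.
tx <- data.frame(
  id    = c("tx_A", "tx_B"),
  start = c(1000, 5000),
  end   = c(1999, 5000)   # tx_B covers a single base
)

# Plain subtraction counts the steps between the endpoints, not the bases.
tx$length_naive <- tx$end - tx$start

# Adding 1 counts both endpoints, so a single-base feature has length 1.
tx$length <- tx$end - tx$start + 1

print(tx)
```

The single-base case makes the point: end - start gives 0 for tx_B, which cannot be right for a feature that exists.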