Quantification of a gene that is not in the reference genome
7.1 years ago
felipead6 ▴ 10

I work with the mouse genome and have RNA-Seq data from a mouse strain. When I align the reads to the mouse reference genome, one gene is not quantified because it does not appear in the reference. How can I quantify this gene?

RNA-Seq sequencing • 2.2k views
7.1 years ago
Jake Warner ▴ 840

Manually add it to the genome as its own contig and re-align!
Edit: To be a little more explicit, you would append the gene sequence of interest to your reference FASTA using something like cat musmus.fa newgene.fa > mus_edited.fa, then add an entry for the gene in the GFF file, rebuild the reference index, and re-align your reads.
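As a rough illustration of the FASTA step only, here is a minimal Python sketch that does the same job as the cat command, with two extra checks: it refuses to reuse a contig name that already exists in the reference, and it adds a newline between the files if the reference does not end with one. The function names read_fasta_lengths and append_contigs are made up for this example. The returned contig length is useful because a gene added as its own contig spans positions 1 to that length in the annotation entry.

```python
def read_fasta_lengths(path):
    """Return {contig_name: sequence_length} for a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The contig name is the first word after '>'.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def append_contigs(reference_fa, gene_fa, out_fa):
    """Write reference_fa followed by gene_fa to out_fa.

    Returns {contig_name: length} for the contigs that were added.
    Hypothetical helper for illustration, not part of any aligner.
    """
    existing = read_fasta_lengths(reference_fa)
    added = read_fasta_lengths(gene_fa)
    clash = sorted(set(existing) & set(added))
    if clash:
        raise ValueError(f"contig name(s) already in reference: {clash}")

    with open(out_fa, "w") as out:
        for path in (reference_fa, gene_fa):
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Make sure the next header starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")
    return added
```

With the file names from the post, append_contigs("musmus.fa", "newgene.fa", "mus_edited.fa") would write the combined reference. The index still has to be rebuilt with whatever aligner you use before re-aligning.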

You don't have to realign. Just add the information for the gene of interest to the GTF file and rerun featureCounts (to count reads per gene), then use edgeR or DESeq2.

That sounds interesting. How do I add the gene information to the .gtf file, given that I have the position of the gene?

Look at http://www.ensembl.org/info/website/upload/gff.html for the GTF column descriptions, and add the corresponding information to the GTF (open it in a text editor and edit it).
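To make the column layout concrete, here is a small Python sketch that builds the tab-separated GTF lines for one gene from a known position. It follows the nine columns on the Ensembl page (seqname, source, feature, start, end, score, strand, frame, attribute), with 1-based inclusive coordinates. Several things are assumptions made for the example: the gene is treated as a single exon, the source is written as "manual", the transcript ID is derived from the gene ID, and the coordinates and IDs in the test data are invented. The exon line is the one that matters for featureCounts, which by default counts features of type exon and groups them by the gene_id attribute.

```python
def gtf_line(seqname, feature, start, end, strand, attributes, source="manual"):
    """Format one GTF record (nine tab-separated columns)."""
    if start < 1 or end < start:
        raise ValueError("GTF coordinates are 1-based with start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    # Attributes are written as: key "value"; key "value";
    attribute_field = " ".join(
        f'{key} "{value}";' for key, value in attributes.items()
    )
    # Score and frame are unknown here, so both are written as '.'.
    fields = [seqname, source, feature, str(start), str(end),
              ".", strand, ".", attribute_field]
    return "\t".join(fields)


def single_exon_gene(seqname, start, end, strand, gene_id):
    """Return gene, transcript and exon lines for a one-exon gene.

    Assumption: the whole gene is one exon. A multi-exon gene needs
    one exon line per exon with its own coordinates.
    """
    transcript_id = f"{gene_id}.t1"  # hypothetical naming scheme
    gene_attr = {"gene_id": gene_id}
    tx_attr = {"gene_id": gene_id, "transcript_id": transcript_id}
    return [
        gtf_line(seqname, "gene", start, end, strand, gene_attr),
        gtf_line(seqname, "transcript", start, end, strand, tx_attr),
        gtf_line(seqname, "exon", start, end, strand, tx_attr),
    ]
```

The returned lines can be appended to a copy of the annotation file, for example by opening it in append mode and writing each line followed by a newline. The seqname has to match the contig name in the FASTA exactly.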

Just open the GTF with gedit or Excel and fill in the fields.

If the sequence is present in the reference and only the annotation is missing, then this is indeed the correct approach!

I have the .gb file from GenBank. Is there any way to convert it to .gtf?

That part is easily googled, e.g.:

https://github.com/riverlee/genbank2gtf
