Need guidance for RNA-seq analysis of E. coli with an expression vector
4.0 years ago
Morgan S. ▴ 90

Hi!

I am working on RNA-seq data from E. coli K-12 substr. MG1655 carrying an expression vector, and I am interested in the expression of the genes on this vector. Of course, when I align the reads to the reference genome (which does not contain the vector), the reads from the vector genes I am interested in remain unmapped.

To get a better reference, I assembled the transcriptome of my control sample de novo (with rnaSPAdes) so that the vector would be included in the assembly. The assembly statistics were quite poor: N50 = 2,315 bp, and the de novo transcriptome was ~500,000 bp larger than the original E. coli MG1655 reference. I then used Scaffold_Builder to scaffold the transcriptome against the MG1655 reference genome, which improved the N50 to 780,113 bp but made the assembly 1.4 Mbp larger than the reference.
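
For reference, N50 is the largest length L such that contigs of length L or longer together contain at least half of all assembled bases. Below is a minimal Python sketch for computing it, together with the contig count and total size, from an assembly FASTA. It assumes a plain, uncompressed FASTA file, and the function names are illustrative rather than taken from any particular tool. Keep in mind that for a transcriptome assembly N50 mostly reflects transcript lengths, so it is generally considered a weaker quality signal than it is for genome assemblies.

```python
import sys


def read_fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python n50.py assembly.fasta
    with open(sys.argv[1]) as handle:
        lengths = read_fasta_lengths(handle)
    print(f"contigs: {len(lengths)}  total bp: {sum(lengths)}  N50: {n50(lengths)}")
```

For example, contigs of 100, 200, 300 and 400 bp give an N50 of 300 bp: the 400 and 300 bp contigs together hold 700 of the 1,000 bases.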

I considered trying Trinity's genome-guided option, but unmapped reads are not included, which defeats the purpose of what I am trying to do.

Can someone please provide some suggestions on how to further improve and refine my new reference transcriptome? I want to be sure that the reference is of good enough quality for my downstream expression analysis. How can I be sure that it is? Of course, I am hoping to do this without further sequencing if possible.

Thanks in advance!

RNA-Seq bacteria transcriptome Assembly alignment
4.0 years ago
ATpoint 86k

How about just adding the vector as an additional contig to the reference genome? That is by far the simplest solution. You can use cat to append the vector sequence to the FASTA file you align against, but you would obviously need to build a new index from the "new" reference genome.
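
Appending the vector is what `cat genome.fa vector.fa > combined.fa` does. Here is a small Python sketch of the same step with two safety checks added. It refuses to combine files whose sequence IDs clash, and it inserts a newline if a file lacks a trailing one, because otherwise the next `>` header would be glued onto the last sequence line (plain `cat` does this silently). The file names are placeholders.

```python
import sys
from pathlib import Path


def fasta_ids(path):
    """Return the sequence IDs (first word after '>') in a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                words = line[1:].split()
                ids.append(words[0] if words else "")
    return ids


def append_contigs(genome_fa, extra_fa, out_fa):
    """Write genome_fa followed by extra_fa to out_fa, like `cat`, with checks."""
    clashes = set(fasta_ids(genome_fa)) & set(fasta_ids(extra_fa))
    if clashes:
        raise ValueError(f"sequence IDs present in both files: {sorted(clashes)}")
    with open(out_fa, "w") as out:
        for path in (genome_fa, extra_fa):
            text = Path(path).read_text()
            out.write(text)
            # A missing final newline would fuse the next header onto the
            # last sequence line, so add one if needed.
            if text and not text.endswith("\n"):
                out.write("\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python append_contigs.py genome.fa vector.fa combined.fa
    append_contigs(sys.argv[1], sys.argv[2], sys.argv[3])
```

After writing `combined.fa`, rebuild the index with the aligner you use, for example `bowtie2-build combined.fa combined` or `hisat2-build combined.fa combined`, and align against the new index.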


Never mind, I found this post that also suggests the same approach: A: Quantification of a gene that is not in the reference genome

Thank you!!


Wow, I was making this much harder than it needed to be. So if the vector contains three genes I am interested in, could I cat only those three into the reference FASTA?
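
One way to do what this reply asks is to cut each gene out of the vector sequence as its own FASTA record and write a matching annotation line, so a read counter can assign reads to those genes. The sketch below is a minimal illustration under several assumptions. The gene names and 1-based coordinates are hypothetical and must come from your own vector map. The annotation is written as GTF `exon` lines with `gene_id` attributes, which is what featureCounts counts by default. Each extracted sequence keeps the vector's orientation, and its strand is recorded in the GTF rather than reverse-complementing the sequence.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str    # hypothetical gene ID, used as both contig name and gene_id
    start: int   # 1-based, inclusive, on the vector sequence
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-", relative to the vector sequence


def gene_records(vector_seq, genes, line_width=60):
    """Return (fasta_text, gtf_text): one contig and one GTF exon line per gene."""
    fasta_lines = []
    gtf_lines = []
    for gene in genes:
        if not 1 <= gene.start <= gene.end <= len(vector_seq):
            raise ValueError(f"{gene.name}: coordinates outside the vector")
        # Convert 1-based inclusive coordinates to a Python slice.
        seq = vector_seq[gene.start - 1:gene.end]
        fasta_lines.append(f">{gene.name}")
        for i in range(0, len(seq), line_width):
            fasta_lines.append(seq[i:i + line_width])
        # The gene spans its whole contig, so the feature runs from 1 to len(seq).
        attrs = f'gene_id "{gene.name}"; transcript_id "{gene.name}";'
        gtf_lines.append("\t".join(
            [gene.name, "vector", "exon", "1", str(len(seq)), ".", gene.strand, ".", attrs]))
    return "\n".join(fasta_lines) + "\n", "\n".join(gtf_lines) + "\n"
```

The FASTA text would be appended to the genome FASTA, and the GTF lines to the genome annotation, before re-indexing and counting. A possible alternative is to append the whole vector as one contig and write GTF lines for the three genes at their vector coordinates. That way, reads that extend past a gene's ends into the vector backbone can still align in full, whereas with gene-only contigs they may be clipped or left unmapped.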
