de novo transcriptome
4.5 years ago
valopes ▴ 30

Hi all,

I have 8 RNA-seq samples from the same organism under different conditions. After trimming, quality filtering, and removing possible contaminant species and rRNA sequences, I merged the high-quality reads into a single file of 98 million reads. I then ran Trinity on it and got ~4 million contigs, which seems like a very high number to me. Anyway, on to my question: I found some contigs of ~200 kb. When I check one for genes, I find something like 20-25 predicted complete genes. I was not expecting this. Is this a good or a bad sign?

Thanks!

assembly rna-seq • 877 views

Are you working with a bacterium or a virus? If so, that is expected because of operons. If not, you may be creating chimeras in the assembly. How long are your reads? Are they paired-end? Which sequencing technology did you use?


Okay! Sorry, I left out a lot of info... It is a eukaryote, so I am not expecting operons. The reads are 2x150 bp Illumina.

4.5 years ago

4 million transcripts is a lot. I guess they are highly redundant.

Filter them: exclude the very short ones.
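As a rough illustration of the length filter, here is a minimal Python sketch using only the standard library. The function names and the 300 bp cutoff are my own assumptions for the example, not values recommended in this thread; pick a threshold that suits your data.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 300) -> Iterator[Record]:
    """Keep only records whose sequence has at least min_len bases.

    min_len=300 is a hypothetical default chosen for illustration.
    """
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
```

Usage would look something like `write_fasta(filter_by_length(read_fasta(open("Trinity.fasta"))), open("filtered.fasta", "w"))`, where both file names are placeholders.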

You can reduce redundancy using a FASTA clustering tool, e.g. CD-HIT.
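For the clustering step, a small Python helper that assembles a `cd-hit-est` command line (the nucleotide program in the CD-HIT package) might look like the sketch below. The helper name, file names, 95% identity threshold, thread count, and memory limit are all assumptions for illustration, so tune them for your assembly. The sketch only builds the command and does not run it.

```python
import shlex
from typing import List


def build_cdhit_est_command(
    in_fasta: str,
    out_fasta: str,
    identity: float = 0.95,   # assumed threshold, not taken from this thread
    word_size: int = 10,      # CD-HIT guide suggests 10 or 11 for identity 0.95-1.0
    threads: int = 4,         # assumed value
    memory_mb: int = 16000,   # assumed value
) -> List[str]:
    """Return the argument list for a cd-hit-est run (hypothetical helper)."""
    if not 0.8 <= identity <= 1.0:
        raise ValueError("identity outside the range this sketch supports (0.8-1.0)")
    return [
        "cd-hit-est",
        "-i", in_fasta,        # input FASTA
        "-o", out_fasta,       # output FASTA of representative sequences
        "-c", str(identity),   # sequence identity threshold
        "-n", str(word_size),  # word length
        "-T", str(threads),    # number of threads
        "-M", str(memory_mb),  # memory limit in MB
    ]


# Placeholder file names; print the command for inspection before running it.
cmd = build_cdhit_est_command("filtered.fasta", "clustered.fasta")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` once CD-HIT is installed and on your `PATH`.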

GMAP with GFF3 output is excellent for mapping the transcripts to the genome and for visualization.

Once you check the alignments visually, you'll know how decent the quality is.

Good luck.


I will follow your suggestions. Thank you very much.
