Too many transcripts after Trinity
5.4 years ago

Dear all,

I have RNA-seq data for a species whose close relative has a sequenced genome with about 20,000 genes. For our own species, however, we need to use a Trinity assembly. I have 7 time points, with 2 replicates for each time point. After Trinity de novo assembly, I got about 1,319,212 transcript sequences. I then used CD-HIT to remove redundancy and "get_longest_isoform_seq_per_trinity_gene.pl" to keep the longest isoform, but about 650,000 transcript sequences still remain. I thought this species should only have about 40,000 - 60,000 transcripts, so this is far more than I expected.

Does anyone know what the problem might be?

Thanks!

RNA-Seq next-gen • 2.6k views

Thank you for the reply! I looked through some papers, and it seems that related species also have many contigs in their papers. But the "contigs" they refer to mean the same thing as transcripts, right? (That is, just the number of sequences in the FASTA file.) :)

Thank you!


I do not know about your specific project, but in general contigs are from genome assemblies and transcripts are from transcriptome assemblies. These are two different things:

- In genome assembly, your goal is to create a complete representation of the genome (ideally with a FASTA file containing one sequence per chromosome).
- Transcriptome assembly aims at generating the whole set of transcripts (the FASTA file will contain many sequences, corresponding to the transcripts and their different isoforms).
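
To make the transcript/isoform point concrete, here is a minimal Python sketch that counts sequences and genes in a Trinity-style FASTA and keeps the longest isoform per gene. It assumes sequence IDs of the form `TRINITY_DN1_c0_g1_i1`, where the trailing `_i<N>` marks the isoform. The function names and the toy sequences are made up for illustration, and this is not meant to reproduce what `get_longest_isoform_seq_per_trinity_gene.pl` does internally.

```python
import re

# Assumption: Trinity-style IDs such as "TRINITY_DN1_c0_g1_i1",
# where stripping the trailing "_i<N>" gives the gene ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def longest_isoform_per_gene(records):
    """Map each gene ID to the (isoform_id, sequence) with the longest sequence."""
    best = {}
    for seq_id, seq in records:
        gene_id = ISOFORM_SUFFIX.sub("", seq_id)
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (seq_id, seq)
    return best


# Toy input (hypothetical sequences): two isoforms of one gene, one of another.
example = """\
>TRINITY_DN1_c0_g1_i1 len=8
ACGTACGT
>TRINITY_DN1_c0_g1_i2 len=12
ACGTACGTACGT
>TRINITY_DN2_c0_g1_i1 len=4
ACGT
""".splitlines()

records = list(read_fasta(example))
longest = longest_isoform_per_gene(records)
print(len(records), "transcripts,", len(longest), "genes")  # 3 transcripts, 2 genes
```

For a real assembly you would pass an open file handle instead of `example`, for instance `read_fasta(open("Trinity.fasta"))`, where the file name is only a placeholder. Comparing the two counts shows how much of the total comes from multiple isoforms per gene.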


I got it! Thank you!!
