Question

Trinity de novo assembly without reference genome (community metatranscriptome)

1

6.9 years ago

steph_tf ▴ 10

Hi! I need advice on the processing steps for my project, and any feedback would be very welcome. I want to perform a Trinity de novo assembly from metatranscriptome samples. Since the samples contain different microorganisms, I want to use one big assembly as a reference, map the individual samples to it (replicates under 2 different conditions), and count the transcripts to test for differential expression. The problem I am struggling with is that the big assembly will involve around 200 million reads (PE) or more, depending on how I process the sequences. I could build 2 big assemblies, one per condition, each with less data, or I could perhaps get a better "reference assembly" by using all the data together. I don't know whether Trinity can handle this within the resources I can request on an HPC cluster: so far I have assembled 40 million reads at most, and it is really difficult to keep a job running for that long. Could you give me some advice on how to improve the data (pre)processing steps?

Another thing I'm not sure about: since I don't have a reference genome and I don't expect a large fraction of the transcripts to be annotated, it would be better for me to use merged PE reads (longer reads), which make up about 20-30% of my sequences. In that case, however, I would lose the information from the remaining, unmerged reads AND I would have to run Trinity in single-end mode... Is there a way to combine the merged and unmerged data and include everything in the analysis, without having to treat everything as single-end data? Thanks in advance!

RNA-Seq metatranscriptome trinity Assembly • 3.7k views

ADD COMMENT • link updated 6.2 years ago by Biostar 20 • written 6.9 years ago by steph_tf ▴ 10

score 4 · Accepted Answer · 2017-12-22

4

6.9 years ago

manuelmendoza ▴ 50

You should build a single consensus transcriptome from all your reads and then run the differential gene expression analysis against it; both steps can be done with Trinity. Moreover, you should use Trinotate for functional annotation. You will need a cluster with approximately 250 GB of RAM for about 24 hours. Follow the instructions in the Trinity GitHub wiki: https://github.com/trinityrnaseq/trinityrnaseq/wiki.
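As a rough illustration of the single combined assembly described above, here is a minimal Python sketch that only builds (and prints) the command lines rather than running them. The sample names, FASTQ file names, CPU count, and the location of the assembled FASTA are hypothetical placeholders, and it assumes `Trinity` and `align_and_estimate_abundance.pl` are on the `PATH`. Check the flags against the Trinity wiki for your installed version before submitting a job.

```python
import shlex

# Hypothetical layout: 2 conditions x 2 replicates, paired-end FASTQ files.
# Columns follow Trinity's samples file: condition, replicate, left, right.
SAMPLES = [
    ("condA", "condA_rep1", "condA_rep1_R1.fq.gz", "condA_rep1_R2.fq.gz"),
    ("condA", "condA_rep2", "condA_rep2_R1.fq.gz", "condA_rep2_R2.fq.gz"),
    ("condB", "condB_rep1", "condB_rep1_R1.fq.gz", "condB_rep1_R2.fq.gz"),
    ("condB", "condB_rep2", "condB_rep2_R1.fq.gz", "condB_rep2_R2.fq.gz"),
]


def samples_file_text(samples):
    """Return the tab-delimited samples file content, one line per replicate."""
    return "".join("\t".join(row) + "\n" for row in samples)


def trinity_command(samples_path, max_memory="250G", cpu=16,
                    output="trinity_out_dir"):
    """Build one assembly command that uses all reads from all samples."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--samples_file", str(samples_path),
        "--max_memory", max_memory,   # ~250 GB, as suggested in the answer
        "--CPU", str(cpu),            # assumed value; match your job request
        "--output", output,           # Trinity expects 'trinity' in this name
    ]


def abundance_command(transcripts, samples_path, est_method="salmon"):
    """Build the per-sample quantification command against the assembly."""
    return [
        "align_and_estimate_abundance.pl",
        "--transcripts", str(transcripts),
        "--seqType", "fq",
        "--samples_file", str(samples_path),
        "--est_method", est_method,
        "--trinity_mode",
        "--prep_reference",
    ]


# Print the samples file content and the two commands for inspection.
print(samples_file_text(SAMPLES), end="")
print(shlex.join(trinity_command("samples.txt")))
# The FASTA path below is a placeholder; its real location depends on
# the Trinity version and the --output value used.
print(shlex.join(abundance_command("Trinity.fasta", "samples.txt")))
```

Keeping the same samples file for both steps means every replicate is quantified against the one shared assembly, which is what makes the counts comparable across conditions.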

Additionally, you should use MEGAN to analyze the putative species.

I already have a pipeline for this, so contact me if you need any help.

ADD COMMENT • link 6.9 years ago by manuelmendoza ▴ 50

0

Can you share the pipeline? I am about to start a metatranscriptome project. Thank you.

ADD REPLY • link 6.9 years ago by popayekid55 ▴ 110

0

Thank you for your advice! I'm already running Trinity on a cluster, but I still have some processing issues and have not yet decided on the final pipeline. How can I contact you?

ADD REPLY • link 6.9 years ago by steph_tf ▴ 10

0

Could you share your email so we can get in contact? By the way, do you speak Spanish?

ADD REPLY • link 6.8 years ago by steph_tf ▴ 10

0

I'm so sorry, but I don't know how to send a personal message through this portal... You can contact me on GitHub: manuelsmendoza

ADD REPLY • link 6.7 years ago by manuelmendoza ▴ 50