Find transcripts of one species in RNA-assembly

4.8 years ago

Tom ▴ 540

Hi everyone,

I have several mRNA-Seq datasets from mixed bacterial communities, i.e. metatranscriptome data. I am only interested in a single species in these communities and would like to find out which transcripts in the community belong to this species.

My approach so far was to do a transcriptome assembly. I then tried using BLAST to align the assembled transcripts to a FASTA file of reference transcripts (obtained from the annotated genome).

I am a little skeptical about the results. If I add up the lengths of all the assembled transcripts that produce a hit (with a strict e-value cutoff), the total is much longer than all the transcripts of the organism combined. Is there a possible reason for this (other than the assembly/alignment not working at all)?
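
The length comparison described above could be sketched roughly as follows. This is only an illustration and rests on several assumptions: the BLAST results are in the default tabular format (`-outfmt 6`, with the e-value in column 11 and the query coordinates in columns 7 and 8), the assembled transcripts were the query, and the cutoff value, sequence names, and helper function names are all made up for the example. Besides the total length of the assembled transcripts with a hit, it also reports how many query bases are actually covered by alignments, since counting the full length of a contig that only partially aligns inflates the total.

```python
import csv
import io


def fasta_lengths(handle):
    """Return {sequence_id: length} for a FASTA file handle."""
    lengths, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def merged_length(intervals):
    """Number of positions covered by 1-based inclusive intervals."""
    total, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip positions already counted
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def summarize_hits(blast_handle, query_lengths, max_evalue=1e-20):
    """Return (full length of queries with a hit, query bases aligned)."""
    aligned = {}
    for row in csv.reader(blast_handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if float(row[10]) > max_evalue:  # column 11 of outfmt 6 is the e-value
            continue
        qstart, qend = sorted((int(row[6]), int(row[7])))
        aligned.setdefault(row[0], []).append((qstart, qend))
    full = sum(query_lengths[q] for q in aligned)
    covered = sum(merged_length(iv) for iv in aligned.values())
    return full, covered


# Tiny made-up inputs standing in for the assembly FASTA and the BLAST table.
assembly = io.StringIO(">contig1 len=20\nACGTACGTAC\nGTACGTACGT\n>contig2\nACGTACGT\n")
blast = io.StringIO(
    "contig1\trefA\t99.0\t10\t0\t0\t1\t10\t5\t14\t1e-30\t50\n"
    "contig1\trefB\t98.0\t6\t0\t0\t8\t13\t1\t6\t1e-25\t30\n"
    "contig2\trefA\t90.0\t8\t0\t0\t1\t8\t1\t8\t1e-3\t12\n"
)

lengths = fasta_lengths(assembly)
full, covered = summarize_hits(blast, lengths)
print("full length of transcripts with a hit:", full)
print("query bases covered by alignments:", covered)
```

With real data, the two file handles would come from `open()` on the assembly FASTA and the BLAST output file.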

Is this the proper approach for what I am trying to find out, or did I make a mistake along the way? Should I align the reference transcripts to the assembled transcripts or vice versa?

Thanks in advance for any help!

RNA-Seq blast transcriptome metagenome • 1.2k views

Did you try using Kraken to find them?

4.8 years ago by young_bioinformatician ▴ 240

Thank you for the answer. Maybe that is the best approach. I am building the Kraken database right now and will report back on how well that works out.

4.8 years ago by Tom ▴ 540

Would you recommend classifying the reads with Kraken and then assembling transcripts from the ones that interest me? Or should I assemble transcripts from all reads and then try to classify those transcripts?

4.8 years ago by Tom ▴ 540

I would recommend (at least this is my approach) first classifying the reads with Kraken and then assembling them. After classification, the reads of whichever species you want can be extracted by filtering on the taxonomy ID in the Kraken output file.
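
The filtering step could look something like the sketch below. It assumes the standard per-read Kraken output, which is tab-separated with the classification flag (`C`/`U`), the read ID, and the assigned taxonomy ID in the first three columns. It also assumes single-end FASTQ reads whose IDs match the Kraken output exactly. The function names and the example taxonomy ID (562) are only for illustration. Note that reads assigned to child taxa (e.g. individual strains) are not picked up unless their IDs are added to the set as well.

```python
import io


def read_ids_for_taxids(kraken_handle, wanted_taxids):
    """Collect IDs of classified reads assigned to one of wanted_taxids."""
    ids = set()
    for line in kraken_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":  # skip unclassified reads
            continue
        if fields[2] in wanted_taxids:
            ids.add(fields[1])
    return ids


def filter_fastq(fastq_handle, keep_ids):
    """Yield 4-line FASTQ records whose read ID is in keep_ids."""
    while True:
        record = [fastq_handle.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        read_id = record[0][1:].split()[0]  # drop '@' and any description
        if read_id in keep_ids:
            yield "".join(record)


# Made-up stand-ins for the Kraken output file and the FASTQ file.
kraken_out = io.StringIO(
    "C\tread1\t562\t8\t562:1\n"
    "U\tread2\t0\t8\t0:1\n"
    "C\tread3\t1280\t8\t1280:1\n"
)
fastq = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTTTACGT\n+\nIIIIIIII\n"
    "@read3\nGGGGACGT\n+\nIIIIIIII\n"
)

ids = read_ids_for_taxids(kraken_out, {"562"})
kept = list(filter_fastq(fastq, ids))
print("".join(kept), end="")
```

The kept records can then be written to a new FASTQ file and used as input for the assembler.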

4.8 years ago by young_bioinformatician ▴ 240