7.4 years ago
dark_rider_2010
I have RNA-seq data from an organism whose genome is a combination of parts of other organisms' genomes. I don't know the exact genome; all I have is the transcriptome. I want to work out which organisms it is made of. I used Bridger and Trinity to assemble the genes, but the result file contains more than 800 genes. I tried BLAST to compare my assembled genes against the NCBI database, but the results are not conclusive: many hits have high similarity. Is there any way I can do this? Thanks
What does it mean
Are you talking about a metatranscriptome?
I mean that the new genome is made up of parts of other genomes; the mRNA of its genes was then measured, and RNA-seq data is now available. I used Trinity and Bridger to assemble the genes, and what I got is a file containing 800 genes.
I am sorry, but I cannot really understand. You have an organism (bacterium, yeast, human), you take samples and extract the mRNA. Unless there is some contamination the:
doesn't make sense. If you do not have a single organism but a mix (e.g. a mix of bacteria, or human and bacteria), then you have a metatranscriptome. Either way, you should have some idea of what is in your samples.
See also OP's other question: Finding gene names of a fasta file contains gene sequences (which is the same problem, I guess)
I suggest you try BBSketch, which will work on the reads or assembly:
or
It will only take a few seconds and will give you a taxonomic breakdown of the species present (provided they are in NCBI's nt database; you can alternatively use the flag "refseq" to query RefSeq).
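If you would rather get a comparable rough breakdown from the BLAST results you already have, here is a minimal Python sketch (not part of BBSketch). It keeps only the best-scoring hit per assembled contig and counts how many contigs point to each organism. It assumes the BLAST output is tab-separated with exactly these seven columns: qseqid, sseqid, pident, length, evalue, bitscore, sscinames. That column layout, the function names and the example rows are all illustrative assumptions, so adjust them to your own file.

```python
import csv
import io
from collections import Counter


def best_hits(rows):
    """Keep the highest-bitscore hit for each query contig."""
    best = {}
    for qseqid, _sseqid, _pident, _length, _evalue, bitscore, sciname in rows:
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, sciname)
    return best


def taxon_breakdown(tsv_text):
    """Count contigs per organism, using only each contig's best hit."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Assumed layout: 7 tab-separated columns; other lines are skipped.
    rows = [row for row in reader if len(row) == 7]
    return Counter(name for _score, name in best_hits(rows).values())


# Made-up example rows, only to show the expected input format.
example = "\n".join([
    "contig1\ts1\t98.0\t500\t1e-100\t900\tEscherichia coli",
    "contig1\ts2\t90.0\t500\t1e-80\t700\tSalmonella enterica",
    "contig2\ts3\t95.0\t300\t1e-50\t400\tSaccharomyces cerevisiae",
    "contig3\ts4\t99.0\t800\t0.0\t1400\tEscherichia coli",
])

breakdown = taxon_breakdown(example)
for organism, n_contigs in breakdown.most_common():
    print(organism, n_contigs)
```

Taking only the top hit per contig is a simplification: when several species score almost equally, the top hit is not reliable on its own, so treat the counts as a first look rather than an answer.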
Are the rest contaminants? How come you only have 800 genes? I am not familiar with Bridger (is that somehow responsible for this small number?).
Because the target genome is made up of only the necessary parts of other genomes. No, I think Bridger's output is fine, since I compared it with Trinity's and the results are very close.
Sounds to me like a problem for phylogenetics. I don't know how much it will help you, but have you tried clustering by conservation information?
What do you mean by clustering? I don't know anything about the new genome or the conservation information in it.
Then how did you say this above?
Just because a bioinformatics program ran and produced output does not mean the output is right.
We have 10+ comments in this thread and it is still not clear what type of experiment this is or what rationale you are using to analyze this data. Your best bet for now is to try @Brian's suggestion.