Dear All,
I am working on a plant species for which a draft genome sequence is available. I have done de novo and genome-guided assemblies separately. Please suggest how I can combine the two assemblies to generate a reference transcriptome for raw read counting.
Should I run blastn between them to remove overlapping sequences, or something else? Should I assemble the two transcriptome assemblies further?
Thanks
Transcriptome? You can make hybrid assemblies from both of your attempts using Trinity, IDBA_hybrid, or even SPAdes, or by using the draft genome as a trusted reference.
Yes, it is RNAseq data.
You want to assemble RNAseq reads? Why?
Actually, a draft genome sequence is available, but it does not contain all gene sequences. So I have done a genome-guided assembly using StringTie and a de novo assembly using Trinity.
Now, to make a complete set of reference gene sequences for raw read counting, I want to remove the gene sequences that overlap between these two assemblies so that I have a non-redundant set of gene sequences.
Why have you assembled RNAseq reads? What are you trying to do? All the short-read sequencing in the world is never going to allow you to close a genome and get a completed sequence.
If you want to get raw read counts from your RNAseq you should be mapping the reads (e.g. with bwa, bowtie2 etc) to the existing reference (or a reassembly if you have access to the original sequencing) and then calculating the raw read counts from the alignment map, not assembling.
Thanks. I am not expecting to get all gene sequences, but at least to retrieve those gene transcripts which are present in my transcriptome data and may have an important functional role but are not present in the currently reported CDS sequences.
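For anyone following along, the "map first, then count from the alignment" approach suggested above can be sketched roughly like this in Python. It is only an illustration: the SAM records are made-up toy data, `count_reads` is a hypothetical helper, and in practice a dedicated counting tool would be used on the real alignment file. It counts one per primary mapped alignment record, so paired-end data would give two counts per fragment.

```python
from collections import Counter

def count_reads(sam_text):
    """Count primary, mapped alignment records per reference sequence."""
    counts = Counter()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        rname = fields[2]       # column 3: reference sequence name
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[rname] += 1
    return counts

# Toy SAM text (invented for illustration, not real data)
toy_sam = "\n".join([
    "@SQ\tSN:gene1\tLN:1000",
    "@SQ\tSN:gene2\tLN:800",
    "r1\t0\tgene1\t10\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tgene1\t40\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t256\tgene2\t5\t0\t50M\t*\t0\t0\t*\t*",
    "r5\t0\tgene2\t7\t60\t50M\t*\t0\t0\t*\t*",
])

counts = count_reads(toy_sam)
for name in sorted(counts):
    print(name, counts[name])
```

Here the unmapped read `r3` and the secondary alignment `r4` are skipped, so only uniquely placed primary alignments contribute to the raw counts.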
So you are looking for untranslated genome features? sRNAs, pseudogenes etc?
I am looking for protein-coding gene sequences which are not present in the currently annotated CDS. Yes, I want to remove redundancy and select the longest transcript from both the genome-guided and de novo assembled transcripts.
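A minimal sketch of the "remove redundancy, keep the longest" idea, in case it helps make the goal concrete. This is a simplification and not what blastn or a clustering tool does: it only drops a transcript when it is exactly contained (on either strand) in a longer one, whereas real data would need similarity-based clustering at some identity threshold. The sequence names and sequences below are invented, and `parse_fasta` and `remove_contained` are hypothetical helpers.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, using the first word of each header."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts).upper() for n, parts in records.items()}

def remove_contained(records):
    """Keep a sequence only if no longer kept sequence contains it on either strand."""
    kept = {}
    # Longest first, so each fragment is compared against longer sequences only
    for name, seq in sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept.values()):
            continue  # redundant: already represented by a longer transcript
        kept[name] = seq
    return kept

# Toy inputs standing in for the genome-guided and de novo assemblies
guided = ">STRG.1.1\nATGGCGTACGTTAGC\n"
denovo = (
    ">TRINITY_DN1_c0_g1_i1\nGGATGGCGTACGTTAGCTTAA\n"
    ">TRINITY_DN2_c0_g1_i1\nCCCGGGAAATTT\n"
)

merged = {**parse_fasta(guided), **parse_fasta(denovo)}
nonredundant = remove_contained(merged)
for name, seq in nonredundant.items():
    print(name, len(seq))
```

In this toy case the genome-guided transcript is fully contained in the longer de novo one, so only the two de novo sequences are kept. The pairwise containment check is quadratic, so it is only meant to show the logic, not to scale to a full transcriptome.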
I'm genuinely curious, what is the problem with assembling RNA-seq reads?
They can be assembled potentially, they are just short reads after all - but why would you want to? If you have a region of no transcription, you won't reverse transcribe any cDNA to be sequenced from that region of the genome in the library prep. If there are no reads there, then the assembler will have to terminate the contig there as there will be no more read sequences to overlap. Even in the best case, you'll have very different coverage at intergenic and genic regions. You'd end up with an assembly but it would probably be full of short contigs so it'd be pretty shitty.
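To illustrate the point about contigs ending where the reads run out, here is a small Python sketch. It is a toy model under stated assumptions, not how an assembler actually works: it takes a made-up per-base coverage profile along a genomic region and reports the stretches with at least `min_depth` reads, which is roughly where contigs could be built. `covered_intervals` and the coverage values are hypothetical.

```python
def covered_intervals(coverage, min_depth=1):
    """Return half-open (start, end) intervals where depth >= min_depth."""
    intervals = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos                      # a covered stretch begins
        elif depth < min_depth and start is not None:
            intervals.append((start, pos))   # coverage dropped: stretch ends
            start = None
    if start is not None:
        intervals.append((start, len(coverage)))
    return intervals

# Invented coverage: two transcribed regions separated by an untranscribed gap
coverage = [0, 0, 12, 15, 9, 0, 0, 0, 3, 4, 0]
contigs = covered_intervals(coverage)
print(contigs)
```

With this profile the region splits into two separate covered stretches, positions 2 to 4 and 8 to 9, with nothing to bridge the untranscribed gap between them.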
That sounds like an issue with a genome assembly from RNA-seq reads, which I would totally support would be bonkers. But a transcriptome assembly (which seems to be what OP wants to do) should be fine with RNA-seq, no?
When you say 'assembling a transcriptome' though, what exactly do you mean? Maybe it's just a syntactic difference, because when I hear the word 'assembly' I take that to mean literally using an assembler. If you want to do transcriptomics from RNAseq via mapping though, then sure! I think we possibly just misunderstand one another when we're using the word 'assemble' in the context of RNAseq/transcriptomics.
In this case I was assuming de novo, e.g. using Trinity, Velvet/Oases, IDBA-tran etc. So, yes, literally using an assembler.
I've never done it personally, as I've always had a reference genome to map against so I'm not sure of the use case. Particularly in this case as the OP said there was a reference for the organism too.
It's possible I'm misunderstanding the question, as I've personally never come across the need to do transcriptome assembly (instead of just mapping etc to a reference).
I think it's simply complementary, especially if the genomic reference is missing, incomplete, or otherwise of bad quality. In my field we often don't have a reference genome, so no other choice. I was just asking, because your earlier reply sounded to me as if that is something completely out of the question and I wanted to know why.
Sorry, can you be more specific? I think you have not made clear what you have and what you want.