If you have a reference genome from a close relative, why do you consider it necessary to do a de novo assembly? What is the purpose: to obtain the transcriptome, or to evaluate the assembler's performance?
The first one. I would like to obtain the annotated transcriptome of this organism.
Would BLASTing my transcripts from Trinity (Trinity.fasta) against the genome (genome.fa file) of the close relative give me good enough results?
If you need the transcriptome and the reference genome is closely related and well annotated, you can just quantify transcript expression using StringTie and look for new isoforms, rather than comparing assembled transcripts.
I have already done two things:
1. Mapped the fastq files against the genome of the close relative (this gave me good results).
2. Ran a de novo assembly using Trinity to create a transcript.fasta file with the assembled transcripts, then ran bowtie to align the fastq files against the indexed transcript.fasta file.
From both runs I got a BAM file, which StringTie needs.
If I run StringTie on the first BAM file, I won't have the annotations I need. For the second BAM file I don't have an annotation file (GTF), so I am not sure whether running StringTie on it makes sense.
Can I use the BAM file from the de novo assembly as input, but take the annotation file from the close relative?
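Run StringTie with the BAM from step 1 and the genome annotation (GTF, GFF or GFF3); you do not need an extra annotation. The BAM from step 2 cannot be combined with the relative's annotation, because its reads are aligned to Trinity contigs, so the coordinates do not correspond to the genome.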
A BAM file contains the coordinates where each read aligns to the genome; StringTie just counts how many reads align to each annotated gene and calculates relative expression (TPM and FPKM).
You can also count the reads mapped to each annotated gene in the first BAM file with htseq-count and then calculate relative expression yourself.
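To make "calculate relative expression" concrete, here is a minimal Python sketch of the usual FPKM and TPM formulas applied to a table of per-gene counts. The gene names, counts and lengths are made up for illustration, and the total is assumed to be the sum of the counted reads (or fragments, for paired-end data). StringTie derives its values from read coverage internally, so this is not meant to reproduce its numbers exactly.

```python
# Hypothetical per-gene counts (e.g. from a counting tool) and
# exonic gene lengths in bp (e.g. derived from the GTF). Toy values only.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of gene per million counted fragments."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    denom = sum(rate.values())
    return {g: rate[g] / denom * 1e6 for g in counts}


if __name__ == "__main__":
    for gene in counts:
        print(gene, fpkm(counts, lengths_bp)[gene], tpm(counts, lengths_bp)[gene])
```

TPM values always sum to one million within a sample, which is why they are often preferred when comparing samples; FPKM values do not have that property.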
But I'm not primarily interested in quantifying expression. That is only a secondary result of the analysis.
I am more interested in creating an annotation file for my genome, which has no annotations.
I would like to try comparative genomics to assign or predict function for my transcripts (I was thinking of something like BLAST or other ORF-based comparison tools).
You can also use StringTie or Cufflinks to annotate your transcripts on the genome. To assign potential function to your transcripts you can use BLAST, but I recommend Blast2GO: it is easier to handle and does exactly what you are looking for.
What I used to do when I did a lot of this type of transcriptome assembly was the following:
Map the Trinity results (many different parameters, or clustered, or various organs) against the genome with gmap, using the very nice GFF3 output option.
Manually compare (or get biologists to compare transcripts and regions of interest, even better) the different assemblies. You can get an impression very fast of which sets of results look best.
Use TransDecoder to get sets of CDS, amino acid sequences, etc. from the Trinity assembly.
This worked pretty well. Functionally annotating the FASTA outputs of TransDecoder was always highly compute-intensive, though.
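For anyone unsure what "getting a CDS out of a transcript" means, here is a minimal Python sketch that scans all six reading frames for the longest complete ATG-to-stop ORF and translates it with the standard genetic code. It only illustrates the idea and is not TransDecoder's algorithm, which as far as I know also scores coding potential, applies a minimum ORF length and can report partial ORFs. The function names and the example sequence are made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (characters other than ACGT are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(cds):
    """Translate a CDS without its stop codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def longest_orf(seq):
    """Return (strand, start, end, protein) for the longest ATG..stop ORF, or None.

    start/end are 0-based, end-exclusive, on the strand that was scanned;
    end includes the stop codon.
    """
    best = None
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if best is None or (i + 3 - start) > (best[2] - best[1]):
                        best = (strand, start, i + 3, translate(s[start:i]))
                    start = None
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTTGGAAATAGCC"))
```

On real Trinity contigs you would run this per FASTA record and keep only ORFs above some length cutoff, since short ORFs turn up by chance in any sequence.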
You might also (re)annotate the genome using MAKER with the evidence from the Trinity assemblies and TransDecoder steps.
Also, providing results iteratively to your collaborators via, e.g., a local JBrowse instance will allow you to improve the transcripts and provide versioning.
Thanks for the suggestion. I was already planning on running either StringTie or gmap. Just to clarify: do you mean using the results of the Trinity run (e.g. Trinity.fasta) to map against the indexed genome of the close relative?
From memory that looks reasonable. You might play around with the -n parameter to exclude junk too.
Making sense of it requires eyeballing the results after importing them into a genome browser. That's why I mentioned JBrowse, which is excellent for comparing multiple tracks. You can import the GFF3 and use the server or standalone version.
Of course, you'll also need to import the GTF of your close relative for comparison.
Hopefully that will allow you to see if your assembly is overly fragmented or reasonable.
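Before (or alongside) the eyeballing, a rough numeric check for fragmentation can help. The Python sketch below counts, for each reference gene, how many mapped transcripts overlap it on the same strand and what fraction of the gene span the best one covers. It assumes the mapped transcripts appear as `mRNA` rows in the GFF3 and that the relative's annotation has `gene` rows (not every GTF does); the function names and the toy records are made up for illustration.

```python
def parse_features(text, feature_type):
    """Return (seqid, start, end, strand) for rows of one type in GFF3/GTF text.

    Both formats are tab-separated with 1-based inclusive coordinates in
    columns 4 and 5, the feature type in column 3 and the strand in column 7.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 9 and cols[2] == feature_type:
            rows.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return rows


def transcripts_per_gene(genes, transcripts):
    """For each gene: (seqid, start, end, n_overlapping, best_fraction_covered)."""
    summary = []
    for chrom, gs, ge, strand in genes:
        hits = [
            (ts, te)
            for c, ts, te, s in transcripts
            if c == chrom and s == strand and ts <= ge and te >= gs
        ]
        best = max((min(te, ge) - max(ts, gs) + 1 for ts, te in hits), default=0)
        summary.append((chrom, gs, ge, len(hits), best / (ge - gs + 1)))
    return summary


def row(*cols):
    """Build one tab-separated record for the toy data below."""
    return "\t".join(str(c) for c in cols)


# Toy annotation of the close relative (hypothetical records).
REFERENCE = "\n".join([
    row("chr1", "ref", "gene", 1000, 1999, ".", "+", ".", "ID=g1"),
    row("chr1", "ref", "gene", 5000, 5999, ".", "-", ".", "ID=g2"),
])
# Toy mapped Trinity transcripts (hypothetical records).
MAPPED = "\n".join([
    "##gff-version 3",
    row("chr1", "gmap", "mRNA", 1000, 1299, ".", "+", ".", "ID=t1"),
    row("chr1", "gmap", "mRNA", 1500, 1999, ".", "+", ".", "ID=t2"),
    row("chr1", "gmap", "mRNA", 4990, 6010, ".", "-", ".", "ID=t3"),
])

if __name__ == "__main__":
    genes = parse_features(REFERENCE, "gene")
    transcripts = parse_features(MAPPED, "mRNA")
    for record in transcripts_per_gene(genes, transcripts):
        print(record)
```

Genes hit by several short transcripts with a low best fraction are candidates for fragmented assembly. Because the comparison uses whole spans and ignores introns, treat it as a first filter before looking at those loci in the browser.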