Hi there,
I am working on an RNA-seq project with non-model organisms, where I am hoping to detect differential expression of a specific set of genes and/or sequence variants of those genes. I am having issues with my workflow, though, and find myself going in circles because I am not sure how to pull the specific gene targets or sequences out of my assembly.
Workflow so far:
1. FastQC
2. Trimmomatic
3. FastQC (to verify trimming)
4. Trinity
5. Quality assessment with BUSCO and other methods
6. Salmon (run through Trinity)
From here I plan on using DESeq2.
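Salmon run through Trinity quantifies isoforms (transcripts), while DESeq2 is often run at the Trinity "gene" level, which needs a transcript-to-gene map (e.g. for tximport). Trinity's own support scripts can generate this map; the sketch below only illustrates the naming rule, assuming standard Trinity headers of the form TRINITY_DN1000_c115_g5_i1, where dropping the trailing "_i<N>" gives the gene ID. The function name is hypothetical.

```python
import re

# Assumption: standard Trinity IDs, e.g. TRINITY_DN1000_c115_g5_i1.
# Removing the trailing "_i<N>" isoform suffix leaves the Trinity "gene" ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_tx2gene(fasta_lines):
    """Return (transcript_id, gene_id) pairs from Trinity FASTA header lines."""
    pairs = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        if not tokens:
            continue  # skip malformed empty headers
        tx_id = tokens[0]  # ID is the first token after '>'
        pairs.append((tx_id, ISOFORM_SUFFIX.sub("", tx_id)))
    return pairs


if __name__ == "__main__":
    demo = [
        ">TRINITY_DN1000_c115_g5_i1 len=247",
        "ACGTACGT",
        ">TRINITY_DN1000_c115_g5_i2 len=300",
        "ACGT",
    ]
    for tx, gene in trinity_tx2gene(demo):
        print(f"{tx}\t{gene}")
```

Written out as a two-column table, this is the shape tximport expects for its tx2gene argument (transcript ID first, gene ID second).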
I am also using TransDecoder and Trinotate to annotate my assembly (with blastp, blastx, and Pfam).
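Since Trinotate already records BLAST hits per transcript, one way to find candidate transcripts for a list of target genes is to scan the Trinotate report. A minimal sketch, assuming a tab-separated report with a "transcript_id" column and a hit column such as "sprot_Top_BLASTX_hit" (column names as I understand the Trinotate report; check your own header), whose value is "." for no hit or "^"-separated fields starting with a SwissProt entry name like "CDK1_HUMAN". The function name and demo rows are hypothetical. Entry-name prefixes do not always equal the gene symbol, so treat matches as candidates to verify.

```python
import csv
import io


def transcripts_for_targets(report_tsv, targets, hit_column="sprot_Top_BLASTX_hit"):
    """Return transcript IDs whose top SwissProt hit matches a target symbol.

    Assumption: hit values are '.' (no hit) or '^'-joined fields whose first
    field is a SwissProt entry name such as 'CDK1_HUMAN'.
    """
    wanted = {t.upper() for t in targets}
    matches = []
    for row in csv.DictReader(report_tsv, delimiter="\t"):
        value = row.get(hit_column) or "."
        if value == ".":
            continue  # transcript has no hit in this column
        entry_name = value.split("^")[0]          # e.g. 'CDK1_HUMAN'
        symbol = entry_name.split("_")[0].upper()  # e.g. 'CDK1'
        if symbol in wanted:
            matches.append(row["transcript_id"])
    return matches


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = io.StringIO(
        "#gene_id\ttranscript_id\tsprot_Top_BLASTX_hit\n"
        "TRINITY_DN1_c0_g1\tTRINITY_DN1_c0_g1_i1\tCDK1_HUMAN^CDK1_HUMAN\n"
        "TRINITY_DN2_c0_g1\tTRINITY_DN2_c0_g1_i1\t.\n"
    )
    print(transcripts_for_targets(demo, ["cdk1"]))
```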
Is there something I may be missing, overlooking, or simply over-complicating? Is it possible to pull target genes out of my assembly and BLAST them?
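Pulling target transcripts out of the assembly is mostly a matter of subsetting Trinity.fasta by ID. A minimal sketch, assuming you already have a list of transcript IDs (for example from the annotation); the function name is hypothetical. `samtools faidx Trinity.fasta <id>` does the same job for a handful of IDs.

```python
def extract_fasta(fasta_lines, wanted_ids):
    """Yield FASTA lines for records whose ID (first header token) is in wanted_ids."""
    wanted = set(wanted_ids)
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted  # start of a new record
        if keep:
            yield line  # header and sequence lines of selected records


if __name__ == "__main__":
    demo = [">TRINITY_DN1_c0_g1_i1 len=8", "ACGTACGT",
            ">TRINITY_DN2_c0_g1_i1 len=4", "TTTT"]
    print("\n".join(extract_fasta(demo, ["TRINITY_DN2_c0_g1_i1"])))
```

Write the yielded lines to a file such as targets.fasta and pass it to BLAST+ with `-query targets.fasta` (blastn against nucleotide references, blastx against proteins).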
Hi there, I am not completely familiar with your pipeline, but I have done a de novo transcriptome assembly on a non-model organism using StringTie and did DE with DESeq2. As long as you have an annotated genome to use as your reference and you know the gene IDs for that reference, you should be able to pull them out of the DESeq2 results table. Are you expecting transcripts that were not identified in your reference genome?
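Inside R this is just `res[rownames(res) %in% my_genes, ]` on the DESeq2 results object. If the results were exported instead, e.g. with `write.csv(as.data.frame(res), ...)`, a minimal sketch of the same subsetting follows, assuming gene IDs sit in the first (row-name) column; the function name and demo values are hypothetical.

```python
import csv
import io


def subset_results(results_csv, gene_ids):
    """Keep DESeq2 result rows whose row name (first column) is in gene_ids."""
    wanted = set(gene_ids)
    reader = csv.reader(results_csv)
    header = next(reader)  # first header cell is empty for R row names
    rows = [row for row in reader if row and row[0] in wanted]
    return header, rows


if __name__ == "__main__":
    # Made-up values for illustration only.
    demo = io.StringIO(
        '"","baseMean","log2FoldChange","padj"\n'
        '"TRINITY_DN1_c0_g1",120.5,2.1,0.001\n'
        '"TRINITY_DN2_c0_g1",33.0,-0.4,0.8\n'
    )
    header, rows = subset_results(demo, ["TRINITY_DN1_c0_g1"])
    print(header)
    print(rows)
```

Values come back as strings; convert them with float() before filtering on padj or log2FoldChange.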
The annotation I will be using is the one I generated with Trinotate. The only reference genome I know of that I could use is still highly diverged from the species I am focusing on, and it has no (functional) annotation. I am focusing on essential genes that should be present in the reference genome.
If you already annotated your assembly with BLAST, I don't think you need to do it again. However, you can use StringTie to make GTF files from your Salmon BAM output (note that Salmon only writes its mappings, in SAM format, when run with --writeMappings), and if you use your annotation as the reference GTF/GFF file, these should contain the gene IDs annotated from your assembly. From there, if you have a list of genes you are interested in, you can search for them in your StringTie files, or after running DESeq2 you can subset the results with that list.

If you want to look more closely at the BAM files and the specific sequences, I would recommend converting them to FASTA with samtools and then BLASTing the FASTA files:

samtools fasta input.bam > output.fasta

Hope this helps.
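After BLASTing the FASTA, e.g. `blastn -query output.fasta -db targets -outfmt 6 -out hits.tsv` against a database built with `makeblastdb -in targets.fasta -dbtype nucl -out targets` (targets.fasta being a hypothetical file of your reference gene sequences), you usually want one best hit per read or transcript. A minimal sketch, assuming the default 12 columns of BLAST+ tabular output (-outfmt 6); the function name and demo IDs are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular):
    """Return {query_id: hit_dict}, keeping the highest-bitscore hit per query."""
    best = {}
    for row in csv.reader(tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        current = best.get(rec["qseqid"])
        if current is None or float(rec["bitscore"]) > float(current["bitscore"]):
            best[rec["qseqid"]] = rec
    return best


if __name__ == "__main__":
    # Made-up hits for illustration only.
    demo = io.StringIO(
        "read1\tgeneA_ref\t92.5\t200\t15\t0\t1\t200\t1\t200\t1e-100\t350\n"
        "read1\tgeneB_ref\t70.0\t190\t57\t0\t1\t190\t1\t190\t1e-60\t210\n"
        "read2\tgeneA_ref\t60.0\t100\t40\t0\t1\t100\t5\t104\t1e-20\t90\n"
    )
    for query, hit in best_hits(demo).items():
        print(query, hit["sseqid"], hit["bitscore"])
```

Ranking by bitscore rather than e-value avoids ties when several hits report an e-value of 0.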