Dear colleagues, I have a bioinformatic task that is quite a challenge for me: to calculate the expression of several target genes using RNA-seq data from SRA studies.
I've found some helpful comments on this issue in two earlier threads: "data from SRA against blast databases method" and "Blast Against Sra Dataset".
My plan is to take the coding sequences of several orthologous target genes and BLAST them directly against SRA datasets (which contain unaligned short reads, 150-250 bp long, from the same organism), then select all matching reads and map them to the target genes (with bowtie2, for instance). Is this approach credible for estimating gene expression, and what hurdles might I run into? How should I filter the BLAST results, and what settings should I use for mapping the reads?
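To make the filtering question concrete, here is a minimal sketch of the step I have in mind, assuming the BLAST hits are saved in the default tabular format (`-outfmt 6`) with the genes as queries and the reads as subjects. The thresholds (`min_identity`, `min_length`, `max_evalue`) and the read and gene names are hypothetical placeholders, not recommended values.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_identity=95.0, min_length=100, max_evalue=1e-10):
    """Return the IDs of reads (subjects) whose hit passes all thresholds.

    The default thresholds are illustrative assumptions only.
    """
    keep = set()
    reader = csv.DictReader(handle, fieldnames=BLAST6_FIELDS, delimiter="\t")
    for row in reader:
        if (float(row["pident"]) >= min_identity
                and int(row["length"]) >= min_length
                and float(row["evalue"]) <= max_evalue):
            keep.add(row["sseqid"])
    return keep


# Two made-up hits: a long, near-identical one and a short, weak one.
example = (
    "geneA\tread1\t99.3\t150\t1\t0\t1\t150\t1\t150\t1e-70\t270\n"
    "geneA\tread2\t82.0\t60\t10\t1\t5\t64\t1\t60\t1e-05\t50\n"
)
selected = filter_hits(io.StringIO(example))
print(sorted(selected))
```

With these made-up rows only `read1` is kept; the selected IDs would then be used to pull the reads out for mapping.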
I could instead assemble the whole transcriptome for this purpose, but that would be much more demanding in terms of time and computational power.
Thanks in advance
Have you considered just mapping the datasets against the gene sequences and calling it done? I think hisat and hisat2 can even directly accept SRA urls.
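A rough sketch of what that could look like, written as a small Python helper that only assembles the two command lines. As far as I know, HISAT2 takes SRA accession numbers through `--sra-acc` (rather than URLs), and only when it was built with SRA support. The file names, the index name and the accession `SRR0000000` are placeholders, and `--no-spliced-alignment` is an assumption that makes sense only because the reference here is coding sequence without introns.

```python
import shlex


def hisat2_commands(genes_fasta, index_base, sra_accession, out_sam, threads=4):
    """Build (but do not run) the index and alignment commands."""
    # Index the target gene sequences.
    build = ["hisat2-build", genes_fasta, index_base]
    # Align reads streamed from SRA against that index.
    align = [
        "hisat2",
        "-x", index_base,
        "--sra-acc", sra_accession,   # needs a HISAT2 build with SRA support
        "--no-spliced-alignment",     # assumption: reference is intron-free CDS
        "-p", str(threads),
        "-S", out_sam,
    ]
    return build, align


# Placeholder inputs for illustration only.
build_cmd, align_cmd = hisat2_commands(
    "target_genes.fa", "target_genes", "SRR0000000", "SRR0000000.sam"
)
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run` once the tools are installed.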
I will second Devon's comment. Just get the data and process it. If you know that you are going to be doing gene expression, you could even use pseudomapping approaches like salmon or kallisto which can run with the resources of a laptop.
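For orientation, tools like salmon and kallisto report expression as TPM (transcripts per million). The sketch below shows the basic idea with plain gene lengths; the real tools use effective lengths and probabilistic read assignment, so treat this as a simplification. The gene names, counts and lengths are invented for illustration.

```python
def tpm(counts, lengths):
    """Length-normalise read counts and scale them to sum to one million."""
    # Reads per base for each gene.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Invented example: geneA has three times the reads but is three times longer.
counts = {"geneA": 300, "geneB": 100}
lengths = {"geneA": 1500, "geneB": 500}
values = tpm(counts, lengths)
print(values)
```

Both genes come out at 500000 here because their per-base coverage is identical. Note that if the reference contains only a handful of target genes, TPM is relative to those genes alone, so values are not directly comparable to TPM from a full transcriptome.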
Thank you, Devon and Sean. Having weighed the pros and cons, I've decided to follow your advice and map the whole datasets against my genes.
I hadn't heard of hisat before, but the tool looks promising, and if it can accept SRA URLs, it is just what I need! Pseudomapping approaches are also of interest. I will run both and compare the results.