I have PacBio Iso-Seq (.ccs.bam) files for 18 samples and I want to perform differential expression analysis on them to identify our genes of interest.
I am familiar with differential expression analysis of RNA-seq data and am looking for similar pipelines for Iso-Seq data. I'm not sure I'm on the right track!
I went through several articles looking for a feasible and correct way to do DE analysis of PacBio Iso-Seq data and found two similar approaches; however, tappAS needs short reads in addition to the Iso-Seq data for the DE analysis.
I'd also like to know if anyone has advice for this. I've had three ideas:
The simplest idea would be to group everything by gene in one of the classification files (either collapsed_classification.txt or filtered_lite_classification.txt).
Another possible option would be to align the flnc.bam reads for each sample, so the data is filtered but there is still one read per transcript, and then run a feature counter as normal.
The final option would be to use one of the tools they suggested. I was planning on using tappAS and will look for a workaround for the short-read requirement, because that seems like a huge oversight. If that doesn't work, DRIMSeq or possibly even DESeq2 might be good options.
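For the first idea, a minimal sketch of the grouping step could look like the one below. It assumes the classification file is tab-delimited, has a gene column called `associated_gene`, and has per-sample full-length count columns starting with `FL.`. Those column names and the demo rows are assumptions for illustration, so check them against the header of your own file.

```python
import csv
import io
from collections import defaultdict


def gene_counts(classification_text, gene_col="associated_gene", count_prefix="FL."):
    """Sum per-isoform read counts into per-gene counts.

    gene_col and count_prefix are assumed names; adjust to your file header.
    """
    reader = csv.DictReader(io.StringIO(classification_text), delimiter="\t")
    # Every column whose name starts with the prefix is treated as one sample.
    sample_cols = [c for c in reader.fieldnames if c.startswith(count_prefix)]
    totals = defaultdict(lambda: dict.fromkeys(sample_cols, 0))
    for row in reader:
        gene = row[gene_col]
        for col in sample_cols:
            # Empty cells are treated as zero reads.
            totals[gene][col] += int(float(row[col] or 0))
    return dict(totals)


# Made-up rows standing in for a real classification file.
demo = (
    "isoform\tassociated_gene\tFL.ctrl1\tFL.treat1\n"
    "PB.1.1\tGeneA\t10\t4\n"
    "PB.1.2\tGeneA\t5\t1\n"
    "PB.2.1\tGeneB\t0\t7\n"
)
counts = gene_counts(demo)
for gene, per_sample in sorted(counts.items()):
    print(gene, per_sample)
```

With a real file you would pass `open(path).read()` instead of the demo string. The resulting gene-by-sample table has the shape a count-based tool such as DESeq2 expects as input.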
"While the output of the Iso-Seq pipeline is great for characterising samples, you’re right to identify a bit of a gap here. A number of researchers like you want to do differential expression analysis and there isn’t really a well-established path at the moment."
In a follow-up they also said that they know it's a need, but the bioinformatics needs to catch up: until recently the technology tended to be used for things like de novo assembly, and it wasn't available in the quantity (presumably read depth) required for differential expression etc.
I worked out a way to run a DESeq2 analysis on the PacBio reads. Since the reads in our samples had barcodes, I had to demultiplex them first. For that I followed some of the filtering steps from the PAIso-seq (https://www.nature.com/articles/s41467-019-13228-9) pipeline, then aligned the reads using minimap2 and passed the mapped reads to featureCounts (from the Subread package). Finally, I fed the featureCounts output file into DESeq2, and that did the job.
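As a rough sketch of the last hand-off in that workflow, the snippet below reads a featureCounts table into a gene-by-sample matrix. It relies on the standard featureCounts layout: a comment line starting with `#`, then a header whose first six columns are Geneid, Chr, Start, End, Strand and Length, followed by one column per BAM file. The demo rows and file names are made up, and this is not necessarily how the poster did it. For the earlier steps, minimap2 documents a `splice:hq` preset for Iso-Seq reads and featureCounts has a `-L` long-read mode, which are probably worth checking for your data.

```python
import csv


def read_featurecounts(text):
    """Return (sample_names, {gene_id: [counts per sample]}) from featureCounts output."""
    # Drop blank lines and the leading '#' comment line written by featureCounts.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Columns after Geneid, Chr, Start, End, Strand, Length are the samples.
    samples = header[6:]
    matrix = {row[0]: [int(x) for x in row[6:]] for row in reader}
    return samples, matrix


# Made-up table in the featureCounts layout.
demo = (
    "# Program:featureCounts (demo header line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl1.bam\ttreat1.bam\n"
    "GeneA\tchr1\t100\t900\t+\t801\t15\t5\n"
    "GeneB\tchr2\t50\t400\t-\t351\t0\t7\n"
)
samples, matrix = read_featurecounts(demo)
print(samples)
for gene, row in matrix.items():
    print(gene, row)
```

The matrix, written out with genes as rows and samples as columns, is the kind of input that DESeq2's `DESeqDataSetFromMatrix` takes together with a sample table.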
nf-core has a pipeline available: https://nf-co.re/isoseq/1.1.5
Thanks for the link. It looks similar to the Iso-Seq analysis in PacBio SMRT Link. I have already done the isoform annotation, i.e. classification and filtering of the CCS reads (with PacBio SMRT Link, including pigeon), for genome annotation.
I want to perform differential gene expression (DGE) analysis using Iso-Seq data, i.e. PacBio CCS long reads, in the same way DGE analysis is done on Illumina short reads with DESeq2 or edgeR.
Is there any way to perform DGE analysis on long reads? I can get as far as mapping with pbmm2 or minimap2. How can we do gene counting on the aligned Iso-Seq reads?
Thanks