Question

Differential expression Analysis of PacBio Isoseq data

1

10 months ago

tvibhaps ▴ 10

Hi,

I have PacBio Iso-Seq files (.ccs.bam) for 18 samples, and I want to run a differential expression analysis on them to identify our genes of interest.

I am familiar with differential expression analysis of RNA-seq data and am looking for a similar pipeline for Iso-Seq data, but I am not sure I am on the right track.

I went through several articles looking for a feasible and correct way to do DE analysis of PacBio Iso-Seq data and found the two similar resources below, but tappAS NEEDS short reads alongside the Iso-Seq data for the DE analysis:

https://www.pacb.com/wp-content/uploads/Application-note-Bioinformatics-tools-for-full-length-isoform-sequencing.pdf
https://tappas.org/

Please suggest any DE workflow for Iso-Seq data (without short reads), similar to those for RNA-seq, if one exists, or an alternative.

I would highly appreciate your help and time. Thanks.

Vibha

Differential-Expression-Analysis ISOseq • 2.2k views

ADD COMMENT • link 8 months ago by tvibhaps ▴ 10

0

nf-core has a pipeline available: https://nf-co.re/isoseq/1.1.5

ADD REPLY • link 10 months ago by GenoMax 151k

0

Thanks for the link. It looks similar to the Iso-Seq analysis in PacBio SMRT Link. I have already done isoform annotation of the PacBio Iso-Seq reads for genome annotation, i.e. classification and filtering of the CCS reads with PacBio SMRT Link, including pigeon.

I want to perform differential gene expression (DGE) analysis on Iso-Seq data, i.e. PacBio CCS long reads, similar to DGE analysis of Illumina short reads with DESeq2 or edgeR.

Is there any way to perform DGE analysis of long reads? I can get as far as mapping with pbmm2 or minimap2. How can we count the aligned Iso-Seq reads per gene?

Thanks

ADD REPLY • link 9 months ago by tvibhaps ▴ 10

0

I'd also like to know if anyone has advice for this. I've had three ideas:

1. The simplest idea would be to group everything by gene in one of the classification files (either collapsed_classification.txt or filtered_lite_classification.txt).
2. Another option would be to align the flnc.bam for each sample, so the data is filtered but there is still one read per transcript, then run a feature counter as usual.
3. The final option would be to use one of the tools they suggested. I was planning on using tappAS and will look for a workaround, because the short-read requirement seems like a huge oversight. If that doesn't work, DRIMSeq or possibly even DESeq2 might be good options.
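
For the first idea, here is a minimal Python sketch of what grouping by gene could look like. It assumes the classification file is tab-delimited with one row per isoform, and the column names `associated_gene` and `fl_count` are placeholders I made up for illustration, so check the header of your own file and adjust them. The example rows are invented as well.

```python
import csv
import io
from collections import defaultdict


def gene_counts_from_classification(handle, gene_col="associated_gene",
                                    count_col="fl_count"):
    """Sum per-isoform read counts into per-gene counts.

    gene_col and count_col are hypothetical column names; replace them
    with whatever your classification file actually uses.
    """
    totals = defaultdict(int)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        totals[row[gene_col]] += int(row[count_col])
    return dict(totals)


# Invented example data standing in for one sample's classification file.
example = io.StringIO(
    "isoform\tassociated_gene\tfl_count\n"
    "PB.1.1\tGENE_A\t12\n"
    "PB.1.2\tGENE_A\t3\n"
    "PB.2.1\tGENE_B\t7\n"
)
counts = gene_counts_from_classification(example)
print(counts)
```

Running this once per sample and joining the results on gene name would give a gene-by-sample count table of the kind DESeq2 or edgeR expect, provided the counts are raw integers rather than normalised values.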

ADD REPLY • link 8 months ago by SethJ • 0

score 0 · Answer 1 · 2024-08-16

I emailed Isoseq and they said:

"While the output of the Iso-seq pipeline is great for characterising samples, you’re right to identify a bit of a gap here. A number of researchers like you want to do differential expression analysis and there isn’t really a well-established path at the moment.

People are attempting to fill this with tools such as Cerberus and TALON and a fuller listing of recommended Iso-seq analysis tools is given in this application note: https://www.pacb.com/wp-content/uploads/Application-note-Bioinformatics-tools-for-full-length-isoform-sequencing.pdf"

In a follow-up they also said that they know it's a need, but the bioinformatics needs to catch up: until recently the technology tended to be used for things like de novo assembly, and it wasn't available in the quantity (presumably read depth) required for differential expression etc.

score 0 · Answer 2 · 2024-08-17

Thank you so much everyone for your time here.

I found a way to run DESeq2 on the PacBio reads. Since the reads in our samples had barcodes, I had to demultiplex them first. For that I used some of the filtering tools from the PAISOseq pipeline (https://www.nature.com/articles/s41467-019-13228-9), then aligned the reads with minimap2 and passed the mapped reads to featureCounts (from the Subread package). Finally, I fed the featureCounts output file into DESeq2, which completed the analysis.
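
In case it helps anyone, the step between featureCounts and DESeq2 can be sketched roughly as below. This is not my exact script, just a minimal Python illustration. It assumes the usual featureCounts table layout: a leading comment line starting with `#`, then a header with Geneid, Chr, Start, End, Strand, Length, followed by one count column per BAM file. The gene IDs, sample names and numbers in the example are invented.

```python
import csv
import io


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [counts...]}) from a featureCounts table."""
    # Skip the comment line(s) that featureCounts writes at the top.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    # First six columns are annotation: Geneid, Chr, Start, End, Strand, Length.
    samples = header[6:]
    matrix = {row[0]: [int(v) for v in row[6:]] for row in reader if row}
    return samples, matrix


def write_count_matrix(samples, matrix, out):
    """Write a plain gene-by-sample table that is easy to load for DESeq2."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id"] + samples)
    for gene, gene_counts in matrix.items():
        writer.writerow([gene] + gene_counts)


# Invented example standing in for a real featureCounts output file.
fc_text = io.StringIO(
    "# Program:featureCounts (example comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_1.bam\ttreat_1.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t25\t40\n"
    "geneB\tchr2\t50\t450\t-\t401\t0\t13\n"
)
samples, matrix = read_featurecounts(fc_text)
table = io.StringIO()
write_count_matrix(samples, matrix, table)
print(table.getvalue())
```

The resulting table has genes as rows and samples as columns, which is the shape DESeq2 takes as its count matrix, alongside a separate sample sheet describing the conditions.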

Thanks again.