Can someone explain the difference between clustering and collapsing of reads in Iso-Seq analyses?
3.7 years ago
yvanpapa • 0

Hi, I have been working with Iso-Seq reads produced with PacBio Sequel II.

I have been following the Iso-Seq v3 pipeline, based on the manual and on the GitHub instructions:

https://www.pacb.com/wp-content/uploads/SMRT_Tools_Reference_Guide_v90.pdf
https://github.com/PacificBiosciences/IsoSeq/blob/master/isoseq-clustering.md

After the final clustering step, I was able to map the transcripts to a reference genome with pbmm2 and collapse them with the "collapse" command.

Everything is probably fine, but looking at my data in detail, I see that the clustering step produced 127,857 HQ transcripts out of ~1 million FL reads. However, after collapsing this HQ set based on its genome mapping, the total number of isoforms drops to 93,943 (contained in ~16,000 genes).

My question is, what happened to these ~30,000 transcripts after collapsing? I thought that clustering would create a set of unique (i.e. non-redundant) transcripts. But collapsing seems to have further merged some of the reads into one isoform (according to the "group.txt" file).

There is probably something I am misunderstanding here. What is the difference between "clustering" and "collapsing", and what happens to the number of transcripts/isoforms retained during these two steps?
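One way to see where the ~30,000 transcripts went is to count how many clustered transcripts each collapsed isoform absorbed. The snippet below is only a sketch: it assumes that each line of "group.txt" holds a collapsed isoform ID, a tab, and a comma-separated list of the member transcripts, and the IDs in `example_lines` are made up for illustration. The function name `members_per_isoform` is hypothetical as well.

```python
def members_per_isoform(lines):
    """Map each collapsed isoform ID to the number of transcripts merged into it.

    Assumed line format (not verified against every collapse version):
        <isoform_id>\t<member_1>,<member_2>,...
    """
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        isoform_id, members = line.split("\t")
        sizes[isoform_id] = len(members.split(","))
    return sizes


# Made-up example content; with a real file, pass open("group.txt") instead.
example_lines = [
    "PB.1.1\ttranscript/0,transcript/7",
    "PB.1.2\ttranscript/3",
    "PB.2.1\ttranscript/4,transcript/5,transcript/9",
]

sizes = members_per_isoform(example_lines)
total_inputs = sum(sizes.values())        # transcripts that went into collapse
merged_away = total_inputs - len(sizes)   # how many "disappeared" by merging
print(f"{total_inputs} transcripts -> {len(sizes)} isoforms ({merged_away} merged away)")
```

If the format assumption holds for the real file, `merged_away` should be close to the difference between the number of mapped HQ transcripts and the number of collapsed isoforms; transcripts that failed to map or were filtered out would account for the rest.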

Thank you in advance for your help

Iso-Seq PacBio rna-seq SMRT-Tools transcripts • 2.4k views
2.5 years ago

I found this in a publication (doi: 10.1101/gr.274282.120): "TSSs and TTSs may still have some error in their exact location as the clustering algorithm used by Iso-Seq3 allows for 100 bp of variability at the 5′ end and 30 bp of variability at the 3′ end of the transcript. Transcripts with start or end positions within this range are collapsed into a single isoform, creating a small window of possible TSS and TTS locations."
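To make the idea in that quote concrete, here is a toy sketch of merging transcripts whose ends fall within a tolerance window. It is not the Iso-Seq implementation: the `Transcript` class, the `collapse` function, the coordinates, and the requirement of identical splice junctions are all assumptions made for illustration, and the 100 bp / 30 bp values are simply the ones quoted above (the real tools' defaults may differ).

```python
from dataclasses import dataclass

MAX_5P_DIFF = 100  # bp of allowed 5' variability, taken from the quote above
MAX_3P_DIFF = 30   # bp of allowed 3' variability, taken from the quote above


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str
    five_prime: int    # genomic position of the 5' end
    three_prime: int   # genomic position of the 3' end
    junctions: tuple   # splice junctions as ((donor, acceptor), ...)


def collapse(transcripts):
    """Greedily group transcripts that share junctions and have similar ends.

    Each transcript is compared with the first member (the representative)
    of every existing group; if none matches, it starts a new group.
    """
    groups = []
    ordered = sorted(
        transcripts,
        key=lambda t: (t.chrom, t.strand, t.junctions, t.five_prime),
    )
    for t in ordered:
        for group in groups:
            rep = group[0]
            same_structure = (rep.chrom, rep.strand, rep.junctions) == (
                t.chrom, t.strand, t.junctions)
            if (same_structure
                    and abs(rep.five_prime - t.five_prime) <= MAX_5P_DIFF
                    and abs(rep.three_prime - t.three_prime) <= MAX_3P_DIFF):
                group.append(t)
                break
        else:
            groups.append([t])
    return groups


# Hypothetical transcripts: t2 differs from t1 only slightly at both ends,
# t3 has a 3' end 200 bp away, and t4 uses a different splice acceptor.
transcripts = [
    Transcript("t1", "chr1", "+", 1000, 5000, ((2000, 3000),)),
    Transcript("t2", "chr1", "+", 1060, 5010, ((2000, 3000),)),
    Transcript("t3", "chr1", "+", 1000, 5200, ((2000, 3000),)),
    Transcript("t4", "chr1", "+", 1000, 5000, ((2000, 3100),)),
]

groups = collapse(transcripts)
for group in groups:
    print([t.name for t in group])
```

In this toy case four transcripts become three isoforms: t1 and t2 are merged, while t3 and t4 stay separate. The same kind of merging, applied to transcripts that look distinct at the sequence level but map to the same exon structure, would reduce the count after genome mapping.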
