How to find alternative splicing isoforms in reads generated by a nanopore sequencing experiment
6.8 years ago

Hi everyone,

I am working with a large library of long cDNA reads from a 1D nanopore sequencing experiment. I successfully mapped the reads against my genome of interest using minimap2 but wasn't able to get much further. I visualized the reads in IGV, so I can see that they map correctly. However, I would like to determine which isoforms are present in my dataset. I tried extracting the reads for one region of interest into a FASTA file and then clustering the reads that look alike, hoping that would give me back the different isoforms, but it didn't really work as expected. I assume this is because my reads contain a lot of errors, and I get the same bad results when I increase the acceptable error rate. I think tools like vsearch or CD-HIT were designed to work well with short sequencing reads, not with long error-prone reads.

Apparently there are ways to generate what I'm looking for directly from the SAM/BAM file I got after the alignment, but I'm kind of lost. In this thread I saw an image that seems to be the result I'd like to get (C: Use kallisto with ONT (nanopore) cDNA long reads; see the blue line at the bottom that recapitulates exon usage), but I really don't know where to start.

Would someone be willing to help me or point me to some guidance or a tutorial? Thank you in advance!

Florian.

sequencing nanopore RNA-Seq clustering long-reads • 2.4k views

I haven't tried it, but perhaps you can find some inspiration in this pipeline: https://github.com/christopher-vollmers/Mandalorion


Thanks for the link, I will try it and see if it helps me get what I want!

6.8 years ago
jean.elbers ★ 1.7k

If you are trying to do differential transcript expression (not differential gene expression), then the following combination might be useful: first error-correct the Nanopore reads with some tool (sorry, I'm not familiar with the best tool to suggest for this task), then run genome-guided Trinity with the --long_reads option, then use Lace (https://github.com/Oshlack/Lace/wiki/Example%3A-Long-Read-superTranscriptome-Construction-and-Visualisation). From there you can do the differential transcript analysis (https://github.com/Oshlack/Lace/wiki/Example%3A-Differential-Transcript-Usage-on-a-non-model-organism).

I've done something similar with PacBio Sequel subreads but not with Nanopore 1D reads.
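
For the goal in the question (recovering exon usage directly from the SAM/BAM file), a lighter first look before running a full pipeline is to group reads by their intron chain, i.e. the ordered list of N (skipped region) operations in each read's CIGAR string. The sketch below is not what Mandalorion, Trinity or Lace do internally. It is a minimal illustration that assumes a spliced alignment in SAM text form, and the function names `intron_chain` and `count_intron_chains` are made up for this example. It matches junctions exactly, so noisy reads whose junctions are off by a few bases will end up in separate groups.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_chain(pos, cigar):
    """Return the introns implied by the N operations of one alignment.

    pos is the 1-based leftmost mapping position (SAM column 4).
    Each intron is a (start, end) pair, 1-based and inclusive.
    """
    introns = []
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that consume the reference
            ref += length
    return tuple(introns)


def count_intron_chains(sam_lines):
    """Count primary mapped reads per (reference name, intron chain)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x904:
            continue
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        if cigar == "*":
            continue
        counts[(chrom, intron_chain(pos, cigar))] += 1
    return counts


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    example = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t50M200N50M\t*\t0\t0\t*\t*",
        "r2\t0\tchr1\t120\t60\t30M200N60M\t*\t0\t0\t*\t*",
        "r3\t0\tchr1\t100\t60\t50M200N50M100N30M\t*\t0\t0\t*\t*",
        "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    for (chrom, chain), n in count_intron_chains(example).most_common():
        print(chrom, chain, n)
```

Feeding it the text output of `samtools view` for the region of interest and sorting by count gives a rough list of candidate isoforms. Reads truncated at either end show up as shorter chains, so the counts are only a starting point for inspection alongside IGV.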


Thanks. I looked into it, but for some reason I can't get de novo assembly to work with Canu or Trinity. Either my machine is not powerful enough to run them or my data are not good enough. I'm working with cDNA: most of my reads are 1 kb or shorter, but Canu seems to only consider reads longer than 1 kb for the assembly. I'm going to keep looking into it, but it's a bit frustrating. Thanks for your time!
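
To put a number on the read-length concern, one can check what fraction of the reads would survive a 1 kb cutoff. The sketch below is only an illustration: the 1000 bp threshold is taken from the statement above rather than from Canu's documentation, and the helper names `read_lengths` and `fraction_at_least` are hypothetical.

```python
def read_lengths(fasta_lines):
    """Return the sequence length of every record in a FASTA stream."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def fraction_at_least(lengths, min_len=1000):
    """Fraction of reads whose length is at least min_len (0.0 if no reads)."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= min_len) / len(lengths)


if __name__ == "__main__":
    # Toy FASTA, invented for illustration only.
    toy = [">a", "ACGT" * 300, ">b", "ACGT" * 100, "ACGT" * 50, ">c", "ACGT" * 250]
    lengths = read_lengths(toy)
    print(lengths, fraction_at_least(lengths))
```

If only a small fraction of reads clears the cutoff, that alone could explain a failed assembly, independent of machine resources.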
