Hi everyone,
I am working with a large library of long cDNA reads from a 1D nanopore sequencing experiment. I successfully mapped these reads against my genome of interest with minimap2 but wasn't able to get much further. I visualized the reads in IGV, so I can see that they map correctly. However, I would like to identify the different isoforms present in my dataset. I tried extracting the reads for one region of interest into a FASTA file and clustering the reads that look alike, hoping this would give me back the different isoforms, but it didn't work as expected. I assume this is because my reads contain many errors, and I get equally bad results when I increase the acceptable error rate. I think tools like vsearch or CD-HIT were designed to work well with short sequencing reads, not with long, error-prone reads.
Apparently it is possible to get what I'm looking for directly from the SAM/BAM file produced by the alignment, but I'm kind of lost. In the thread "Use kallisto with ONT (nanopore) cDNA long reads" I saw an image that looks like the result I'd like to get (the blue line at the bottom that summarizes exon usage), but I really don't know where to start.
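The idea of reading isoforms "directly from the SAM/BAM file" can be sketched roughly as below. This is a minimal, hypothetical Python example, not the method behind the image in the linked thread or in any particular tool. It assumes the reads were aligned in spliced mode (for example minimap2 `-ax splice`), so that introns appear as `N` operations in the CIGAR string, and that the alignments are available as SAM text (for example from `samtools view -h`). Each read is reduced to its chain of introns, and reads whose chains agree within a small tolerance are grouped together. The tolerance is there on the assumption that sequencing errors near splice sites mostly shift junction positions by a few bases. The function names (`intron_chain`, `same_chain`, `group_reads`) and the 5-base tolerance are illustrative choices.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases (and so advance the position).
REF_CONSUMING = set("MDN=X")


def intron_chain(pos, cigar):
    """Return the introns of one spliced alignment as (start, end) pairs.

    pos is the 1-based leftmost mapping position (SAM column 4) and cigar is
    the CIGAR string (SAM column 6). Each 'N' operation is treated as an
    intron; returned coordinates are 1-based and inclusive.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return tuple(introns)


def same_chain(a, b, tol):
    """True if two intron chains have the same number of introns and every
    intron boundary agrees within tol bases."""
    if len(a) != len(b):
        return False
    return all(abs(s1 - s2) <= tol and abs(e1 - e2) <= tol
               for (s1, e1), (s2, e2) in zip(a, b))


def group_reads(sam_lines, tol=5):
    """Greedily group primary, mapped alignments by intron chain.

    Returns a list of (chrom, representative_chain, [read names]). Each read
    joins the first cluster on the same chromosome whose representative chain
    matches within tol; otherwise it starts a new cluster. The tolerance is a
    hypothetical default, not a recommended value.
    """
    clusters = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        name, flag, chrom = fields[0], int(fields[1]), fields[2]
        if flag & (0x4 | 0x100 | 0x800):
            continue  # skip unmapped, secondary and supplementary alignments
        chain = intron_chain(int(fields[3]), fields[5])
        for c_chrom, rep, names in clusters:
            if c_chrom == chrom and same_chain(rep, chain, tol):
                names.append(name)
                break
        else:
            clusters.append((chrom, chain, [name]))
    return clusters


if __name__ == "__main__":
    # Made-up records: r2 repeats r1's structure with junctions shifted by
    # 2 bases; r3 skips the middle exon.
    demo = [
        "@SQ\tSN:chr1\tLN:100000",
        "r1\t0\tchr1\t1000\t60\t100M500N80M300N120M\t*\t0\t0\t*\t*",
        "r2\t16\tchr1\t1002\t60\t98M2I502N80M298N120M\t*\t0\t0\t*\t*",
        "r3\t0\tchr1\t1000\t60\t100M880N120M\t*\t0\t0\t*\t*",
    ]
    for chrom, chain, names in group_reads(demo):
        print(chrom, chain, names)
```

With the made-up records, r1 and r2 fall into one group and the exon-skipping read r3 forms its own. Note that all unspliced reads share the empty chain and end up in one group, and the greedy assignment depends on read order, so a real analysis would probably also check strand and how well the first and last exons are covered.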
Would someone be willing to help me or point me to some guidance or a tutorial? Thank you in advance!
Florian.
I haven't tried it, but perhaps you can find some inspiration in this pipeline: https://github.com/christopher-vollmers/Mandalorion
Thanks for the link, I will try it and see if it can help me get what I want!