Question

minimap2 for polyA site sequenced with long read data (both cDNA ONT, PacBio)

0

Entering edit mode

4 months ago

shinyjj ▴ 50

Hi Biostars!

I am processing PolyA site sequenced with long read data (both cDNA ONT, PacBio) using minimap2. I am asking you the following questions as I want to make sure I am doing the right thing when processing them.

I would like to align polyA site sequenced with LR using minimap2. Minimap has a lot of different flags. What parameters should I use for minimap for both ont and pacbio? I think splice:hq is used for pacbio and splice is used for ont but to your knowledge, what is a good parameter to run minimap given fastq file?
After minimap, is there some kind of parameter I should be checking about statistics for the sample to make sure minimap is doing the right thing? If it’s doing the right thing, my guess is minimap throws out anything fishy. How do I know minimap is doing good or bad? Is there any stats I should plot to see that? I am generally asking about what quality check needs to be done for long read and what kind of stats would be good to see and plot to make sure minimap2 is doing the right thing!

Thank you!

minimap2 • 749 views

ADD COMMENT • link 4 months ago by shinyjj ▴ 50

0

Entering edit mode

PolyA site sequenced with long read data

Can you clarify what this means exactly?

ADD REPLY • link 4 months ago by GenoMax 148k

0

Entering edit mode

It's just long read fastq files focused on 3' end instead of full isoform to see different PolyA sites

ADD REPLY • link 4 months ago by shinyjj ▴ 50

0

Entering edit mode

Sounds like you must have already tried aligning the data. What did the result look like? I would think that the poly-A's should be soft-clipped and data aligned normally even with the default parameters.

ADD REPLY • link 4 months ago by GenoMax 148k

0

Entering edit mode

Oh I mean I wasn't sure which parameter is optimized for ONT and PacBio. I did not try them yet and only found the below command options are commonly used. Are these enough and are there any quality check required afterwards?

ONT minimap2 -ax splice

PacBio minimap2 -ax splice:hq -uf

ADD REPLY • link 4 months ago by shinyjj ▴ 50

1

Entering edit mode

Presets already defined in minimap2 should be good to start with.

-x STR       preset (always applied before other options; see minimap2.1 for details) []
                 - map-pb/map-ont - PacBio CLR/Nanopore vs reference mapping
                 - map-hifi - PacBio HiFi reads vs reference mapping
                 - splice/splice:hq - long-read/Pacbio-CCS spliced alignment

If your data is of good quality then there should be no issues with alignment percentages. That would be the first thing to check. You can also check on coverage using a program like pandepth and a BED file for gene models.

ADD REPLY • link 4 months ago by GenoMax 148k

0

Entering edit mode

Thank you GenoMax!

As it's focused on polyA, using soft-clipped flag, k-mer 14 for ONT, k-mer 15 for PacBio would be good.

After minimap2, my file would be a bam file. Then, do you recommend checking MAPQ >= 30 and coverage using tool like samtools, pandepth? If I check the coverage, what is a good coverage? I see having MAPQ >= 30 shows reads are mapped uniquely or with high certainty to the reference genome but not sure what I can gain from coverage

ADD REPLY • link 4 months ago by shinyjj ▴ 50

0

Entering edit mode

what I can gain from coverage

Confidence that a particular site is real. If you see more than a certain number of reads piling up there.

ADD REPLY • link 4 months ago by GenoMax 148k

0

Entering edit mode

Thank you GenoMax! For my specific data, if I want to capture PolyA site, does it make sense to group them by tissues so I don’t have to reprocess everything from scratch? I would guess the answer is yes, but would like to hear your opinion! I am using IsoQuant to process minimap2 output to discover & quantify isoforms

ADD REPLY • link 4 months ago by shinyjj ▴ 50