Question

minimap2: aligning ambiguous sequence containing N to fastq reads.

5.2 years ago

vinaykusuma

I have a fasta file called barcode_masked.fa.

Ex:

>Orf1_ab  
AGAGCAGCAGAAGTGGCACNNNNNNNNAGGTGATTGTGAAGAAGAAGAG

I am trying to align this sequence to my nanopore reads with minimap2,
but I cannot figure out which option to use.
I expect my reads in the fastq not to match at those 10 bp (denoted by N) in the fasta,
but they should align to the sequence upstream and downstream of the Ns.

My command is:

minimap2 -t 4 -ax map-ont barcode_masked.fa all_pass_files.fastq > P5_masked_aln_reads.sam

Currently I get no alignments in my .sam file.
Any help will be appreciated. Thank you.


updated 5.2 years ago by GenoMax • written 5.2 years ago by vinaykusuma
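The matching the question asks for (each N may be any base, while the flanks upstream and downstream must be present) can also be checked directly, without an aligner, as a baseline. The sketch below is one way to do it in plain shell, assuming uncompressed 4-line FASTQ records with upper-case bases. The function names `revcomp`, `masked_to_regex` and `count_masked_exact` are hypothetical helpers, not part of minimap2. The sketch searches both strands, since nanopore reads come in either orientation, but allows no errors at all in the flanks.

```shell
#!/usr/bin/env bash
# Sketch: count reads that contain an N-masked probe with no errors in the
# flanks. Every N in the probe may match any base. Assumes uncompressed,
# 4-line FASTQ records and upper-case A/C/G/T/N sequences.

# Reverse-complement a DNA string (N stays N).
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTN' 'TGCAN' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}

# Turn an N-masked probe into an extended regex: each N becomes [ACGTN].
masked_to_regex() {
  printf '%s\n' "$1" | sed 's/N/[ACGTN]/g'
}

# Usage: count_masked_exact PROBE READS.fastq
count_masked_exact() {
  local fwd_re rev_re
  fwd_re=$(masked_to_regex "$1")
  rev_re=$(masked_to_regex "$(revcomp "$1")")
  # Line 2 of every 4-line record is the sequence; grep -c counts the
  # sequence lines (reads) that match on either strand.
  awk 'NR % 4 == 2' "$2" | grep -c -E "${fwd_re}|${rev_re}"
}
```

For example, `count_masked_exact AGAGCAGCAGAAGTGGCACNNNNNNNNAGGTGATTGTGAAGAAGAAGAG all_pass_files.fastq` prints the number of reads carrying an exact copy of both flanks. Because nanopore reads contain sequencing errors, this exact count will usually undercount the reads that really carry the sequence, which is why an error-tolerant method is needed.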

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

5.2 years ago by GenoMax

Is that a real read? You could try a smaller k-mer size.

The minimap2 paper says the following:

"works with accurate short reads of ≥100 bp in length"

5.2 years ago by GenoMax

Thanks for that. But I tried with a 51 bp sequence and got a lot of hits in my SAM file. I then compared the number of hits with 51M in the CIGAR to my grep count of exact matches to the 51 bp sequence, and minimap2 finds more matches. So, technically, minimap2 is working. I also looked at some alignments manually and they looked fine to me, but I will surely look again. Thanks for that.

5.2 years ago by vinaykusuma
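On the comparison in the reply above: in SAM, the CIGAR operation `M` means an alignment match, which can be either a sequence match or a mismatch. A read whose alignment covers the 51 bp as a single `51M` block may therefore still contain substitutions, so a `51M` count is expected to be higher than an exact-match grep count. Below is a minimal awk sketch of that count, assuming a plain-text SAM file; `count_full_probe_hits` is a hypothetical name. It skips unmapped, secondary and supplementary records (FLAG bits 0x4, 0x100, 0x800) and strips clipping operations (`S`/`H`) at both ends first, because minimap2 reports the unaligned parts of a long read as clips around the aligned block.

```shell
#!/usr/bin/env bash
# Usage: count_full_probe_hits ALN.sam PROBE_LEN
# Counts primary, mapped SAM records whose CIGAR, after removing soft/hard
# clips at either end, is a single "<PROBE_LEN>M" block (no indels).
# Note: M also covers mismatches, so these are not all exact matches.
count_full_probe_hits() {
  awk -F '\t' -v want="${2}M" '
    /^@/ { next }                        # header lines
    int($2 / 4) % 2 == 1 { next }        # flag 0x4: unmapped
    int($2 / 256) % 2 == 1 { next }      # flag 0x100: secondary
    int($2 / 2048) % 2 == 1 { next }     # flag 0x800: supplementary
    {
      c = $6
      sub(/^([0-9]+[SH])+/, "", c)       # drop leading clips
      sub(/([0-9]+[SH])+$/, "", c)       # drop trailing clips
      if (c == want) n++
    }
    END { print n + 0 }
  ' "$1"
}
```

For example, `count_full_probe_hits aln.sam 51` (file name hypothetical). The flag bits are tested with integer division rather than `and()` because `and()` is a gawk extension and is not available in every awk.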

But why? What is the aim of this?

5.2 years ago by WouterDeCoster

See this comment: minimap2: aligning multiple regions to a read in a fastq

5.2 years ago by GenoMax

I want to get counts of the sequences in my fasta file (accounting for substitution errors). I have a lot of data, so I think the fastest way is to use an aligner. I tried writing a script and using bbduk.sh, but they became slow with lots of data.

5.2 years ago by vinaykusuma
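The goal stated in the reply above, counting reads that contain a sequence while tolerating substitutions, can also be written as a brute-force scan, which makes the trade-off explicit. The sketch below is illustrative only: `count_masked_approx` is a hypothetical helper, and it assumes uncompressed 4-line FASTQ with upper-case bases. It ignores probe positions that are N and tolerates up to a given number of substitutions, but no insertions or deletions. It compares every offset of every read on both strands, so it is far slower than a k-mer tool or an aligner on large data. Because nanopore errors are often indels, it will also miss reads that an aligner would still catch.

```shell
#!/usr/bin/env bash
# Sketch: count reads containing an N-masked probe with at most MAX
# substitutions at the non-N positions (no indels), on either strand.
# Brute force over every offset: fine for a small test, slow on big data.
# Assumes uncompressed 4-line FASTQ and upper-case A/C/G/T/N.

# Usage: count_masked_approx PROBE MAX_MISMATCHES READS.fastq
count_masked_approx() {
  local rc
  # Reverse complement of the probe (N stays N).
  rc=$(printf '%s\n' "$1" | tr 'ACGTN' 'TGCAN' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }')
  awk -v fwd="$1" -v rev="$rc" -v maxmm="$2" '
    BEGIN { maxmm += 0 }                  # force a numeric comparison
    # Return 1 if probe p matches some window of s with <= maxmm
    # substitutions; probe positions holding N are never counted.
    function hit(s, p,    i, j, mm, L, P) {
      L = length(s); P = length(p)
      for (i = 1; i <= L - P + 1; i++) {
        mm = 0
        for (j = 1; j <= P; j++) {
          if (substr(p, j, 1) == "N") continue
          if (substr(s, i + j - 1, 1) != substr(p, j, 1) && ++mm > maxmm) break
        }
        if (mm <= maxmm) return 1
      }
      return 0
    }
    NR % 4 == 2 { if (hit($0, fwd) || hit($0, rev)) n++ }
    END { print n + 0 }
  ' "$3"
}
```

For example, `count_masked_approx AGAGCAGCAGAAGTGGCACNNNNNNNNAGGTGATTGTGAAGAAGAAGAG 2 all_pass_files.fastq` counts reads whose flanks differ from the probe by at most two substitutions. Each read is counted once even if it matches at several offsets or on both strands.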

"I tried writing a script, using bbduk.sh but the become slow with lots of data."

I find it hard to believe that bbtools programs are slow. Most are multi-threaded (including bbduk) and will run rings around any program (short of something written in native C++). If you can tell me what you were trying to do (with the exact bbduk command), we can see if it can be optimized.

5.2 years ago by GenoMax