Map Illumina reads to a multi-FASTA in "no mismatches" mode with statistics

5.7 years ago

grassostefan ▴ 10

I did paired-end Illumina sequencing. Now I would like to map my reads to my reference, a multi-FASTA of short sequences of approx. 100-150 nt (>10k sequences in total). Additionally, I want to know how many reads map to each entry of the multi-FASTA. The last constraint is that I want both mates of each pair to map to the same entry of the reference.

So far I have managed to do this only with the merged reads, mapping them with bbmap, which outputs the statistics I am interested in via the 'scafstats' option. When I repeat the process with the PE reads, only a negligible fraction maps correctly (in contrast to >90% with the same reads merged).

I've been trying both bbmap and Bowtie2 to map the PE reads, but it is clear I am giving a wrong option or forgetting one. I've read both manuals quite a lot but couldn't find a solution (nor did I find one online). Do you know if it is possible to do this with PE reads in bbmap, Bowtie2, or any other software? I'll soon try BWA as well, but I could not find anything in its manual that looked like what I want.

Any other suggestions, including on the approach, are welcome.
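
The "both mates on the same entry" count can also be checked after mapping, independently of the aligner. The following is only a minimal sketch, assuming a SAM file in which both mates share the same read name and, for the optional perfect-match filter, that the aligner writes the NM (edit distance) tag. The function name `count_pairs_per_reference`, the `require_perfect` parameter, and the `oligo_*` entries are made up for illustration and are not part of bbmap, Bowtie2, or BWA.

```python
from collections import Counter, defaultdict


def count_pairs_per_reference(sam_lines, require_perfect=False):
    """Count read pairs whose two mates align to the same reference entry.

    Assumption: both mates of a pair carry the same QNAME (SAM column 1).
    With require_perfect=True, a pair is counted only if both mates have
    an NM:i:0 tag; a missing NM tag is treated as "not perfect".
    """
    mates = defaultdict(dict)  # read name -> {1 or 2: (reference, NM or None)}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Keep only paired (0x1), mapped (not 0x4), primary (not 0x100/0x800) records.
        if not flag & 0x1 or flag & 0x4 or flag & 0x900:
            continue
        if flag & 0x40:
            mate = 1
        elif flag & 0x80:
            mate = 2
        else:
            continue
        nm = None
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
        mates[fields[0]][mate] = (fields[2], nm)

    counts = Counter()
    for pair in mates.values():
        if len(pair) != 2:
            continue  # only one mate aligned
        (ref1, nm1), (ref2, nm2) = pair[1], pair[2]
        if ref1 != ref2:
            continue  # mates landed on different entries
        if require_perfect and (nm1 != 0 or nm2 != 0):
            continue
        counts[ref1] += 1
    return counts


# Tiny made-up SAM excerpt: r1 is a perfect pair on oligo_1, r2 is split
# across two entries, r3 is on oligo_2 with one mismatch in mate 1.
demo = [
    "@SQ\tSN:oligo_1\tLN:120",
    "@SQ\tSN:oligo_2\tLN:120",
    "r1\t99\toligo_1\t1\t42\t50M\t=\t61\t110\t*\t*\tNM:i:0",
    "r1\t147\toligo_1\t61\t42\t50M\t=\t1\t-110\t*\t*\tNM:i:0",
    "r2\t65\toligo_1\t1\t42\t50M\toligo_2\t5\t0\t*\t*\tNM:i:0",
    "r2\t129\toligo_2\t5\t42\t50M\toligo_1\t1\t0\t*\t*\tNM:i:0",
    "r3\t99\toligo_2\t1\t42\t50M\t=\t61\t110\t*\t*\tNM:i:1",
    "r3\t147\toligo_2\t61\t42\t50M\t=\t1\t-110\t*\t*\tNM:i:0",
]

for ref, n in sorted(count_pairs_per_reference(demo).items()):
    print(ref, n)
```

On a real file the same function could be fed an open file handle, e.g. `count_pairs_per_reference(open("out.sam"))`. The sketch keeps every read name in memory, which should be acceptable for a small oligo reference but may not scale to very large runs.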

DNAseq alignment sequencing paired-end assembly • 1.4k views

5.7 years ago by grassostefan ▴ 10

"When I repeat the process with PE sequences I get a negligible fraction to be correctly mapped (in contrast to >90% with the same reads but merged)."

What exactly does that mean? Why is only a negligible fraction correctly mapped if you are able to map the merged reads (as single-end, I suppose) fine? Post your bbmap command lines so we can see whether they help with the diagnosis.

5.7 years ago by GenoMax 147k

Hi, I have the same problem. I tried to map against a multi-FASTA file that contains only coding sequences and got almost no mapped reads. When I mapped against the whole genome, 99% of reads mapped successfully. The problem is that I do ribosome profiling and have only the coding sequences (mRNAs) in my samples. Any ideas? Here is the code I used:

bbmap.sh in=example.fastq.gz trimreaddescription=t ref=oligos.fasta k=8 ambig=random outm=out.sam

and the FASTA file:

oligos.fasta

4.2 years ago by jakobjung • 0

This isn't making a lot of sense. So if you align against the transcriptome you get almost no alignments, but 99% of reads align when you do that against the genome?

What is your read length and what kind of data is this? If this question is about bbmap, please provide the full command line you are using.

4.2 years ago by GenoMax 147k