Alternative to BLASR ?
3
2
Entering edit mode
7.9 years ago
Rox ★ 1.4k

Hi everyone,

In order to polish my Falcon assembly of my D. suzukii genome, from PacBio long reads, I need to align the raw reads to my genome and create a .bam or a .sam file.

Such a tool already exist at PacBio, called blasr, but is currently not working (the 5.3 version I've installed do not provide a .bam or a .sam output... Which is VERY annoying).

Does an alternative of such aligner exist ? It has to be a aligner aware of the higher error rate of PacBio reads (that's why I can't use a classic aligner).

Any ideas ? I'm truely desesperate, I absolutely need to polish my assembly before going further in my analysis.

Thanks for your help !

Roxane

PacBio assembly alignement • 6.9k views
ADD COMMENT
5
Entering edit mode
7.9 years ago
Medhat 9.8k

bwa and BBmap (as far as I know limited to 6kb)


PacBio subreads or Oxford Nanopore reads to a reference genome:
bwa mem -x pacbio ref.fa reads.fq > aln.sam

more here

Update.

Minimap2

is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP.

ADD COMMENT
1
Entering edit mode

Roxane Boyer : With BBMap PacBio reads needed to be in fasta format and those above 6kb were broken into 6kb pieces during search automagically (with an _1, _2 etc). Since Brian Bushnell does not remove functionality from BBMap suite it should still work. Also see this original post. Try and let us know.

ADD REPLY
0
Entering edit mode

Okay thanks you a lot for the advices ! I'm going to try this !

ADD REPLY
0
Entering edit mode

Hi Genomax,

I'm coming back to you again about BBmap. After having some issues with bwa-mem (depending on the number of threads used, I didn't get the same results), I want to try bbmap. I understood that the large reads from my pacbio sequencing will have to be sliced in smaller pieces. However, as it's my first time using this tool, I was looking for some advanced parameters for PacBio reads, here what I found ( http://seqanswers.com/forums/showthread.php?t=58221 ) :

$ mapPacBio.sh -Xmx20g k=7 in=reads.fastq ref=reference.fa maxlen=1000 minlen=200 idtag ow int=f qin=33 out=mapped1.sam minratio=0.15 ignorequality slow ordered maxindel1=40 maxindel2=400

In this post, it is written that :

The "maxlen" flag shreds them to a max length of 1000; you can set that up to 6000. But I found 1000 gave a higher mapping rate.

But this section was "Oxford Nanopore" section, I assume that the author meant long reads generally, but as I'm not really sure, I wanted to have your opinion.

Should I use 6kb or 1kb for the reads slicing for PacBio data ?

ADD REPLY
0
Entering edit mode

What do you mean limited to 6kb ? The read length ? Because some of my reads are much longer than 6kb. Anyway, thanks for the tools I'm going to have a look on it

ADD REPLY
0
Entering edit mode

Okay, I've tried the following lines :

./bwa index /media/loutre/SUZUKII/assembly_tries/pb_268_p_ctg.fasta
./bwa mem -x pacbio /media/loutre/SUZUKII/assembly_tries/pb_268_p_ctg.fasta /media/DATAPART1/Documents/suzukii_assembly/filtered_subreads.fastq | gzip -3 > /media/loutre/SUZUKII/assembly_tries/falcon_aligned_reads.sam.gz

Then I wanted to visualize my alignements on IGV with the decompressed sam file, but no reads where shown, IGV said that no reads was corresponding to my genome (but it was actually the right file).

Here the head of my sam file :

@SQ SN:000000F  LN:20279970
@SQ SN:000001F  LN:15392955
@SQ SN:000002F  LN:12197397
@SQ SN:000003F  LN:8269814
@SQ SN:000004F  LN:7009319
@SQ SN:000005F  LN:6740481
@SQ SN:000006F  LN:5550981
@SQ SN:000007F  LN:4663438
@SQ SN:000008F  LN:4675199
@SQ SN:000009F  LN:4040179

I'm not very familiar with the SAM format, but it looks like something is wrong. Did I missed something ? Do I have to reformat the output so it look like more at something like this :

@HD VN:1.5 SO:coordinate
@SQ SN:ref LN:45
r001   99 ref  7 30 8M2I4M1D3M = 37  39 TTAGATAAAGGATACTG *
r002    0 ref  9 30 3S6M1P1I4M *  0   0 AAAAGATAAGGATA    *
r003    0 ref  9 30 5S6M       *  0   0 GCCTAAGCTAA       * SA:Z:ref,29,-,6H5M,17,0;
r004    0 ref 16 30 6M14N5M    *  0   0 ATAGCTTCAGC       *
r003 2064 ref 29 17 6H5M       *  0   0 TAGGC             * SA:Z:ref,9,+,5S6M,30,1;
r001  147 ref 37 30 9M         =  7 -39 CAGCGGCAT         * NM:i:1
ADD REPLY
0
Entering edit mode

Why do you pipe bwa mem to gzip? I haven't seen something like that before and would suggest this:

bwa mem ref.fasta input.fastq | samtools view -b - | samtools sort - > output.bam

After alignment the sam file is converted to bam and sorted without intermediate files.

ADD REPLY
0
Entering edit mode

Well, I've just copied what is written on the README file of bwa mem ! I'm not a original person, I've just followed the example they gived :

./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz

But well, I'm going to try without compressing it.

To go back on my first file output, we agree that this sam file is empty right ? There is no informations inside it ?

ADD REPLY
0
Entering edit mode

To get information about the reads in a sam/bam file use:

samtools idxstats yourfile.bam
samtools flagstat yourfile.bam
ADD REPLY
0
Entering edit mode

Well, it says that it failed to load the index, I guess it means it's not a real sam file ? The construction of my file look weird for me. I'm not sure if this is because I didn't saw a lot of Samf ile, or if it's because something really went wrong.

ADD REPLY
0
Entering edit mode

Did you create an index of the file using the following command?

samtools index yourfile.bam

I've no experience with creating sam.gz files, so it's possible that this is the source of the problem

ADD REPLY
2
Entering edit mode
7.9 years ago
brent_wilson ▴ 140

I have had much better luck for PACBIO alignment with BLAT, followed by some filtering for "true" alignments (alignment length, number of matches, etc.). It took a bit of work to set the parameters, and I haven't dealt with it in awhile, but it worked well for finding alignments (even for partial hits when I was working with draft assemblies).

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

4044 Clayton Ave. | St. Louis, MO 63110 | tel. 314.531.4647

Catch the latest from Cofactor on our blog.

ADD COMMENT
0
Entering edit mode

It would be useful to list those parameters, if you can dig them up. Will save people having to go through the same effort again (or at least serve as a starting point).

ADD REPLY
0
Entering edit mode

I would have, but everything is saved over at my previous position where I can't access it anymore. I do remember looking at the identity and coverage of each hit and using dot plots to train the parameters.

ADD REPLY
0
Entering edit mode

Hi Brent, thanks for your indication ! So you don't have any memory of how we can parameter BLAT for such an use ? it will be really useful for me to have such an information ! If you remember of something, to hesitate to write it down here !

ADD REPLY
2
Entering edit mode
6.4 years ago
harish ▴ 470

Hi!

Sorry to bring up the thread, but I wanted to put it down in case someone else faced the issue. I have recently used minimap2 to map the PacBio reads and then use pbbamify to put the alignment in the requisite format for Arrow.

I had about 100x data for a genome I am assembling and blasr was taking ages.

Please do give this a try!

Update: You can now simply use PBMM2 from pb-bioconda to create the aligned bams and later user pbindex to generate the BAM index if needed.

ADD COMMENT
1
Entering edit mode

Hello and thanks for all these suggestions. These further comments might help someone else.

I have used minimap2 to map PacBio raw reads (converted to fastq as required for minimap2) against my de novo assembly. I recommend this approach over blasr. Use samtools to get the aligned.bam. Minimap2 does not output bam files.

I then used smrt tools to create a mydata.xml file using the following:

create mydata.xml /bam/*subreads.bam

Then use pbbamify to convert the aligned.bam to correct pb format for submission to genomicconsensus arrow. Took a while to work through this process - thought it might help someone else.

pbbamify --input aligned.bam --output=aligned.pb.bam ref.fasta mydata.xml

ADD REPLY
1
Entering edit mode

Use samtools to get the aligned.bam. Minimap2 does not output bam files.

Just FYI (unclear how you did it) that you can do

minimap2 -t 8 -a genome.fa reads.fastq.gz | samtools sort -@8 -o my_sorted_alignment.bam

and as such avoid intermediate files and directly create a bam file (indeed using samtools).

ADD REPLY
0
Entering edit mode

Yes, you are correct. I should have been clearer. This is the script I used.

minimap2 -ax map-pb genome.fasta *.fastq \ | samtools view -hF 256 - \ | samtools sort -@ 8 -o genome_aligned.bam

I should also say that I am still having some trouble with pbbamify. So I haven't got this whole step working seamlessly yet.

ADD REPLY
1
Entering edit mode

Minimap doesn't really care for fastq files tbh. As such the Sequel runs have series of '!!!!!!' representing the qualities anyways.

You can definitely use Minimap's "-a" parameter to get a sam file and then convert it into Arrow specific format using pbbamify.

Also take a look at pbbioconda repos. If you have the Bams from Sequel, you can directly use Pbmm2, which uses minimap api to align the reads to the genome.

Plus these days you don't really need SMRTlink tools unless you are really looking at specific analysis. Most genome analysis these days are covered in pbbioconda!

ADD REPLY
0
Entering edit mode

Thanks for these tips @harish. I did not know that I could directly use subreads with Pbmm2. Actually this all looks more promising as a way forward.

Pbbamify worked after 130 hours of walltime. It output a very large pb.bam (351G) from aligned.bam (36G). Arrow still needs a pbi indexed file according to error report.

I really appreciate your help on this!

ADD REPLY
0
Entering edit mode

You can use pbindex to generate the index files.

ADD REPLY

Login before adding your answer.

Traffic: 2020 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6