Question

How to generate contigs from bam alignment

1

Entering edit mode

9.3 years ago

goubert.clement ▴ 30

Hi all,

I am currently trying to perform a new assembly using paired and mate paired reads. I first mapped my paired-end on a reference genome. Now I would like to extract from the mapping only the covered portions and get contigs from my reads (not a consensus with the reference genome including N between the uncovered portions). I would like then to use the paired-end reads and the mate paired to scaffold my assembly.

I there a proper way to do that ? I think maybe mpileup could help, but until now I only managed to produced consensus.

Extra question, which software would you recommend to perform the scaffolding ? Does any one exist allowing different insert size ?

Thank you very much!

Clément

bam Assembly contigs • 9.9k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 9.3 years ago by goubert.clement ▴ 30

1

Entering edit mode

9.3 years ago

thackl ★ 3.0k

If your goal is to reassembly mapped reads, you can use something like

samtools view -bf <mapped+other criteria> your.bam | 
 samtools bam2fq /dev/fd/0 > mapped_reads.fq

and than simply use an assembler to reassemble the fastq reads or provide the reads to a scaffolder

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by thackl ★ 3.0k

0

Entering edit mode

This command only return the reads in alignment right in fq? Because my goal is actually to use the mapping info to avoid a full de novo assembly. I would like to know if I simply can have the linear consensus from my reads regarding their position in the alignment, and if there is gaps, have different contig instead of NNNN. Thanks!

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by goubert.clement ▴ 30

0

Entering edit mode

From your description above, I thought you did not want to produce a consensus., For consensus have a look at Ngs Question ~ Consensus for example. There also other threads regarding the matter. Just search for bam+consensus.

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by thackl ★ 3.0k

0

Entering edit mode

Well, thanks for your reply. I had a look around, however, I think I still don't find what I really want to do... :S.

I do not want the consensus to be regarding my reads + the ref genome, but to be only a "fasta projection" of my reads, ordered thanks to the reference genome. I have for examples some areas of the ref chromosome that are not covered, so I expect to have multiple contigs of the chromosome, instead of one chromosome but with NNN filling the gaps... I may be not clear,... I apology !! Thanks again!

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by goubert.clement ▴ 30

0

Entering edit mode

So what you have is kind of this:

pile1          pile2
-- --          -- --
 -- --       -- -- --       reads
-- -- --    -- --   --
___________________________ ref

and what you want is:

pile1
-- --
 -- --
-- -- --
________ < pile1 consensus

  pile2
  -- --
 -- -- --       reads
-- --   --
_______________ < pile2 consensus

?

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by thackl ★ 3.0k

0

Entering edit mode

Exactly !!!

ADD REPLY • link 9.3 years ago by goubert.clement ▴ 30

0

Entering edit mode

Hi thackl, is possible with SeqFilter do this?

pile1          pile2
-- --          -- --
 -- --       -- -- --       reads
-- -- --    -- --   --
___________________________ ref

pile1          pile2
-- --          -- --
 -- --       -- -- --       reads
-- -- --    -- --   --
_____N______NNNNNNNNNNN ref

Thanks!

EDIT: I got it! thanks

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 8.1 years ago by jolobito • 0

0

Entering edit mode

9.3 years ago

goubert.clement ▴ 30

After a first try, it seems to do exactly what I need ! Thank you very much for your help and support thackl !

ADD COMMENT • link 9.3 years ago by goubert.clement ▴ 30

0

Entering edit mode

You are welcome!

ADD REPLY • link 9.3 years ago by thackl ★ 3.0k

0

Entering edit mode

I would also like to try this with proovread but installation fails at the "make install" step and the instructions are confusing a bit. Am I supposed to type exactly: "make install PREFIX=/Users/GBLock/src/proovread" ?

ADD REPLY • link 5.1 years ago by DNAngel ▴ 250

0

Entering edit mode

You can use PREFIX=/path/to/where/you/want/proovread/to/be/installed make install if you want to move proovread to a certain location. You don't need to do that, though. It will also just run without make install from the directory you downloaded it to.

ADD REPLY • link 5.1 years ago by thackl ★ 3.0k

0

Entering edit mode

9.3 years ago

Manu Prestat 4.1k

What about a mapper / assembler which uses a reference sequence? e.g. MAQ...

or one of the many "reference assisted" assemblers?

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by Manu Prestat 4.1k

Ram · Accepted Answer · 2015-08-24

2

Entering edit mode

9.3 years ago

thackl ★ 3.0k

Here is an approach that might do what you want, I've tested it with some random data - In the first panel are mappings of some simulated reads with some SNPs, below are the mappings of the two generated contigs.

To generate the consensus contigs I used two modules from proovread, a correction software I have written. You will need to clone/install it from github and you will need a recent version of samtools for it to run.

# generate consensus (ignoring reference)
bam2cns --bam reads-to-ref.bam --ref ref.fa --prefix cns.fq

# trim no-coverage regions and convert to fasta
SeqFilter --in cns.fq --out cns.trimmed.fa \
 --fasta --min-length 100 \
 --phred-offset 33 --trim-win 1,1

ADD COMMENT • link 9.3 years ago by thackl ★ 3.0k

0

Entering edit mode

Thanks a lot! I will try your method as soon as possible and let you know how it worked! It seems to do exactly what I want!

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by goubert.clement ▴ 30

0

Entering edit mode

I'm almost done, however, I had this error at the second step:

Can't locate Verbose.pm in @INC (you may need to install the Verbose module) [...]
BEGIN failed--compilation aborted at ~/proovread/bin/SeqFilter line 13.

Do I need the Verbose module, if yes, where could I find it ? I am not familiar with the those tools.

Many thanks!

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by goubert.clement ▴ 30

1

Entering edit mode

Verbose.pm comes with proovread, use

export PERL5LIB=/path/to/proovread/lib:$PERL5LIB

to make it visible to SeqFilter

ADD REPLY • link 9.3 years ago by thackl ★ 3.0k

0

Entering edit mode

Thanks! It is running!

ADD REPLY • link 9.3 years ago by goubert.clement ▴ 30

0

Entering edit mode

How would this approach deal with indels?

ADD REPLY • link 5.1 years ago by kfletcher ▴ 40

0

Entering edit mode

They should look like in the reads, not like in the reference.

ADD REPLY • link 5.1 years ago by thackl ★ 3.0k