This is how we got around it at the Max Planck in Leipzig. This was for ancient DNA from Neanderthals/ancient humans, but it should work for plants too; DNA is DNA after all :-)
Say we have in.fq1.gz and in.fq2.gz as paired, untrimmed, but already demultiplexed FASTQ (non-demultiplexed data can be handled as well but needs an extra command). We use the following:
fastq2bam -o /dev/stdout in.fq1.gz in.fq2.gz | \
  leeHomMulti -u --ancientdna -o /dev/stdout /dev/stdin | \
  network-aware-bwa/bwa bam2bam -n 0.01 -o 2 -l 16500 -g reference.fasta - | \
  samtools sort ...
fastq2bam is from my own BCL2BAM2FASTQ repository; it converts the paired FASTQ files into an unaligned BAM.
leeHom is a specialized adapter trimmer and read merger for ancient DNA and short DNA molecules. To my knowledge it is still the most accurate; see the software repo (leeHom) and our paper in NAR.
network-aware-bwa is a nifty fork of bwa aln from a colleague in Leipzig that can eat BAM and spit out BAM. It also considers both the merged and the paired reads in the insert-size computation; see the software here.
and samtools sort is the normal samtools sort; adapt the command line to whichever version of samtools you have.
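For example, with a recent (htslib-era) samtools the final step could look like the line below; out.sorted.bam and the thread count are just placeholders I picked:

samtools sort -@ 4 -o out.sorted.bam -

With the old 0.1.x samtools the equivalent was samtools sort - out.sorted (an output prefix instead of -o), which is why the exact line depends on your version.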
I am biased, but this is the cleanest workflow for ancient DNA that I know of. Everything is BAM, and what goes in is what comes out. You can even instruct leeHom to keep the original reads and quality scores with --keepOrig; they are stored as QC-failed reads and will be ignored downstream, but you can still find them in the BAM file.
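If you do use --keepOrig, one way to get at those original reads later is to filter on the QC-fail flag (0x200, i.e. 512); out.sorted.bam is just the placeholder name from the sketch above:

samtools view -f 512 out.sorted.bam    # only the stored original, untrimmed reads
samtools view -F 512 out.sorted.bam    # everything except them

(-f keeps reads that have the flag set, -F drops them.)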
Hope this helps!
Thanks for your response. I actually have to go through the details of what you sent me because I am not a bioinformatician, so I am still trying to understand it all.
OK! Let me know if you need help. network-aware-bwa can be a pain to install, but I have managed it on a lot of systems.