How to use trimmed and merged outputs in bwa?
3
0
Entering edit mode
5.1 years ago

Hi, I just joined this website and I have a question:

I am working on some of the ancient plant samples and the goal is to perform a comparative analysis with modern species. sequencing has been done as paired-end and I have run trimmomatic for trimming and casper for merging. Now the output files are Forward_unpaired.fastq, Reverse_unpaierd.fastq and merged.fastq. so, I intend to use these input files in bwa but as I checked bwa helps, I could find the options for them, can anybody help me here how to use these files in bwa? If I have to run bwa once for both forward and reverse paired ones and once for merged ones, then how can I compile them afterward?

Thanks

alignment • 2.1k views
ADD COMMENT
1
Entering edit mode
5.1 years ago
Gabriel R. ★ 2.9k

This is how we got around it at the Max Planck in Leipzig. This is was for ancient DNA from Neanderthals/ancient humans but should work for plants, DNA is DNA after all :-)

Say we have in.fq1.gz in.fq2.gz as paired, untrimmed fastq but demultiplexed (not demultiplexed can be handle as well but needs an extra command. We use the following:

 fastq2bam  -o /dev/stdout  in.fq1.gz in.fq2.gz | leeHomMulti -u --ancientdna -o /dev/stdout /dev/stdin  | network-aware-bwa/bwa bam2bam -n 0.01 -o 2 -l 16500  -g reference.fasta  - | samtools sort ...

fastq2bam is from my own BCL2BAM2FASTQ

leeHom is a specialized adapter trimmer+merger for ancient DNA and shorter DNA molecules. It is still to my knowledge the most accurate, see software repo: leeHom and our paper in NAR

network aware bwa is a nifty fork of bwa aln from a colleague in Leipzig which can eat BAM and spit out bam. It also considers in the insert size computation both the merged and paired reads, see software here

and samtools sort is the normal samtools sort, adapt your command line to account for the version of samtools.

I am biased but this is the cleanest workflow for ancient DNA that I know of. Everything is bam and what goes it is what goes out. You can even instruct leeHom to keep the original reads+QC score with the --keepOrig, they will exist as QC failed reads and will be ignored but you can find them in the BAM file.

Hope this helps!

ADD COMMENT
0
Entering edit mode

thanks for your response and actually I have to go details of what you sent me because I am not a bioinformatician, so trying to understand all the details.

ADD REPLY
0
Entering edit mode

ok! let me know if you need help. network-aware-bwa can be a pain to install but I managed on a lot of systems.

ADD REPLY
0
Entering edit mode
5.1 years ago

Why can't you merge the bams after alignment? It's not clear to me that merging the pairs is going to help you much.

ADD COMMENT
0
Entering edit mode
5.1 years ago

thank you very much for your immediate response, since it is aDNA, we are trying to be as sensitive as possible and contain more info.

ADD COMMENT

Login before adding your answer.

Traffic: 1677 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6