Question

Treating forward and reverse reads independently in BWA

0

Entering edit mode

3.0 years ago

shpak.max ▴ 60

I would like to threat the forward and reverse fragments as independent reads in a BWA alignment.

I'm using bwa aln in order to generate data consistent with those generated years earlier by the lab

Given that I have file_R1 and file_R2 with matched forward and reverse fragments,

bwa aln ref.fasta file_R1.fastq > file_R1.sai
bwa aln ref.fasta file_R2.fastq > file_R2.sai
bwa sampe - P ref.fasta file_R1.sai file_R2.sai file_R1.fastq file_R2_fastq > file.sam

will give a mapping that uses R1 and R2 as matched fragments. Given that I have the R1 and R2 fragments in separate files, is there a straightforward way to treat the reads as independent with bwa aln?

bwa • 2.4k views

ADD COMMENT • link updated 3.0 years ago by Istvan Albert 103k • written 3.0 years ago by shpak.max ▴ 60

score 0 · Answer 1 · 2022-09-06

0

Entering edit mode

3.0 years ago

GenoMax 153k

If I understand what you are asking right : you can't use the sampe step. If you want to treat the reads as single-end data then you will need to use samse to individually make sam files from the two reads.

That said, those reads are not really independent. They are simply the two ends of your library fragments.

ADD COMMENT • link 3.0 years ago by GenoMax 153k

0

Entering edit mode

yes, I also wondered what is this about, especially since using bwa aln is so outdated.

the poster should use bwa mem and not bwa aln even if they are trying to reproduce a previous finding,

there would be very few valid reasons for using bwa aln (for example checking the discrepancy with the older method)

ADD REPLY • link 3.0 years ago by Istvan Albert 103k

0

Entering edit mode

The only reason I'm using bwa aln is that I'm in a lab that generated a lot of sequence data over the past several years using a bwa aln pipeline. If I now use bwa mem, comparing recent sequence data to processed data that was generated 6-7 years ago would lead to inconsistencies. It's either continue to use bwa aln for what limited raw data we still need to process or go back and rerun everything using bwa mem.

ADD REPLY • link 3.0 years ago by shpak.max ▴ 60

0

Entering edit mode

Yes, this is correct.

One question I have is this: if I create two sam files based on alignments using R1 and R2 and then use samtools to merge them, will there be any problems downstream due to the fact that the matched reads from R1 and R2 have the same heading? In other words, is it necessary for me to change the read headings in R2 before performing the alignment and merging?

ADD REPLY • link 3.0 years ago by shpak.max ▴ 60

1

Entering edit mode

you probably should rename the reads, just in case

seqkit rename -h
rename duplicated IDs

Attention:
  1. This command only appends "_N" to duplicated sequence IDs to make them unique.
  2. Use "seqkit replace" for editing sequence IDs/headers using regular expression.

but I will say treating a paired end sequencing run as if it were twice as abundant single end run may introduce biases, it is important to take into account just what exactly is being measured

ADD REPLY • link 3.0 years ago by Istvan Albert 103k

0

Entering edit mode

Two questions: first, what are the alternatives to seqkit (other than writing a bash script usign sed or a python script) to rename the reads?

I'm not sure I understand the syntax for seqkit, either for your example or the (limited) documentation provided in its website, i.e. what does it even take as input arguments?

Second, I generated two bam files, one from R1, the other from R2. When I try to merge them using samtools merge, I get a "No @HD tag found" error. Perhaps a header tag is automatically generated by bwa sampe for paired end data but isn't for bwa samse? In any case, what do I need to do to correct this error so that I may merge the two bam files?

ADD REPLY • link 3.0 years ago by shpak.max ▴ 60

0

Entering edit mode

I see that your seqkit question is answered, in general look at the website here:

https://bioinf.shenwei.me/seqkit/usage/

second, the error indicates that your input files are not quite right, check that you are using the tool correctly, and that the outputs are SAM etc

ADD REPLY • link 3.0 years ago by Istvan Albert 103k

0

Entering edit mode

What exactly are you trying to do with this data?

ADD REPLY • link 3.0 years ago by GenoMax 153k

0

Entering edit mode

I'm comparing read depth with long vs. short reads

ADD REPLY • link 3.0 years ago by shpak.max ▴ 60

0

Entering edit mode

In that case you should simply align the data as you showed in the original post since using paired-end reads would allow them to be properly treated as sequenced fragment. Then calculate the depth using mosdepth or samtools depth. I don't see the point of using reads individually.

ADD REPLY • link 3.0 years ago by GenoMax 153k