Problem with BWA MEM
0
0
Entering edit mode
2.5 years ago
sihak • 0

Hi everyone,

I am having problems aligning my scATAC-seq data. I've been working on a preprocessing workflow with fastp for qc, scATAC-pro (demultiplexing and adapter trimming), BWA and Sinto for four samples in total.

After alignment everything seemed ok, no errors, so I proceeded to create fragment files with Sinto and two samples resulted in empty fragment files. I run bamPEFFragmentSize by deeptools and it seems that BWA MEM recognizes two of the samples as one large fragment. This is an example of two outputs, one (hopefully) successful and one failed:

Sample label: sample2
Sample size: 53088

Fragment lengths:
Min.: 0.0
1st Qu.: 85.0
Mean: 178.17418248945148
Median: 170.0
3rd Qu.: 232.0
Max.: 738.0
Std: 108.9054958604194
MAD: 76.0
Len. 10%: 57.0
Len. 20%: 75.0
Len. 30%: 96.0
Len. 40%: 130.0
Len. 60%: 195.0
Len. 70%: 219.0
Len. 80%: 251.0
Len. 90%: 335.0
Len. 99%: 503.0

Sample label: sample3
Sample size: 98516

Fragment lengths:
Min.: 0.0
1st Qu.: 0.0
Mean: 416681.2713873889
Median: 0.0
3rd Qu.: 0.0
Max.: 235993170.0
Std: 5062730.410878585
MAD: 0.0
Len. 10%: 0.0
Len. 20%: 0.0
Len. 30%: 0.0
Len. 40%: 0.0
Len. 60%: 0.0
Len. 70%: 0.0
Len. 80%: 0.0
Len. 90%: 0.0
Len. 99%: 8801147.

By browsing the log from BWA MEM I noticed, that the two failed samples are giving me this message:

[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs

Comparing to the more successful samples for which the log is more informative:

[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 66306, 1, 3)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (72, 129, 214)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 498)
[M::mem_pestat] mean and std.dev: (152.71, 96.61)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 640)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 205504 reads in 22.829 CPU sec, 22.450 real sec
[M::process] read 205406 sequences (10000051 bp)...

I checked whether my PE1.fastq and PE.fastq could have different number of reads with | wc -l but they do not. Also, with samtools flagstat I do not see any differences between these successful vs. non-successful files that could explain this behavior. I am using the latest 1.15.1 version of samtools for sorting and sam-bam conversion and the v.0.7.17 for BWA.

This is the command I use for running the samples after creating the index:

bwa mem \
    ${indexfile} \
    "$sample1"PE1.fastq.gz \
    "$sample1"PE2.fastq.gz \
    > "$sam"sample1_hg38_aln.sam

I am currently running out of tricks to try. Any help for troubleshooting will be greatly appreciated :)

alignment • 1.1k views
ADD COMMENT
0
Entering edit mode

It looks like the fastq files don't have the standard Illumina naming structure. Was there any processing done on the fastq files?

ADD REPLY
0
Entering edit mode

Before demultiplexing and trimming I concatenated the sample-specific files with cat from several sequencings but nothing more. Could that explain why two of the samples seem ok and the rest don't? I used the exact same workflow for each sample.

ADD REPLY
0
Entering edit mode

with cat from several sequencings but nothing more.

did you keep the same order for the R1 and the R2 files when cat-ing ?

ADD REPLY
0
Entering edit mode

I'm sorry, there was a typo above that made my post very confusing and I fixed it :) Here's a clarification:

I had four sequencing files each R1 (forward read), R3 (reverse read), I1 (sample barcode) and R2 (cell barcode), i.e 4 x R1, 4 x R2, 4 x R3 and 4 x I1 for each sample. I concatenated these to end up with only one fastq for each R1, R2, R3 and I1 per sample.

Next, I appended the cell barcode to the forward and reverse reads and ended up with the PE1.fastq and PE2.fastq. Then I performed trimming.

ADD REPLY

Login before adding your answer.

Traffic: 1976 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6