Question

BWA not storing sequence

0

Entering edit mode

8.0 years ago

tanya.copley ▴ 10

Hi, I am having an issue with BWA genome alignment where sequence information is not being stored (i.e. sequence output in bam file is a * ). Can anyone tell me why this is happening and how to retrieve my sequences?? I need them for doing downstream SNP analysis. My code for alignment, sam to bam and sorting: bwa mem -t $cpu -r 1 -a -M -R $rg $genome $l | \ sambamba view -t $cpu -f bam -h -S /dev/stdin | \ sambamba sort -t $cpu -o $aligned/$flow/$lane/${nameNoExt}_RG_sorted.bam /dev/stdin

where $rg is my read group tag, $genome is the path to my genome file and $l is the loop to my input files

Here is a portion of the output when using: samtools view /path/to/file.bam | head -n 20 The first two have their sequences, while the remaining three output sequences have stars as their sequences. How can I tell bwa to keep these sequences?

HWI-ST521:68:C07EDACXX:2:2106:17357:162867  0   Gm01    30991   60  93M *   0   0   CAGCGTGAAGGGAACGAGTATTATTATTTATAACCAGCCACGTCATCAAGAGAAGAATATGGAAGGTGGATCGGGCCTTGCTATCCACACCCC   FFFHFHHDHIIGJIJIJIEFGIJJJIIJJCIHGIGIIJJIJIGHGGFGIECHIIIIEHHHHEDFF<;@CACD>BBBBBC>ACDDCC>?><<@5   NM:i:1  MD:Z:45G47  AS:i:88 XS:i:0  RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:1103:2382:91229    0   Gm01    33652   60  49M *   0   0   CAGCAATTCTGATAACAGTACCATTTTTTTTTTTGGAAGGCAAAATTAG   FFFHHHGHEGIBHIJIJJHIIGGIIJJJJJJJIGBHA?EC2?DECA;;;   NM:i:0  MD:Z:49 AS:i:49 XS:i:26 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:2106:5869:22021    256 Gm01    33999   0   60M34H  *   0   0   *   *   NM:i:1  MD:Z:45T14  AS:i:55 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:1101:3451:56376    256 Gm01    33999   0   59M35H  *   0   0   *   *   NM:i:2  MD:Z:45T5A7 AS:i:49 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:2106:19619:31878   256 Gm01    33999   0   59M35H  *   0   0   *   *   NM:i:2  MD:Z:45T5A7 AS:i:49 RG:Z:PI438460

BWA not storing sequence SNP • 3.0k views

ADD COMMENT • link updated 8.0 years ago by Santosh Anand 5.8k • written 8.0 years ago by tanya.copley ▴ 10

0

Entering edit mode

Hello tanya.copley!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=200869#post200869

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 8.0 years ago by WouterDeCoster 47k

score 1 · Answer 1 · 2016-11-09

1

Entering edit mode

8.0 years ago

piet ★ 1.9k

You have sorted your BAM file. Therefore, hard-clipped reads are appearing before soft-clipped alignments of the same read. See also this thread: bwa mem hard clipping

ADD COMMENT • link 8.0 years ago by piet ★ 1.9k

score 0 · Answer 2 · 2016-11-09

The flag field of all the 3 reads of missing query sequences is 256, which correspond to not primary alignment. BWA will usually give the Query sequences only for primary alignments to save space and avoid unnecessary duplicated information. Since you have sorted the bam file, the primary alignments are no longer the first alignments in the list (see piet's answer and link for more detail). Do a grep on one (any) of the reads to see all its alignment, and in one of them the full sequence should be there.