Hi, I am having an issue with BWA genome alignment where sequence information is not being stored (i.e. sequence output in bam file is a * ). Can anyone tell me why this is happening and how to retrieve my sequences?? I need them for doing downstream SNP analysis. My code for alignment, sam to bam and sorting: bwa mem -t $cpu -r 1 -a -M -R $rg $genome $l | \ sambamba view -t $cpu -f bam -h -S /dev/stdin | \ sambamba sort -t $cpu -o $aligned/$flow/$lane/${nameNoExt}_RG_sorted.bam /dev/stdin
where $rg is my read group tag, $genome is the path to my genome file and $l is the loop to my input files
Here is a portion of the output when using: samtools view /path/to/file.bam | head -n 20 The first two have their sequences, while the remaining three output sequences have stars as their sequences. How can I tell bwa to keep these sequences?
HWI-ST521:68:C07EDACXX:2:2106:17357:162867 0 Gm01 30991 60 93M * 0 0 CAGCGTGAAGGGAACGAGTATTATTATTTATAACCAGCCACGTCATCAAGAGAAGAATATGGAAGGTGGATCGGGCCTTGCTATCCACACCCC FFFHFHHDHIIGJIJIJIEFGIJJJIIJJCIHGIGIIJJIJIGHGGFGIECHIIIIEHHHHEDFF<;@CACD>BBBBBC>ACDDCC>?><<@5 NM:i:1 MD:Z:45G47 AS:i:88 XS:i:0 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:1103:2382:91229 0 Gm01 33652 60 49M * 0 0 CAGCAATTCTGATAACAGTACCATTTTTTTTTTTGGAAGGCAAAATTAG FFFHHHGHEGIBHIJIJJHIIGGIIJJJJJJJIGBHA?EC2?DECA;;; NM:i:0 MD:Z:49 AS:i:49 XS:i:26 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:2106:5869:22021 256 Gm01 33999 0 60M34H * 0 0 * * NM:i:1 MD:Z:45T14 AS:i:55 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:1101:3451:56376 256 Gm01 33999 0 59M35H * 0 0 * * NM:i:2 MD:Z:45T5A7 AS:i:49 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:2106:19619:31878 256 Gm01 33999 0 59M35H * 0 0 * * NM:i:2 MD:Z:45T5A7 AS:i:49 RG:Z:PI438460
Hello tanya.copley!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=200869#post200869
This is typically not recommended as it runs the risk of annoying people in both communities.