8.3 years ago
ThePresident
Hi,
I aligned Illumina paired-end reads to a reference sequence using bwa mem. Now I would like to extract the soft-clipped reads in order to determine the boundaries of some potential structural variants. I used samblaster with the following command:
bwa mem index R1.fastq R2.fastq | samblaster -u clipped.sam | samtools view -Sb - > clipped.bam | samtools sort - clipped_sorted.bam
However, I get a lot of supposedly soft-clipped reads. I wasn't expecting such a large proportion of reads to be soft-clipped. Is this common with bwa mem?
Here is the beginning of my clipped SAM file:
@D00780:28:HFJLLBCXX:1:1103:1175:2055_1
GCCATCTTTTCACTACTTGCTCCATATTTTTTGTCTGATTCGGTTGTGTTACTTGAAATGGCATTTGAGTAGTGAATACTTGGGTAGTCGATTCCTAGACCATTTAGGCTGTCTCTTATACACATCTCCGAGCCCACGAGACTCCTGAGCA
+
DD@DDDHHIIIHIIEHHIIIHFHIIIIGHIIGHIIIIGHHHIHHIHHHHIIIIIFGHHIHHIIHHEIHHIIIIIIIIHGGIHIFHHDHIIHIHIIIIHIGIDGHHIHHHEHHIHHIGHHHHHIIIIIIIDDHHHHHIHIHHGHIIIHHFHC
@D00780:28:HFJLLBCXX:1:1103:1175:2055_2
ACGCGTAAGATCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCCATCTTTTCACTACTTGCTCCATATTTTTTGTCTGATTCGGTTGTGTTACTTGAAATGGCATTTGAGTAGTGAATACTTGGGTAGTCGATTCCTAGACCATTTAGG
+
.CHEHF@<D<GHGIIIIHIIIIIIHIHIIIIIHHGHHHHHHHHDEF@EC<IIHHDIHHIIIHHHHHHIIIIIIIIIHIIHHIHIIHIIIIGIIIIIIIHIIIIIIHIIIHGIHHIIIIIHHHIHIIIIIIHHIIHHEIHIIHGHHIDDDDD
@D00780:28:HFJLLBCXX:1:1103:1229:2156_1
That's FASTQ, not SAM: each record is a four-line block (header, sequence, `+`, qualities).
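As far as I know, samblaster's `-u` option writes unmapped and clipped reads as FASTQ, which would explain the file contents above. If the goal is SAM records that carry soft clipping, one option is to filter the alignments by their CIGAR string (column 6). Here is a minimal sketch, assuming a tab-separated SAM stream on stdin; `filter_softclipped` is a hypothetical helper name, not part of samtools or samblaster, and `aligned.bam` in the usage comment is a placeholder file name.

```shell
# Keep SAM header lines plus every alignment whose CIGAR (column 6)
# contains a soft-clip operation such as 20S.
filter_softclipped() {
  awk -F'\t' '
    /^@/            { print; next }   # header lines pass through unchanged
    $6 ~ /[0-9]+S/  { print }         # CIGAR has at least one soft clip
  '
}

# Possible usage with a recent samtools (file names are placeholders):
#   samtools view -h aligned.bam | filter_softclipped | samtools sort -o clipped_sorted.bam -
```

This only looks at the CIGAR, so it keeps every alignment with any soft clip, however short. A few clipped bases at read ends are normal with bwa mem's local alignment, so a length threshold may be worth adding before calling breakpoints.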