Entering edit mode
8.2 years ago
xieshaojun0621
▴
210
Hi there,
I have a pair of reads which I used bwa mem to align the reads to reference genome. Then I'd like to use samblaster to extract the discordant read pairs and split reads.
The command line I used is:
samblaster -i test.sam -o test.out.sam -s test.split.sam -d test.dis.sam
Here I showed the test.sam below. In test.dis.sam, this pair of reads were identified as discordant read pairs. But in the test.split.sam, there was no information. My understanding is the R2 of "chr1_1219848_1220119_1:0:0_1:0:0_4c39b9" should be a split mapped read. Could someone tell me why it was not considered as split reads by samblaster?
Best, Shaojun Xie
@HD VN:1.3 SO:coordinate
@SQ SN:chr1 LN:30427671
@SQ SN:chr2 LN:19698289
@SQ SN:chr3 LN:23459830
@SQ SN:chr4 LN:18585056
@SQ SN:chr5 LN:26975502
@SQ SN:Mt LN:366924
@SQ SN:Pt LN:154478
@RG ID:sim_ SM:sim_
@PG ID:bwa-meth PN:bwa-meth VN:0.2.0 CL:"../../software/bwa-meth-0.2.0/bwameth.py --reference ../simu_TEP_ara/tair10.fa sim_1.fastq.gz sim_2.fastq.gz -t 12
chr1_1219848_1220119_1:0:0_1:0:0_4c39b9 673 chr1 1219848 60 100M = 1220020 228 GCTCATTCTTTCGTCAGTCTCACCAAATTTAAAGCATAAAGAAATTTCCCTCTTCTCTCCTAAGGAATCAAATGAGTACGAACAAAGTGGGCAAAGCTTT 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 NM:i:1 MD:Z:40C59 AS:i:97 XS:i:24 YC:Z:GA YD:Z:r RG:Z:sim_
chr1_1219848_1220119_1:0:0_1:0:0_4c39b9 593 chr1 1220020 60 56M44S = 1219848 -228 TTATTATATTATTTTGATTTTGTTTCAGTCTTCCACTTAAGTCTTGTTTTCACAACGTTGTGGTATTTGACAATTTTTTTGTCTTTTAACTTATTACACA 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:56 AS:i:56 XS:i:22 SA:Z:rchr1,23711253,-,59S41M,28,0; YC:Z:CT YD:Z:r RG:Z:sim_
chr1_1219848_1220119_1:0:0_1:0:0_4c39b9 849 chr1 23711253 1 59H41M = 1219848 -22491446 GTGGTATTTGACAATTTTTTTGTCTTTTAACTTATTACACA 22222222222222222222222222222222222222222 NM:i:0 MD:Z:41 AS:i:41 XS:i:35 SA:Z:rchr1,1220020,-,56M44S,60,0; XA:Z:fchr1,+11710711,41M59S,2;fchr4,+6428740,41M59S,2;fchr1,+29983385,41M59S,2;rchr2,-16380751,59S13M1D28M,1;rchr1,-28828558,59S13M1D28M,1; YC:Z:CT YD:Z:r RG:Z:sim_
The split reads are generally identified from the CIGAR string and the discordant read pairs are identified from the orientation of mapped read pair ( from FLAG ) and the insert size ( column 9 )
Edit: I realized this is not true in case of SV or samblaster.
They have clearly defined in their website:
So none of the reads have the "properly paired flag" ( There is not FLAG 2 ). And you can check if the thos reads map the criteria given to be called as split reads.