How to understand the split reads recognized by samblaster
0
0
Entering edit mode
8.2 years ago

Hi there,

I have a pair of reads which I used bwa mem to align the reads to reference genome. Then I'd like to use samblaster to extract the discordant read pairs and split reads.

The command line I used is:

samblaster -i test.sam -o test.out.sam -s test.split.sam -d test.dis.sam

Here I showed the test.sam below. In test.dis.sam, this pair of reads were identified as discordant read pairs. But in the test.split.sam, there was no information. My understanding is the R2 of "chr1_1219848_1220119_1:0:0_1:0:0_4c39b9" should be a split mapped read. Could someone tell me why it was not considered as split reads by samblaster?

Best, Shaojun Xie

@HD     VN:1.3  SO:coordinate
@SQ     SN:chr1 LN:30427671
@SQ     SN:chr2 LN:19698289
@SQ     SN:chr3 LN:23459830
@SQ     SN:chr4 LN:18585056
@SQ     SN:chr5 LN:26975502
@SQ     SN:Mt   LN:366924
@SQ     SN:Pt   LN:154478
@RG     ID:sim_ SM:sim_
@PG     ID:bwa-meth     PN:bwa-meth     VN:0.2.0        CL:"../../software/bwa-meth-0.2.0/bwameth.py --reference ../simu_TEP_ara/tair10.fa sim_1.fastq.gz sim_2.fastq.gz -t 12
chr1_1219848_1220119_1:0:0_1:0:0_4c39b9 673     chr1    1219848 60      100M    =       1220020 228     GCTCATTCTTTCGTCAGTCTCACCAAATTTAAAGCATAAAGAAATTTCCCTCTTCTCTCCTAAGGAATCAAATGAGTACGAACAAAGTGGGCAAAGCTTT    2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222    NM:i:1  MD:Z:40C59      AS:i:97 XS:i:24 YC:Z:GA YD:Z:r  RG:Z:sim_
chr1_1219848_1220119_1:0:0_1:0:0_4c39b9 593     chr1    1220020 60      56M44S  =       1219848 -228    TTATTATATTATTTTGATTTTGTTTCAGTCTTCCACTTAAGTCTTGTTTTCACAACGTTGTGGTATTTGACAATTTTTTTGTCTTTTAACTTATTACACA    2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222    NM:i:0  MD:Z:56 AS:i:56 XS:i:22 SA:Z:rchr1,23711253,-,59S41M,28,0;      YC:Z:CT YD:Z:r  RG:Z:sim_
chr1_1219848_1220119_1:0:0_1:0:0_4c39b9 849     chr1    23711253        1       59H41M  =       1219848 -22491446       GTGGTATTTGACAATTTTTTTGTCTTTTAACTTATTACACA       22222222222222222222222222222222222222222       NM:i:0  MD:Z:41 AS:i:41 XS:i:35 SA:Z:rchr1,1220020,-,56M44S,60,0;       XA:Z:fchr1,+11710711,41M59S,2;fchr4,+6428740,41M59S,2;fchr1,+29983385,41M59S,2;rchr2,-16380751,59S13M1D28M,1;rchr1,-28828558,59S13M1D28M,1;     YC:Z:CT YD:Z:r  RG:Z:sim_
alignment samblaster bwa split read mapping • 3.3k views
ADD COMMENT
1
Entering edit mode

The split reads are generally identified from the CIGAR string and the discordant read pairs are identified from the orientation of mapped read pair ( from FLAG ) and the insert size ( column 9 )

Edit: I realized this is not true in case of SV or samblaster.

They have clearly defined in their website:

A discordant read pair is one which meets all of the following criteria:

1. Both side of the read pair are mapped (neither FLAG 0x4 or 0x8 is set).
2. The properly paired FLAG (0x2) is not set.

samblaster uses the following strategy to identify split reads alignments.

1. Identify reads that have between two and --maxSplitCount primary and supplemental alignments.
2. Sort these alignments by their strand-normalized position along the read.
3. Two alignments are output as splitters if they are adjacent on the read, and meet these criteria:
    a. each covers at least --minNonOverlap base pairs of the read that the other does not.
    b. the two alignments map to different reference sequences and/or strands.
    c. the two alignments map to the same sequence and strand, and represent a SV that is at least --minIndelSize in length, and have at most --maxUnmappedBases of un-aligned base pairs between them.
4. Split read alignments that are part of a duplicate read will be output unless the -e option is used.

So none of the reads have the "properly paired flag" ( There is not FLAG 2 ). And you can check if the thos reads map the criteria given to be called as split reads.

ADD REPLY

Login before adding your answer.

Traffic: 2853 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6