Hello Everyone,
Can anyone tell me the difference between split reads and discordant reads? From my alignment file I separated the split alignments and the discordant alignments, but I cannot figure out what exactly the results are showing. Here is one alignment pair from each file:
DISCORDANT alignments:
H1:1:H5GCTBCXY:1:1107:1197:2139 97 Chr00 27202666 0 7S139M5S Chr03 36179032 0 ATGTCGGTTTTACAAAATCTGTCGTCTGATATGGACTTATGAATTAGTTCAGATGGCGCTTAAGGACAAAACCGTTGTGTGATGCATAAAAAAATGAGCGGCAAGTTTCCCACCATGTAAGTGGTGCCAAACCCCATCTCAAACCCCAAAT DDDDDHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIHIIHHIIIFHHIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIHIHFHIIGIHIIIIIH NM:i:6 MD:Z:20A25A4T23C2T7T52 AS:i:109 XS:i:108
H1:1:H5GCTBCXY:1:1107:1197:2139 145 Chr03 36179032 17 12S28M4D111M Chr00 27202666 0 TTGAAACCTTAGTCCTCTCTTCTCTCTCCGACAACCAAACTTCTGCCTCATAGGATTGATGGTGAATTCATCCCATCACTCTCGCTTCAAAACAGACCAGAGAGAGCCCTAGATTTGAAACCCTCTCTTCTCCAATTCTTCGAGTCCTCTT HDC0HHHIHH?IIIIGIIHHHIHHDHCIHGGHHFHHHFFHEHHHHHHCF<HIHHIIIIIHIIIIIIIIIIIIHHEEIHHHIHIIIIIIHGEIIIIIIIIIIIGIIIIIIIHIIHIIIIHGGC=HIIIIIIIHIHHCHIHIHIIIIIDDDDD NM:i:11 MD:Z:8C5C13^CAGT43A13A0G0C15G35 AS:i:94 XS:i:84 XA:Z:Chr05,+46698740,111M4D28M12S,13;Chr06,-30674967,12S28M4D111M,13;Chr07,+28088165,42M1I68M4D28M12S,13;Chr04,-19106513,15S25M4D111M,13;Chr07,+49817886,42M1I68M4D28M12S,14;
SPLIT:
B1:1:H5GCTBCXY:1:1107:1334:2188 161 Chr05 26225099 8 82M69S Chr03 1950005 0 AAGAAACTTGATCGAAATAACGAAGACATGGCATTGGAAGATGACAAGGCTTGAAGATGACGCAACAAGGCCCAAGAACTCTTCTGGATTAATTTGGATTTCGGCAAATGAGAGAAAGCCCATTGGATTTTGGGTTGTTGGACGTAATGGA B@@DBHIIIIIHHHIHHIIIIIIIIIIIIIGHIIIIIIIIIIIIIIIIHHHHHIIIIHEHIEHIICHIIIIHHHHEHHE?C1<1DGEIHHFCHIHIHEHIIDEHHGIGHHHIGGFHIII@EEEHGEHHH?HHIHI0ECHHEHFHHHHIGHC NM:i:2 MD:Z:11C8T61 AS:i:72 XS:i:67 SA:Z:Chr06,27321942,+,67S11M1I66M6S,0,2; XA:Z:Chr06,+25451428,82M69S,3;Chr04,-24335170,69S82M,3;Chr03,+10998176,81M1I19M50S,7;Chr05,+51873112,82M69S,4;Chr03,+11018865,81M1I19M50S,8;
B1:1:H5GCTBCXY:1:1107:1334:2188 2209 Chr06 27321942 0 67H11M1I66M6H Chr03 1950005 0 AGGCCCAAGAACTCTTCTGGATTAATTTGGATTTCGGCAAATGAGAGAAAGCCCATTGGATTTTGGGTTGTTGGACGT IIIIHHHHEHHE?C1<1DGEIHHFCHIHIHEHIIDEHHGIGHHHIGGFHIII@EEEHGEHHH?HHIHI0ECHHEHFHH NM:i:2 MD:Z:48G28 AS:i:65 XS:i:65 SA:Z:Chr05,26225099,+,82M69S,8,2; XA:Z:Chr00,+26655585,67S11M1I66M6S,2;Chr07,+37957453,67S11M1I66M6S,3;
If anyone knows about this, please explain it to me.
Thanks & Regards
Thank you, Samarth, for explaining. I still have some confusion and hope you can explain further. In my mapping data, more than 56 percent of the mapped reads are discordant or split reads, and I want to identify CNVs in the genome. For deletions I am a little confused, because I have read that deletions are characterized by an insert size longer than the expected insert size. The insert size of my library is 1.5 kb, so I extracted all the read pairs with an insert size greater than this. But now I am unsure what to do with pairs where one mate maps to one chromosome and the other mate maps to a different chromosome.
If you look at the discordant reads I posted above, I do not understand whether this type of read contributes to deletions or to other types of CNVs.
Please tell me if you know something about that.
Thanks in advance
What kind of library prep and sequencing platform were used for the experiment? 1.5 kb is pretty big. Also, if you want to detect variants, I highly recommend using established tools: VarScan2, Freebayes, VarDict... for SNVs, or Lumpy, Pindel, Breakdancer... for larger structural variants such as deletions. Do not start with homebrew solutions like extracting reads of a certain insert size followed by custom filtering.
Thank you for the useful suggestion. I am going to use these tools to identify CNVs, but first I want to get the concept clear: which type of read contributes to which type of event. I still do not have a clear picture, but I will keep trying.
Deepika, the reads you posted appear to be multimapped (the read maps to multiple locations in the genome). You can check this by looking at the XA tag, which lists the alternative hits.
In your first DISCORDANT read, check the XA tag (pasted below):
XA:Z:Chr05,+46698740,111M4D28M12S,13;Chr06,-30674967,12S28M4D111M,13;Chr07,+28088165,42M1I68M4D28M12S
And in this read : H1:1:H5GCTBCXY:1:1107:1197:2139 97 Chr00 27202666 0 7S139M5S Chr03 36179032 0
One mate maps to Chr00 and the other maps to Chr03, so this pattern would point to an inter-chromosomal event such as a translocation, not a deletion. However, the MAPQ of this alignment is 0, which means its placement is ambiguous; you should exclude such reads from your CNV/SV study.
To build understanding you can manually inspect a few discordant reads, but for a genome-wide study you should use an established SV caller, as suggested by @ATPoint.
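If you do want to eyeball a few records by hand, a small script can decode the FLAG column and report the properties discussed above. This is only a rough sketch for inspection, not a replacement for an SV caller: the function names `decode_flag` and `describe` are made up for this example, and the `min_mapq=20` cutoff is an arbitrary assumption, not a recommended threshold. The bit values themselves come from the SAM specification.

```python
# Bit values of the SAM FLAG field (from the SAM specification).
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def describe(sam_line, min_mapq=20):
    """Summarise one SAM record (hypothetical helper for manual inspection).

    min_mapq is an illustrative cutoff only; pick one that suits your data.
    """
    fields = sam_line.split()
    flag, rname, mapq, rnext = int(fields[1]), fields[2], int(fields[4]), fields[6]
    names = decode_flag(flag)
    return {
        "flags": names,
        "proper_pair": "proper_pair" in names,
        # RNEXT is '=' when the mate is on the same reference, '*' when unknown.
        "inter_chromosomal": rnext not in ("=", "*", rname),
        "low_mapq": mapq < min_mapq,
        # XA:Z: is the alternative-hits tag mentioned above.
        "has_alt_hits": any(f.startswith("XA:Z:") for f in fields),
    }


# The first columns of the discordant read quoted above.
rec = "H1:1:H5GCTBCXY:1:1107:1197:2139 97 Chr00 27202666 0 7S139M5S Chr03 36179032 0"
print(describe(rec))
```

For this record, FLAG 97 decodes to paired, mate_reverse, first_in_pair; the proper-pair bit is not set, the mate is on a different chromosome, and MAPQ 0 falls below the cutoff.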
Hope this helps
Thanks
Thank you, Samarth. I really want to get the concept clear, but there is still a lot of confusion. I am going to use the software now and hope that gives me a clearer picture.
Thanks
Deepika, just read the paper for any structural variation (SV) detection method; it will give you a clear idea of how to look for SV signatures using discordant reads. Choose your SV caller carefully.
Best.
Thank you so much, Samarth. Can you tell me what a breakpoint is when you search for CNVs? I read the paper but it is not clear to me. I hope this is my last question for you, and then I will not bother you any more.
Thanks
Wait, then what's the difference between a split read and a discordant read of the first kind (large genomic distance between the read pairs)? They appear the same from this explanation.
A split read is an individual read that can be separated into sections that each map very well to different locations in the reference.
Discordant reads can be a product of this, but not necessarily.
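In the alignments posted at the top of the thread, the split read shows up as a clipped CIGAR (82M69S) plus an SA:Z tag pointing at where the rest of the read aligned. Here is a minimal sketch of pulling that out, assuming BWA-MEM style SA tags with six comma-separated fields per entry; `clipped_bases` and `parse_sa` are names invented for this example.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_bases(cigar):
    """Return (left, right): soft- or hard-clipped bases at each end of the read."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    left = ops[0][0] if ops and ops[0][1] in "SH" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] in "SH" else 0
    return left, right


def parse_sa(tag):
    """Parse an 'SA:Z:...' tag into a list of dicts, one per other alignment."""
    out = []
    for entry in tag[len("SA:Z:"):].split(";"):
        if not entry:
            continue  # the tag ends with a trailing ';'
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        out.append({
            "rname": rname,
            "pos": int(pos),
            "strand": strand,
            "cigar": cigar,
            "mapq": int(mapq),
            "nm": int(nm),
        })
    return out


# Values taken from the SPLIT record quoted in the question.
print(clipped_bases("82M69S"))
print(parse_sa("SA:Z:Chr06,27321942,+,67S11M1I66M6S,0,2;"))
```

For that record the primary alignment on Chr05 leaves 69 bases clipped on the right, and the SA entry places the remainder on Chr06 at 27321942 with MAPQ 0, so this particular split is not a trustworthy one either.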
As an example, a structural variant insertion shows how this can be the case.
Taking a look at the first kind of discordant read -
Let's say that concordant reads should map like this
But an element, the poly-A's, is inserted
Now if RD2 only contains the A's, then we will have discordant reads, but not split. But if RD2 is "split" with the A's, we would have discordant and split reads.
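Put another way, "discordant" is a property of the pair and "split" is a property of a single read, so one record can carry either, both, or neither. A rough sketch of that idea, assuming a split read is marked by an SA:Z tag and a discordant read is one that is paired, has both mates mapped, and does not have the proper-pair bit (0x2) set. The function name `evidence` is made up here, and what an aligner counts as a proper pair depends on its settings.

```python
def evidence(flag, tags):
    """Return the kinds of SV evidence one alignment record carries.

    flag: the SAM FLAG value; tags: the optional fields as a list of strings.
    """
    kinds = set()
    paired = bool(flag & 0x1)
    both_mapped = not (flag & 0x4) and not (flag & 0x8)
    if paired and both_mapped and not (flag & 0x2):
        kinds.add("discordant")  # judged from the pair
    if any(t.startswith("SA:Z:") for t in tags):
        kinds.add("split")  # judged from the read itself
    return kinds


# FLAG and tags taken from the two records quoted in the question.
print(evidence(97, ["NM:i:6", "AS:i:109", "XS:i:108"]))
print(evidence(161, ["NM:i:2", "SA:Z:Chr06,27321942,+,67S11M1I66M6S,0,2;"]))
```

The first record comes out as discordant only, the second as both discordant and split, which matches the two cases described above.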