Question

Regarding split reads and discordant reads

7.6 years ago

DL ▴ 50

Hello Everyone,

Can anyone tell me the difference between split reads and discordant reads? From my alignment file, I separated split alignments and discordant alignments, but I cannot figure out what exactly the results are showing. Here is one alignment from each file:

DISCORDANT alignments:

H1:1:H5GCTBCXY:1:1107:1197:2139 97      Chr00   27202666        0       7S139M5S        Chr03   36179032        0       ATGTCGGTTTTACAAAATCTGTCGTCTGATATGGACTTATGAATTAGTTCAGATGGCGCTTAAGGACAAAACCGTTGTGTGATGCATAAAAAAATGAGCGGCAAGTTTCCCACCATGTAAGTGGTGCCAAACCCCATCTCAAACCCCAAAT        DDDDDHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIHIIHHIIIFHHIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIHIHFHIIGIHIIIIIH        NM:i:6  MD:Z:20A25A4T23C2T7T52  AS:i:109        XS:i:108
H1:1:H5GCTBCXY:1:1107:1197:2139 145     Chr03   36179032        17      12S28M4D111M    Chr00   27202666        0       TTGAAACCTTAGTCCTCTCTTCTCTCTCCGACAACCAAACTTCTGCCTCATAGGATTGATGGTGAATTCATCCCATCACTCTCGCTTCAAAACAGACCAGAGAGAGCCCTAGATTTGAAACCCTCTCTTCTCCAATTCTTCGAGTCCTCTT        HDC0HHHIHH?IIIIGIIHHHIHHDHCIHGGHHFHHHFFHEHHHHHHCF<HIHHIIIIIHIIIIIIIIIIIIIHHEEIHHHIHIIIIIIHGEIIIIIIIIIIIGIIIIIIIHIIHIIIIHGGC=HIIIIIIIHIHHCHIHIHIIIIIDDDDD        NM:i:11 MD:Z:8C5C13^CAGT43A13A0G0C15G35 AS:i:94 XS:i:84 XA:Z:Chr05,+46698740,111M4D28M12S,13;Chr06,-30674967,12S28M4D111M,13;Chr07,+28088165,42M1I68M4D28M12S,13;Chr04,-19106513,15S25M4D111M,13;Chr07,+49817886,42M1I68M4D28M12S,14;

SPLIT:

B1:1:H5GCTBCXY:1:1107:1334:2188 161     Chr05   26225099        8       82M69S  Chr03   1950005 0       AAGAAACTTGATCGAAATAACGAAGACATGGCATTGGAAGATGACAAGGCTTGAAGATGACGCAACAAGGCCCAAGAACTCTTCTGGATTAATTTGGATTTCGGCAAATGAGAGAAAGCCCATTGGATTTTGGGTTGTTGGACGTAATGGA        B@@DBHIIIIIHHHIHHIIIIIIIIIIIIIGHIIIIIIIIIIIIIIIIHHHHHIIIIHEHIEHIICHIIIIHHHHEHHE?C1<1DGEIHHFCHIHIHEHIIDEHHGIGHHHIGGFHIII@EEEHGEHHH?HHIHI0ECHHEHFHHHHIGHC        NM:i:2  MD:Z:11C8T61    AS:i:72 XS:i:67 SA:Z:Chr06,27321942,+,67S11M1I66M6S,0,2;       XA:Z:Chr06,+25451428,82M69S,3;Chr04,-24335170,69S82M,3;Chr03,+10998176,81M1I19M50S,7;Chr05,+51873112,82M69S,4;Chr03,+11018865,81M1I19M50S,8;
B1:1:H5GCTBCXY:1:1107:1334:2188 2209    Chr06   27321942        0       67H11M1I66M6H   Chr03   1950005 0       AGGCCCAAGAACTCTTCTGGATTAATTTGGATTTCGGCAAATGAGAGAAAGCCCATTGGATTTTGGGTTGTTGGACGT IIIIHHHHEHHE?C1<1DGEIHHFCHIHIHEHIIDEHHGIGHHHIGGFHIII@EEEHGEHHH?HHIHI0ECHHEHFHH  NM:i:2  MD:Z:48G28      AS:i:65 XS:i:65 SA:Z:Chr05,26225099,+,82M69S,8,2;      XA:Z:Chr00,+26655585,67S11M1I66M6S,2;Chr07,+37957453,67S11M1I66M6S,3;
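The FLAG column (second field) already says a lot about these four records. A minimal Python sketch that decodes it, using the bit meanings from the SAM specification (the dictionary and function names are illustrative, not from any library):

```python
# Bit meanings of the SAM FLAG field (column 2), per the SAM specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # FLAG values copied from the four records in the question.
    for flag in (97, 145, 161, 2209):
        print(flag, decode_flag(flag))
```

None of the four records has bit 0x2 ("read mapped in proper pair") set, so the aligner did not treat either pair as properly paired. 2209 is 161 plus 0x800, which marks the second record of the split example as the supplementary part of the same read.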

If anyone knows about this, please explain it to me.

Thanks & Regards

genome sequence sequencing alignment

updated 24 months ago by Ram 45k • written 7.6 years ago by DL ▴ 50

score 17 · Answer 1 · 2017-10-17

7.6 years ago

Samarth Kulshrestha ▴ 300

For paired-end data generated by high-throughput sequencers, read pairs fall into two categories:

1) Concordant reads (properly aligned pairs)
2) Discordant reads (improperly aligned pairs; important for identifying genome alteration events)

Concordant reads: the two mates span a distance within the expected fragment-size range and have the expected orientation with respect to the reference.

Discordant reads: the two mates span an unexpected distance and/or have an unexpected orientation (or map to different chromosomes).

Example:

Concordant reads: R1-----> expected mapping distance and orientation <-----R2

Discordant reads fall into different categories:

A) Based on mapping distance

R1-----> Unexpected mapping distance <-----R2

B) Based on read orientation (the expected orientation for standard paired-end data is FR: R1 forward, R2 reverse; in discordant pairs the orientation is instead, for example, FF or RR)

R1-----> R2-----> [FF orientation]
R1<----- R2<---- [RR orientation]
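The categories above can be sketched as a small classifier. This is a simplified illustration, assuming a standard FR library, a hypothetical expected span range passed in by the caller, and mates described only by chromosome, mapped start/end and strand; real SV callers estimate the range from the data and handle many more cases. Because the code must cover every strand combination, it also labels the outward-facing RF case, which is neither FR nor FF/RR.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    chrom: str
    start: int     # leftmost mapped reference position (SAM POS, 1-based)
    end: int       # rightmost mapped reference position (inclusive)
    reverse: bool  # True if this mate aligned to the reverse strand


def classify_pair(a: Mate, b: Mate, min_span: int, max_span: int) -> str:
    """Label a pair as concordant or as one kind of discordant pair.

    Assumes an FR library: the leftmost mate should be forward (F) and the
    rightmost mate reverse (R), pointing toward each other.
    """
    if a.chrom != b.chrom:
        return "discordant: mates on different chromosomes"
    left, right = sorted((a, b), key=lambda m: m.start)
    if not left.reverse and not right.reverse:
        return "discordant: FF orientation"
    if left.reverse and right.reverse:
        return "discordant: RR orientation"
    if left.reverse and not right.reverse:
        return "discordant: RF orientation"
    # Only the expected FR layout remains; check the distance it spans.
    span = right.end - left.start + 1  # rough fragment length, similar to SAM TLEN
    if not (min_span <= span <= max_span):
        return "discordant: unexpected mapping distance"
    return "concordant"


if __name__ == "__main__":
    # Hypothetical 150 bp mates and a hypothetical expected span of 200-600 bp.
    print(classify_pair(Mate("Chr01", 1000, 1149, False), Mate("Chr01", 1300, 1449, True), 200, 600))
    print(classify_pair(Mate("Chr01", 1000, 1149, False), Mate("Chr01", 9000, 9149, True), 200, 600))
    # The discordant pair from the question (ends computed from its CIGAR strings).
    print(classify_pair(Mate("Chr00", 27202666, 27202804, False),
                        Mate("Chr03", 36179032, 36179174, True), 200, 600))
```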

Split reads: one portion of a single NGS read maps to one location and another portion of the same read maps to a different location in the genome. When both portions are of equal length, this is called a balanced split.
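For split reads, BWA-style aligners (an assumption based on the SA, XA and XS tags in the question) record the other pieces of the read in the SA tag, one "rname,pos,strand,CIGAR,mapQ,NM;" entry per piece, as described in the SAM specification. A minimal sketch, with hypothetical helper names, that parses the SA tag and works out which part of the original read each alignment covers:

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_span(cigar: str, reverse: bool) -> tuple[int, int]:
    """Return the [start, end) interval of the original read covered by an alignment."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    read_len = sum(n for n, op in ops if op in "MIS=XH")  # clipped bases count toward read length
    leading_clip = 0
    for n, op in ops:
        if op not in "SH":
            break
        leading_clip += n
    aligned = sum(n for n, op in ops if op in "MI=X")  # read bases inside the alignment
    start, end = leading_clip, leading_clip + aligned
    if reverse:
        # CIGAR follows the reference strand; flip back to the read's own orientation.
        start, end = read_len - end, read_len - start
    return start, end


def parse_sa(value: str) -> list[dict]:
    """Parse an SA:Z value: 'rname,pos,strand,CIGAR,mapQ,NM;' repeated."""
    pieces = []
    for entry in filter(None, value.split(";")):
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        pieces.append({"chrom": rname, "pos": int(pos), "reverse": strand == "-",
                       "cigar": cigar, "mapq": int(mapq), "nm": int(nm)})
    return pieces


if __name__ == "__main__":
    # Primary alignment of the split read in the question (FLAG 161, forward strand).
    primary = {"chrom": "Chr05", "pos": 26225099, "reverse": False, "cigar": "82M69S"}
    for piece in [primary] + parse_sa("Chr06,27321942,+,67S11M1I66M6S,0,2;"):
        print(piece["chrom"], piece["pos"], query_span(piece["cigar"], piece["reverse"]))
```

For the question's split read this gives bases 0 to 82 on Chr05 and bases 67 to 145 on Chr06: the two pieces cover most of the 151 bp read, overlap by 15 bases, and both have low MAPQ (8 and 0).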

7.6 years ago by Samarth Kulshrestha ▴ 300

Thank you, Samarth, for explaining. I still have some confusion and hope you can clarify. In my mapping data, more than 56 percent of mapped reads are discordant and/or split reads, and I want to identify CNVs in the genome. Deletions confuse me a little: I have read that deletions are characterized by an insert size longer than expected. The insert size of my library is 1.5 kb, so I extracted all reads with an insert size greater than this. But now I am unsure about pairs where one mate maps to one chromosome and the other mate maps to a different chromosome.

Looking at the discordant alignment I posted above, I do not understand whether this type of read contributes to deletions or to other types of CNVs.

Please tell me if you know anything about this.

Thanks in advance

7.6 years ago by DL ▴ 50

insert size of my library dataset is 1.5 kb

What kind of library prep and sequencing platform were used for the experiment? 1.5 kb is pretty big. Also, if you want to detect structural variants such as deletions, I highly recommend using established tools: VarScan2, Freebayes, VarDict, ... for SNVs, or Lumpy, Pindel, Breakdancer, ... for larger variants. Do not start with a homebrew solution like extracting reads of a certain insert size followed by custom filtering.

7.6 years ago by ATpoint 88k

Thank you for the useful suggestion. I am now going to use these tools to identify CNVs, but first I want to understand which types of reads contribute to which type of event. I still don't have a clear picture, but I will keep trying.

7.6 years ago by DL ▴ 50

Deepika, the output you posted seems to be multimapped (the read maps to multiple locations in the genome; you can check this by looking at the XA tag).

In your discordant example, check the XA tag (pasted below):

XA:Z:Chr05,+46698740,111M4D28M12S,13;Chr06,-30674967,12S28M4D111M,13;Chr07,+28088165,42M1I68M4D28M12S
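The XA tag can be unpacked in a few lines. Assuming the BWA convention of "chr,(+|-)pos,CIGAR,NM;" per alternative hit, this hypothetical helper lists the alternative hits, here for the full XA value of the FLAG 145 record in the discordant example:

```python
def parse_xa(value: str) -> list[tuple[str, str, int, str, int]]:
    """Parse a BWA-style XA:Z value into (chrom, strand, pos, CIGAR, NM) tuples."""
    hits = []
    for entry in filter(None, value.split(";")):
        chrom, signed_pos, cigar, nm = entry.split(",")
        # The position carries the strand as its first character (+ or -).
        hits.append((chrom, signed_pos[0], int(signed_pos[1:]), cigar, int(nm)))
    return hits


# XA value copied from the FLAG 145 record in the question.
XA_VALUE = ("Chr05,+46698740,111M4D28M12S,13;Chr06,-30674967,12S28M4D111M,13;"
            "Chr07,+28088165,42M1I68M4D28M12S,13;Chr04,-19106513,15S25M4D111M,13;"
            "Chr07,+49817886,42M1I68M4D28M12S,14;")

if __name__ == "__main__":
    for chrom, strand, pos, cigar, nm in parse_xa(XA_VALUE):
        print(f"{chrom}\t{strand}{pos}\t{cigar}\tNM={nm}")
```

The five alternative hits on four chromosomes carry NM 13 or 14, only slightly more than the NM:i:11 of the reported alignment, so the reported placement is not much better supported than the alternatives.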

And in this read: H1:1:H5GCTBCXY:1:1107:1197:2139 97 Chr00 27202666 0 7S139M5S Chr03 36179032 0

One mate maps to Chr00 and the other maps to Chr03, so this pattern suggests a chromosomal translocation event, not a deletion. However, the MAPQ of this alignment is 0, so you should avoid such reads in your CNV/SV study.

To build your understanding, you can manually check a few discordant reads, but for a genome-wide study you should use established SV callers, as suggested by @ATPoint.

Hope this helps

Thanks

7.6 years ago by Samarth Kulshrestha ▴ 300

Thank you, Samarth. I really want to understand the concept, but I am still quite confused. I will now use the software and hope it gives me a clearer picture.

Thanks

7.6 years ago by DL ▴ 50

Deepika, just follow any structural variation (SV) detection methods paper; it will give you a clear idea of how to look for SV signatures using discordant reads. Choose your SV caller carefully.

Best.

7.6 years ago by Samarth Kulshrestha ▴ 300

Thank you so much, Samarth. Can you tell me what a breakpoint is when searching for CNVs? I read the paper, but it is not clear to me. If you don't mind, I hope this is my last question, and I won't bother you further.

Thanks

7.6 years ago by DL ▴ 50

Wait, then what's the difference between a split read and a discordant read of the first kind (large genomic distance between the mates)? They appear the same from this explanation.

5.3 years ago by c_u ▴ 530

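A split read is a single read that can be separated into sections that each map well to different locations in the reference, whereas discordance describes how the two mates of a pair map relative to each other.

The same event can produce discordant pairs, split reads, or both, but one does not imply the other.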
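The distinction can be made concrete with the records from the question. In this minimal sketch (a simplification that trusts the aligner's own proper-pair decision; the function name is hypothetical), a record is labelled "split" if it has an SA tag or the supplementary bit 0x800, and "discordant" if it is paired with both mates mapped but lacks the proper-pair bit 0x2:

```python
def labels(flag: int, tags: set[str]) -> set[str]:
    """Label one SAM record as 'split' and/or 'discordant' (simplified)."""
    found = set()
    if "SA" in tags or flag & 0x800:
        found.add("split")
    paired = bool(flag & 0x1)
    both_mapped = not (flag & 0x4) and not (flag & 0x8)
    if paired and both_mapped and not (flag & 0x2):
        found.add("discordant")
    return found


# FLAG values and optional-tag names of the four records in the question.
RECORDS = [
    ("discordant file, mate 1", 97, {"NM", "MD", "AS", "XS"}),
    ("discordant file, mate 2", 145, {"NM", "MD", "AS", "XS", "XA"}),
    ("split file, primary", 161, {"NM", "MD", "AS", "XS", "SA", "XA"}),
    ("split file, supplementary", 2209, {"NM", "MD", "AS", "XS", "SA", "XA"}),
]

if __name__ == "__main__":
    for name, flag, tags in RECORDS:
        print(name, sorted(labels(flag, tags)))
```

The records from the "split" file come out as both split and discordant, because their mate sits on Chr03 while the read pieces map to Chr05 and Chr06, whereas the first pair is discordant without being split.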

As an example, a structural-variant insertion shows how this can happen.

Take the first kind of discordant read:

R1-----> Unexpected mapping distance <-----R2

Let's say that concordant reads should map like this

REF: GATTACA[........]ACATTAG[-- LONG REGION --]AAAAAAAA
RD1: GATTACA->
RD2:                <-ACATTAG

But suppose an element, a poly-A stretch, is inserted:

REF: GATTACAAAAAAAAAACATTAG[-- LONG REGION --]AAAAAAAA

Now, if RD2 contains only A's, we have a discordant pair, but no split read:

REF: GATTACAAAAAAAAAACATTAG[-- LONG REGION --]AAAAAAAA
RD1: GATTACA->
RD2:      <-AAAAAAAA                       
MAP:                                        <-AAAAAAAA

But if RD2 spans the inserted A's and the flanking reference sequence, it is "split", and we have both a discordant pair and a split read:

REF: GATTACAAAAAAAAAACATTAG[-- LONG REGION --]AAAAAAAA
RD1: GATTACA->
RD2:          <-AAAAAACA
MAP:               <-ACA                    <-AAAAA

3.4 years ago by DavidStreid ▴ 90