Samtools view behaviour with skipped region from the reference (N in Cigar)
2
0
Entering edit mode
6.1 years ago

I'm trying to use samtools view to get reads falling in a given area

samtools view in.bam 'chr12:113428514-113428567' | grep "NB500938:125:HY3KMBGX3:4:21501:7767:16841"
NB500938:125:HY3KMBGX3:4:21501:7767:16841   99  chr12   113428559   255 151M    =   113428600   169 CATAGTAATCACAATAGTGGATTTTTCCTCTATACCCGACAAAAACCCCAGAGTCTGACTAGAATCACCCCTGGGCAACTCAGACATTATGCCAATTCCTGGTGTCACACAAGAATCAACCATTCAAGTCATTGTTCCACATTCTGTTCCC AAAAAEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEAEEEEEEEEEEEEEEEEEAE/EEEEE/EEEEEEEEEA/EEAEEEE6EEEEE<EAEAEE<EEEEEEEEEEEAEEE/AEA<E6<AEEE<EAAEEEEEEAE/EEEEEEEAAA6< NH:i:1  HI:i:1  AS:i:277    nM:i:0  MD:Z:151
samtools view in.bam 'chr12:113428514-113428567' | grep "NB500938:125:HY3KMBGX3:2:22202:12932:7527"
NB500938:125:HY3KMBGX3:2:22202:12932:7527   99  chr12   113260099   255 138M169544N13M  =   113260153   169730  CCTTCCCACTCTTTCCCCAGGTCACATTCATCGTGCCGGAAGGGAAGTAATCGTGAATCAGGCAGCCGATTATCACTGGGTCACTTGACAGAGCTGGTGGGAGTGTCAGTGGGTAGATGGTGGGATTTCTCGCAGACTCTGAGGAGACGGT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEE/EEEEEEEEEEEEEEAEEEEEEEAEEEE/<EEEEEEEEAEEEEEEEEEEEEEAEE/EEEEEEEEEEEEEEEE<<AAEAEEE/EEEEEAAAA6 NH:i:1  HI:i:1  AS:i:273    nM:i:3  MD:Z:95C55

The read name NB500938:125:HY3KMBGX3:4:21501:7767:16841 fall into the position chr12:113428514-113428567

While the read name NB500938:125:HY3KMBGX3:2:22202:12932:7527 fall into a skipped region from the reference (N letter in cigar)

I don't want from Samtools to output this read name, is there an option (I doubt) or an other tool that take care about the skipped regions ?

Samtools is just a check up, actually I use pysam in python but it works the same way...

Thanks

samtools • 1.6k views
ADD COMMENT
0
Entering edit mode

138M169544N13M

so which part of the read would you like to consider ? what is the logic ?

ADD REPLY
0
Entering edit mode

My reads are RNA, this read map on the reference from 113260099 to 113260237(113260099+138), then huge gap (169544N), then map from 113429781(113260099+138+169544) to 113429794(113260099+138+169544+13).

The logic is that this read map from 113260099 to 113260237 and from 113429781 to 113429794

My search is on 'chr12:113428514-113428567', my read does not cover this region but due to the 169544N Samtools output this read

ADD REPLY
0
Entering edit mode

My search is on 'chr12:113428514-113428567', my read does not cover this region but due to the 169544N

i see

ADD REPLY
2
Entering edit mode
6.1 years ago

using samtools and SamJdk: http://lindenb.github.io/jvarkit/SamJdk.html to iterator over the cigar string.

samtools view -u -b in.bam "chr1:12345-23456" |\
java -jar dist/samjdk.jar -e 'if(record.getReadUnmappedFlag() || record.getCigar()==null) return false;final Interval rgn= new Interval(record.getContig(),12345,23456);return record.getAlignmentBlocks(). stream(). map(B->new Interval(record.getContig(),B.getReferenceStart(),B.getReferenceStart()+B.getLength()-1)). anyMatch(I->I.intersects(rgn)) ;'
ADD COMMENT
0
Entering edit mode

Thanks Pierre for this solution and I bet this one is working very well, but I don't want to mess with Java in my python script.

ADD REPLY
0
Entering edit mode

I don't want to take up any of your time, if you please I need an advise here

I will implement this solution in python using this pseudocode

But I disagree with the Soft Clip/Hard Clip event :

  • Case1 ( M/X/=) :
    • start at the specified mapping position, set counter to 1
    • Add 1 to both the counts of the bases from that position and the counter.
    • Move to the next position.
    • Repeat this process till counter is the same as the number associated with the operator.
  • Case2 (N/D):
    • Move the specified mapping position by the number associated with the operator.
  • Case3 (I/S/H/P):
    • Do nothing

Below is my understanding of S/H regarding position on reference sequence

If I got an start alignment position : 25

  • with this Cigar : 10S10M10S, my read is mapped from 35 to 44

  • with this Cigar : 10H10M10H, my read is mapped from 25 to 34

ADD REPLY
1
Entering edit mode

But I disagree with the Soft Clip/Hard Clip event :

I aree the pseudocode because S and H are not part of the alignment. The read->start is AFTER any clip.

ADD REPLY
1
Entering edit mode
6.1 years ago

If someone needs it

#M    BAM_CMATCH    0
#I    BAM_CINS    1
#D    BAM_CDEL    2
#N    BAM_CREF_SKIP    3
#S    BAM_CSOFT_CLIP    4
#H    BAM_CHARD_CLIP    5
#P    BAM_CPAD    6
#=    BAM_CEQUAL    7
#X    BAM_CDIFF    8
#B    BAM_CBACK    9
#NM    NM tag    10

startR1 = 3
endR1 = 0
array_mapped_positions = []
my_test_cigar = [(4,2), (1,3), (2,2), (0,5), (8,2), (3,7), (1,2), (0,4), (5,4)] #2S3I2D5M2X7N2I4M4H
my_test_cigar2 = [(4,2), (0,1), (1,4), (0,2), (2,3), (6,3), (8,2), (1,2), (0,4), (5,3)] #2S1M4I2M3D3P2X2I4M3H
for i in my_test_cigar2:
    if i[0] == 0:
        if endR1 == 0:
            endR1 = startR1 + i[1] - 1
        else:
            endR1 += i[1]
    elif i[0] == 1:
        continue
    elif i[0] == 2:
        if endR1 != 0:
            array_mapped_positions.append(str(startR1)+"-"+str(endR1))
            startR1 = endR1 + i[1] + 1
            endR1 = 0
        else:
            startR1 += i[1]
    elif i[0] == 3:
        if endR1 != 0:
            array_mapped_positions.append(str(startR1)+"-"+str(endR1))
            startR1 = endR1 + i[1] + 1
            endR1 = 0
        else:
            startR1 += i[1]
    elif i[0] == 4:
        continue
    elif i[0] == 5:
        continue
    elif i[0] == 6:
        continue
    elif i[0] == 7:
        if endR1 == 0:
            endR1 = startR1 + i[1] - 1
        else:
            endR1 += i[1]
    elif i[0] == 8:
        if endR1 == 0:
            endR1 = startR1 + i[1] - 1
        else:
            endR1 += i[1]
    else:
        print("Error CIGAR : "+i[0]+":"+i[1])
        sys.exit()
if endR1 != 0:
    array_mapped_positions.append(str(startR1)+"-"+str(endR1))
print(array_mapped_positions)
#['3-5', '9-14']
ADD COMMENT

Login before adding your answer.

Traffic: 1495 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6