Get positions in BAM with reads supporting at least 2 different nucleotides
1
0
Entering edit mode
6.2 years ago
Roman Hillje ▴ 100

Hi guys,

I'm currently looking for a tool or script that takes a BAM file (from an RNAseq experiment) and outputs a list of all the positions that have reads supporting at least 2 different nucleotids (WT vs variant or whatever you want to call it). Ideally, it also reports the coverage so I can filter it later. Does anybody have an idea how to do this? I don't really care about the base call quality at the positions at this point.

Cheers, Roman

RNA-Seq • 1.8k views
ADD COMMENT
0
Entering edit mode

Could you give an example of the desired output?

ADD REPLY
0
Entering edit mode

So you want every location where a single read differs from the consensus? The vast majority of the loci you identify will not be real differences, but will be random errors in the reads.

ADD REPLY
0
Entering edit mode
6.2 years ago

using samfixcigar to convert the cigar string to identify the mismatches. http://lindenb.github.io/jvarkit/SamFixCigar.html

and the samjdk to filter the reads containing a least two cigar elements 'X' or 'I' or 'D'. http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samfixcigar.jar -R src/test/resources/rotavirus_rf.fa src/test/resources/S1.bam |\
java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getCigar().getCigarElements().stream().map(C->C.getOperator().name()).filter(OP->OP.equals("X") || OP.equals("I") || OP.equals("D")).count()>1;'
@HD VN:1.5  SO:coordinate
@SQ SN:RF01 LN:3302
@SQ SN:RF02 LN:2687
@SQ SN:RF03 LN:2592
@SQ SN:RF04 LN:2362
@SQ SN:RF05 LN:1579
@SQ SN:RF06 LN:1356
@SQ SN:RF07 LN:1074
@SQ SN:RF08 LN:1059
@SQ SN:RF09 LN:1062
@SQ SN:RF10 LN:751
@SQ SN:RF11 LN:666
@RG ID:S1   SM:S1   LB:L1   CN:Nantes
@CO Processed with samfixcigar compilation: 2018-09-28:14-09-50 githash: 1919e185cb0c676aa912bd7c2b4cf9d1113f3927 htsjdk: 2.15.0
RF01_27_590_3:0:0_1:0:0_68  163 RF01    27  60  4=1X14=1X12=1X37=   =   521 564 GTATCATCTAATCTTGTCATAATATTTATCATATATATATAACTCACAATCCGCAGTTCAAATTCCAATA  2222222222222222222222222222222222222222222222222222222222222222222222  RG:Z:S1 NM:i:3  AS:i:55 XS:i:0
RF01_110_504_2:0:0_1:0:0_5d 163 RF01    110 60  13=1X19=1X36=   =   435 395 ATAGTGAATTAGATAATAGATGTATTGAATTTCCTTCTAAATGCTTAGAAAACTCAAAGAATGGACTATC  2222222222222222222222222222222222222222222222222222222222222222222222RG:Z:S1   NM:i:2  AS:i:60 XS:i:0
RF01_158_624_3:0:0_1:0:0_61 99  RF01    158 60  33=1X21=1X12=1X1=   =   555 467 AAAACTCAAAGAATGGACTATCATTGAAAAAGCACTTTGTTGAATATAGCGATGTAATGGAGAATGCCCC  2222222222222222222222222222222222222222222222222222222222222222222222  RG:Z:S1 NM:i:3  AS:i:58 XS:i:0
RF01_205_734_3:0:0_2:0:0_79 99  RF01    205 60  14=1X27=1X22=1X4=   =   665 530 AGCGATGTTATGGATAATGCCACACTGTTGTCAATATTATCGAACTCTTATGATAAATATAACGCAGTTG  2222222222222222222222222222222222222222222222222222222222222222222222  RG:Z:S1 NM:i:3  AS:i:55 XS:i:0
RF01_295_749_2:0:0_4:0:0_9a 163 RF01    295 60  40=1X20=1X8=    =   680 455 GCAAAAGGTAAGCCGCTAGAAGCAGATTTGACAGTGAATGCGTTGGATTATGAAAATAACACGATAACAT  2222222222222222222222222222222222222222222222222222222222222222222222RG:Z:S1   NM:i:2  AS:i:60 XS:i:0
(...)

. Ideally, it also reports the coverage so I can filter it later.

use any tool giving the coverage.... like GATK DepthOfCoverage

ADD COMMENT

Login before adding your answer.

Traffic: 2729 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6