Hello,
I have a variant of interest at position chrX:124068651-G-T with a DP value of 319. Here, it is
chrX 124068651 . G T 200.121 . DP=319;VDB=2.88188e-20;SGB=-0.693147;RPBZ=-2.38326;MQBZ=0.636315;BQBZ=1.07463;SCBZ=-0.654487;MQ0F=0.0125392;AC=1;AN=2;DP4=122,0,197,0;MQ=58 GT:PL 0/1:233,0,169
I have the sorted, indexed bam file from which I have called the variant. Now, it is essential for me to know the name of the reads which support the particular variant at this position.
For this I use sam2tsv (http://lindenb.github.io/jvarkit/Sam2Tsv.html) to find the reads which have the alternate allele at the position. An example output is as follows:
> #Read-Name Flag MAPQ CHROM READ-POS0 READ-BASE READ-QUAL REF-POS1 REF-BASE CIGAR-OP
> LH00587:73:22HGLTLT4:5:1102:26223:9839 0 60 chrX 91 T 9 124068651 G M
And there are such 18683080 entries. It is not possible to have so many entries because we have performed nested pcr to amplify a mutation and then amplicon sequenced it,and the number of mutations cannot be so high.
How do I get the DP=319 which will allow me to just extract the read names of those 319 reads supporting alt allele as shown in the above vcf entries??
Best
I wrote this tool, how did you use it please ?
usage could be
otherwise, you could just run:
reg.bed looks this way: chrX 124068651 124068651
it cannot work anyway, you should get an error message:
[WARN][BedLineCodec]cannot use empty BED interval where start==end. chrX 124068651 124068651 chrX(tab)124068651(tab)124068651. Skipping.
BED files are half-open intervals.
jup you are right sorry the updated bed file looks like this:
chrX 124068649 124068651
I then grep "124068651" res.txt > res_filt.txt
please try my samtools command above (fixed a few minutes ago, missing -h)
yes! I am running it now
but your second suggestion: samtools view in.bam "chrX:124068651-124068651" | cut -f1 will give me all the reads,right? not the one which have the G to T substitution.
Okay so I used the command which you mentioned earlier, since I only want the reads which have a G to T substitution, I parsed the output so that: READ-BASE is T and REF-BASE is G, I still have 17879217 entries
very strange. Can you please , upload the file below in a new issue : https://github.com/lindenb/jvarkit/issues