Find the reads which correspond to DP of vcf
1
0
Entering edit mode
2 days ago
totoroGirl • 0

Hello,

I have a variant of interest at position chrX:124068651-G-T with a DP value of 319. Here, it is

chrX 124068651 . G T 200.121 . DP=319;VDB=2.88188e-20;SGB=-0.693147;RPBZ=-2.38326;MQBZ=0.636315;BQBZ=1.07463;SCBZ=-0.654487;MQ0F=0.0125392;AC=1;AN=2;DP4=122,0,197,0;MQ=58 GT:PL 0/1:233,0,169

I have the sorted, indexed bam file from which I have called the variant. Now, it is essential for me to know the name of the reads which support the particular variant at this position.

For this I use sam2tsv (http://lindenb.github.io/jvarkit/Sam2Tsv.html) to find the reads which have the alternate allele at the position. An example output is as follows:

> #Read-Name    Flag    MAPQ    CHROM   READ-POS0   READ-BASE   READ-QUAL   REF-POS1    REF-BASE    CIGAR-OP
> LH00587:73:22HGLTLT4:5:1102:26223:9839        0       60      chrX  91 T     9     124068651  G     M

And there are such 18683080 entries. It is not possible to have so many entries because we have performed nested pcr to amplify a mutation and then amplicon sequenced it,and the number of mutations cannot be so high.

How do I get the DP=319 which will allow me to just extract the read names of those 319 reads supporting alt allele as shown in the above vcf entries??

Best

bam variant vcf sam2tsv • 373 views
ADD COMMENT
0
Entering edit mode

And there are such 18683080 entries. It is not possible to have so many entries because we have performed nest

I wrote this tool, how did you use it please ?

usage could be

samtools view -h in.bam "chrX:124068651-124068651" | java -jar jvarkit.jar sam2tsv -R ref.fa | awk '$8==124068651'

otherwise, you could just run:

samtools view in.bam "chrX:124068651-124068651" | cut -f1 
ADD REPLY
0
Entering edit mode
java -jar /mnt/beegfs/sbhattacharya/clone_tracing/bam_file/M1/jvarkit/dist/jvarkit.jar sam2tsv -R GCA_000001405.15_GRCh38_no_alt_analysis_set.fna  --regions reg.bed input_sam2tsv_sorted.bam res.txt 

reg.bed looks this way: chrX 124068651 124068651

ADD REPLY
0
Entering edit mode

reg.bed looks this way: chrX 124068651 124068651

it cannot work anyway, you should get an error message:

[WARN][BedLineCodec]cannot use empty BED interval where start==end. chrX 124068651 124068651 chrX(tab)124068651(tab)124068651. Skipping.

BED files are half-open intervals.

ADD REPLY
0
Entering edit mode

jup you are right sorry the updated bed file looks like this:

chrX 124068649 124068651

I then grep "124068651" res.txt > res_filt.txt

ADD REPLY
0
Entering edit mode

please try my samtools command above (fixed a few minutes ago, missing -h)

ADD REPLY
0
Entering edit mode

yes! I am running it now

but your second suggestion: samtools view in.bam "chrX:124068651-124068651" | cut -f1 will give me all the reads,right? not the one which have the G to T substitution.

ADD REPLY
0
Entering edit mode

Okay so I used the command which you mentioned earlier, since I only want the reads which have a G to T substitution, I parsed the output so that: READ-BASE is T and REF-BASE is G, I still have 17879217 entries

ADD REPLY
0
Entering edit mode

very strange. Can you please , upload the file below in a new issue : https://github.com/lindenb/jvarkit/issues

samtools view -h in.bam "chrX:124068651-124068651"  > issues.sam.txt
ADD REPLY
0
Entering edit mode
1 day ago

answer here: https://github.com/lindenb/jvarkit/issues/260

the depth was huge:

$ samtools depth -r 'chrX:124068651-124068651' jeter.bam  | awk -F '\t' '$2==124068651'
chrX    124068651   25534522

and the vcf caller most probably downsampled the reads.

ADD COMMENT

Login before adding your answer.

Traffic: 1252 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6