Hello,
I have a bam file from which I'd like to extract reads which contain one specific variant at a specific position. I used :
samtools mpileup -uf ref.fa my.bam -v -r chr7:151490948-151490948
From IGV, I can spot 1681 "A" nucleotide variant (cf. IGV image below). I would like to extract the read containing this variant. The upper command givec back :
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT my.bam
chr7 151490948 . T A,C,<*> 0 . DP=8074;I16=1395,5186,333,1160,1.67816e+06,4.2793e+08,380715,9.70823e+07,45921,2.31924e+06,9838,499526,164525,4.11312e+06,37325,933125;QS=0.82102,0.178825,0.000155298,0;VDB=1.02295e-43;SGB=-0.693147;RPB=0.78422;MQB=0.997427;MQSB=1.49326e-36;BQB=1;MQ0F=0.00606886 PL 0,99,255,255,255,255,255,255,255,255
I can see the A variants but also the C variants. I know the option --output-QNAME
extract the read names. In my case, I want to extract only the "A" variant reads, not the "C" ones.
I tried :
samtools mpileup --output-QNAME -f ref.fa my.bam -r CM000669.2:151490948-151490948 |awk '{print $NF}' | sed 's/,/\n/g' | wc -l
I get 8157 matching reads. Do these reads correspond to the reference and the variant (A or C) alleles?
Any help to extract my 1681 variant "A" reads?
how did you find the variant ? show us the igv screenshot.
you may use python programming for this task. here are some simple steps: open the file, read line by line, add the "if" condition variant you need, store these variant data in a list and print the list. that's all. i hope this help...
this doesn't explain why OP cannot find a read carrying the variant with samtools.
This user seems to be pushing python on multiple threads.
what does that mean ?
are you sure it's 'Chr19' and not 'chr19' or '19' ?
I expect a variant T/A (with 1681 reads matching the A variant) as you in the IGV window. I would like to extract those 1681 reads ID.
I am sure of the nomenclature. I add this an IGV window, for another locus.
???
The IGV window I showed is for another locus which is Chr7:151490948-151490948) (not Chr9:151490946-151490946 as mentionned first)
You are not answering questions: A) show the locus that is relevant, b) confirm that "Chr7" rather than "chr7" is correct.
You're right, it is always because I try to modify a bit my post to do not divulge on what I work. Anyway, I edited the post.
It just makes it harder to help. Do you really think people could threaten your project by seeing a random screenshot not even knowing who you are and who your work for? Whatever, your choice.
Just right click the read in IGV and click "copy read details to clipboard" then just grep for the read name using something like
samtools view your_file.bam | grep 'M00119:58:000000000-JV6N9:1:2116:17863:22678'
This works only for one read right? I need to do it for the 1681 reads..
start with one read... show us that this read is present in the samtools view output...
On IGV, as I showed with the screenshot, I have 1681 "A" variant reads. But when I scroll all the reads, I can't spot any "A" variant . Is that normal? So, it is impossible for me to detect them and get the corresponding "copy read details to clipboard".
see IGV downsampling reads params https://software.broadinstitute.org/software/igv/Preferences .
Thanks for the doc. I tried to dealt with it and the "Preferences" parameters (I also unchecked the "Downsample reads") but I can't get the "A" variant. Other mutations are however visible as you can see on the screenshot....
Why don't you copy the sequence of the read surrounding the A like +/-10 bp and then grep for the sequence instead of the read name e.g. 'AGAAGATA' etc... make sure it's long enough to be unique
I already tried. When I grep +/- 10 bp around the variant, I only get 39 BAM sequences. Which is very much low than the 1681 expecting reads of the IGV screenshot.
I installed the last version of IGV and I am now able to spot the "A" variant on the reads.
About the "copy read details to clipboard" , I can grep the read name in my BAM file. It works for one :
ok i can write you a script to identify these if that's all you need... are you able to use R?
Indeed, I am able to use R.