Dear All,
When I use bam-readcount to count each SNP position's depth for making %VAF, Especially for those pair samples, only happened mutation in one sample but not in other paired sample, my final goal is for calculate BAF, sciclone input file or %VAF.
Here is the code:
#Step one:
#Extract SNP site info
cat $1.vcf | sed '/^#/d' | awk '{print $1"\t"$2"\t"$2}' > $1.bed
cat $2.vcf | sed '/^#/d' | awk '{print $1"\t"$2"\t"$2}' > $2.bed
#Step two:
#Example:
cat $1.bed $2.bed > variants.positions_sorted.bed
sort -k1,1 -k2,2n -k3,3n -u variants.positions_sorted.bed > variants.positions_sorted.bed
#Step three:
#Example:
bam-readcount -b 15 -q 15 -w 1 \
-f reference/Mus_musculus.fa \
-l variants.positions_sorted.bed \
/bam_files/$1.bam \
> $1_variants.readcounts
I'm following what Chris mentioned in this post enter link description here, I got bam-readcount output file now like this:
GL456392.1 772 C 123 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 C:103:30.22:36.83:1.46:46:57:0.57:0.06:145.63:46:0.46:64.46:0.33 G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 T:20:31.10:37.00:0.00:11:9:0.78:0.09:182.55:11:0.59:55.90:0.34 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
GL456378.1 780 T 2 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:2:33.00:37.00:0.00:1:1:0.32:0.04:129.50:1:0.71:100.00:0.38 C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
However, I search from google and here, I didn't find any comment for this purpose. I don't know which tool can help me to convert bam-readcount output to VAF tab-delimited like this:
chrom POS REF ALT REF_DP ALT_DP TOTAL_DP
GL456378.1 772 C A 103 20 123
GL456378.1 780 T A 0 2 2
1 3000004 T A,G 15 14,2 31
Thanks in advance, Haitao
Wow, you need to sleep now : ), thanks for your answer so fast! Haitao
Hi Chris,
Thanks for your advice, however, I'm not very familiar with perl, I can understand the script but very hard to write script by myself until now, I try to write script by myself but need some time, my level is application level at this moment. I'm work on bench before(~10 years), recent one year I have shifted a a little bit from bench to analysis, now is 50 / 50.
Can I use GMS to figure out my issue? I installed genome music and genome music2 before.
Thanks, Haitao
Try this simpler code that I also had laying around: https://github.com/chrisamiller/bioinfoSnippets/blob/master/add-readcounts.pl
Thanks I will try this!