Hi,
Is there a way I can introduce mutation in bam file at specific position of a chromosome without running the alignment. For example I would like to change base from A>C at Chr1:12345?
Hi,
Is there a way I can introduce mutation in bam file at specific position of a chromosome without running the alignment. For example I would like to change base from A>C at Chr1:12345?
I quickly wrote: http://lindenb.github.io/jvarkit/Biostar404363.html
$ samtools view src/test/resources/toy.bam | tail -1
x6 0 ref2 14 30 23M * 0 0 TAATTAAGTCTACAGAGCAACTA ??????????????????????? RG:Z:gid1
$ cat jeter.vcf
ref2 14 A C
$ java -jar dist/biostar404363.jar -p jeter.vcf src/test/resources/toy.bam | tail -1
x6 0 ref2 14 30 23M * 0 0 CAATTAAGTCTACAGAGCAACTA ??????????????????????? PG:Z:0 RG:Z:gid1 NM:i:1
sure, I changed the input: it now takes a VCF file, use the field INFO/AF=0.5 to get heterozygote changes (see the example below). http://lindenb.github.io/jvarkit/Biostar404363.html
# look at the original bam at position 14:79838836
$ samtools tview -d T -p 14:79838836 src/test/resources/HG02260.transloc.chr9.14.bam | cut -c 1-20
79838841 79838
NNNNNNNNNNNNNNNNNNNN
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
cataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
taaaggcaa
# look at the VCF , we want to change 14:79838836 with 'A' with probability = 0.5
$ cat jeter.vcf
##fileformat=VCFv4.2
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency among genotypes, for each ALT allele, in the same order as listed">
#CHROM POS ID REF ALT QUAL FILTER INFO
14 79838836 . N A . . AF=0.5
# apply biostar404363
$ java -jar dist/biostar404363.jar -o jeter.bam -p jeter.vcf src/test/resources/HG02260.transloc.chr9.14.bam
# index the sequence
$ samtools index jeter.bam
# view the result (half of the bases were replaced because AF=0.5 in the VCF)
$ samtools tview -d T -p 14:79838836 jeter.bam | cut -c 1-20
79838841 79838
NNNNNNNNNNNNNNNNNNNN
MATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
aataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
aataggaaaactaaaggcaa
AATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
taaaggcaa
I am not aware of any tool that can do this, but in general terms, you can:
Maybe BAMSurgeon can do what you want. It has a -v
flag to add SNVs to specific regions.
-v VARFILENAME, --varfile VARFILENAME
Target regions to try and add a SNV, as BED
Another option would be to simulate the reads with NEAT-GenReads, it also has a -v
parameter to add variants to specific positions:
-v Input VCF file. Variants from this VCF will be inserted into the simulated sequence with 100% certainty.
But in this case you would have to map the reads again.
edit: the thread NGS data simulation: VarSim or BAMSurgeon? may be of interest.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Introducing a mutation in a bam, what does that mean? Change all the sequenced nucleotides at that position? Change the reference genome at that position?
yes change the sequenced nucleotide not the reference