How to introduce artificial mutation in bam
3
1
Entering edit mode
5.1 years ago
bassanio ▴ 60

Hi,

Is there a way I can introduce mutation in bam file at specific position of a chromosome without running the alignment. For example I would like to change base from A>C at Chr1:12345?

bam Mismatch • 4.1k views
ADD COMMENT
0
Entering edit mode

Introducing a mutation in a bam, what does that mean? Change all the sequenced nucleotides at that position? Change the reference genome at that position?

ADD REPLY
0
Entering edit mode

yes change the sequenced nucleotide not the reference

ADD REPLY
6
Entering edit mode
5.1 years ago

I quickly wrote: http://lindenb.github.io/jvarkit/Biostar404363.html

$ samtools view src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   TAATTAAGTCTACAGAGCAACTA ??????????????????????? RG:Z:gid1

$ cat jeter.vcf
ref2    14  A   C

$ java -jar dist/biostar404363.jar -p jeter.vcf src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   CAATTAAGTCTACAGAGCAACTA ??????????????????????? PG:Z:0  RG:Z:gid1   NM:i:1
ADD COMMENT
0
Entering edit mode

Hi Thanks, It worked for me. Just to add Is there a way we can make them heterozygote too?

ADD REPLY
1
Entering edit mode

sure, I changed the input: it now takes a VCF file, use the field INFO/AF=0.5 to get heterozygote changes (see the example below). http://lindenb.github.io/jvarkit/Biostar404363.html

# look at the original bam at position 14:79838836
$ samtools tview -d T -p 14:79838836 src/test/resources/HG02260.transloc.chr9.14.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
cataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
           taaaggcaa

# look at the VCF , we want to change 14:79838836 with 'A' with probability = 0.5
$ cat jeter.vcf
##fileformat=VCFv4.2
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency among genotypes, for each ALT allele, in the same order as listed">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
14  79838836    .   N   A   .   .   AF=0.5

# apply biostar404363
$  java -jar dist/biostar404363.jar -o jeter.bam -p jeter.vcf src/test/resources/HG02260.transloc.chr9.14.bam

# index the sequence
$ samtools index jeter.bam

# view the result (half of the bases were replaced because AF=0.5 in the VCF)
$ samtools tview -d T -p 14:79838836 jeter.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
MATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
aataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
aataggaaaactaaaggcaa
AATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
           taaaggcaa
ADD REPLY
1
Entering edit mode

Thank you. It works too well.

ADD REPLY
1
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode

will this work with forward and reverse reads in an aligned bam file?

ADD REPLY
0
Entering edit mode

In a BAM, the sequence is always genomic 5'->3' whatever is the strand.

ADD REPLY
0
Entering edit mode

Thank you ! It seems to work very well for SNP, I was wondering if it also works for deletions and insertions?

ADD REPLY
0
Entering edit mode

I was wondering if it also works for deletions and insertions?

no, it doesn't

ADD REPLY
1
Entering edit mode
5.1 years ago
JC 13k

I am not aware of any tool that can do this, but in general terms, you can:

  1. Sort your BAM file by positions
  2. Decompress the file to SAM
  3. Use a Python/Perl script to look for the coordinate, you need to consider that the reference position is marked for the first base of the alignment, be sure to adjust position based on the CIGAR.
  4. Convert back to BAM
ADD COMMENT
1
Entering edit mode
5.1 years ago
h.mon 35k

Maybe BAMSurgeon can do what you want. It has a -v flag to add SNVs to specific regions.

-v  VARFILENAME, --varfile VARFILENAME
    Target regions to try and add a SNV, as BED

Another option would be to simulate the reads with NEAT-GenReads, it also has a -v parameter to add variants to specific positions:

-v  Input VCF file. Variants from this VCF will be inserted into the simulated sequence with 100% certainty.

But in this case you would have to map the reads again.

edit: the thread NGS data simulation: VarSim or BAMSurgeon? may be of interest.

ADD COMMENT

Login before adding your answer.

Traffic: 1651 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6