Question

Missing V600E Mutation In Wholeexome Analysis Of A375 Cell Line? Samtools Problem?

0

Entering edit mode

12.8 years ago

Angel ▴ 220

I have whole-exome data from A375 cell line generated using Illumina HiSeq. So I created the following pipeline to find SNPs and INDELs. strong textIn particular, this cell line should have BRAF V600E mutation which I did not find. Could anyone help me with what went wrong in my pipeline and how to correct it so I can re-discover V600E mutation in BRAF?

bwa aln -t 8 hg19.fa S375_R1.fastq > S375_1.sai
bwa aln -t 8 hg19.fa S375_R2.fastq > S375_2.sai
bwa sampe -a 200 hg19.fa S375_1.sai S375_2.sai S375_R1.fastq S375_R2.fastq > S375_NoIndex_L007.sam
samtools view -bS S375_NoIndex_L007.sam > S375_NoIndex_L007.bam
samtools sort S375_NoIndex_L007.bam S375_NoIndex_L007.sorted
samtools index S375_NoIndex_L007.sorted.bam
samtools mpileup -uf hg19.fa S375_NoIndex_L007.sorted.bam | bcftools view -bvcg - > S375_NoIndex_L007.raw.bcf
bcftools view S375_NoIndex_L007.raw.bcf | vcfutils.pl varFilter -D200 > S375_NoIndex_L007_var_d200.flt.vcf

I only found INTRONS in BRAF which does not make any sense (snpeff result, see below)

Command line: SnpEff eff -v -i vcf -o txt hg19 S375_NoIndex_L007_var_d200.flt.vcf

7       140434607       *       +A      INS     Hom     26.5    169             BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140434607       *       +AA     INS     Hom     26.5    169             BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140437592       G       A       SNP     Hom     13.7    4               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140453315       T       C       SNP     Hom     176     28              BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140470691       C       T       SNP     Hom     21.8    2               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140495573       T       C       SNP     Hom     15.9    2               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140499107       C       T       SNP     Hom     22.8    3               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140510939       A       G       SNP     Het     24      6               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140526728       T       C       SNP     Het     7.8     6               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140529784       A       G       SNP     Hom     13      2               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140532269       G       C       SNP     Hom     123     5               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140535237       A       G       SNP     Het     6.2     6               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140538991       T       C       SNP     Hom     106     4               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140547384       A       G       SNP     Hom     6.02    4               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140554320       T       C       SNP     Het     3.01    5               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140560023       T       C       SNP     Hom     32      3               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140561044       G       A       SNP     Het     4.13    6               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140561121       A       C       SNP     Hom     65.3    5               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301
7       140562068       G       T       SNP     Het     8.64    4               BRAF.7  BRAF    mRNA    NM_004333
        INTRON                                  2301

exome samtools mutation • 5.0k views

ADD COMMENT • link updated 10.0 years ago by Biostar 20 • written 12.8 years ago by Angel ▴ 220

1

Entering edit mode

Care to post an IGV screenshot of your BAM file for the position you're interested in? It's hard to do a diagnosis without seeing the region of interest.

ADD REPLY • link 12.8 years ago by User 59 13k

1

Entering edit mode

A depth of 200 (-D200) is pretty low for exome capture. As for the data, the output of samtools mpileup for a couple of bases including your base of interest would be an alternative to an IGV screenshot.

ADD REPLY • link 12.8 years ago by Sean Davis 27k

0

Entering edit mode

Care to post an IGV screenshot of your BAM file for the position you're interested in? It's hard to do a diagnosis without seeing the region of interest. Your pipeline doesn't appear to include any

ADD REPLY • link 12.8 years ago by User 59 13k

0

Entering edit mode

@Sean, Does -D200 require a minimum depth of 200? I haven't found a clear explanation of these parameters.

ADD REPLY • link 12.8 years ago by Malachi Griffith 20k

0

Entering edit mode

HI all, thanks for your comments. 1. Delay in IGV snapshot, as I need to first transfer the file to my local computer and taking a huge time.

The output of mpileup is a binary format (S375_NoIndex_L007.raw.bcf), what should I do to upload what you are wanting to see? I checked the vcf file and it dosen't contain the SNP I am interested in.

ADD REPLY • link 12.8 years ago by Angel ▴ 220

0

Entering edit mode

So I converted bam to bed format. I am not able to figure out if the bam file contains this mutation or not. Any help?

Genomic Position for V600E = chr7:140453136-140453136

Mutation Syntax CDS: c.1799T>A

Mutation NT: t>a

240 rows contain the coordinate 140453136 as seen below. How do I find if it contains the mutation V600E?

chr7 140453135 140453235 CRKLB09060:52:D09CKACXX:7:1102:17012:134505/1 60 +

chr7 140453136 140453236 CRKLB09060:52:D09CKACXX:7:1303:20425:8020/1 60 +

chr7 140453136 140453236 CRKLB09060:52:D09CKACXX:7:1305:11651:47745/1 60 -

ADD REPLY • link 12.8 years ago by Angel ▴ 220

0

Entering edit mode

So I found the mutation present in bam file (after I aligned just to chr7 and viewed it in IGV). My sense is if it's present in just aligning to chr7, it is also present in aligning to entire hg19.

So my follow-up question is can I replace the rest of the pipeline with something else? What other options I can try in mpileup or how can I view the output of mpileup to see where the mutation got filtered?

ADD REPLY • link 12.8 years ago by Angel ▴ 220

0

Entering edit mode

OK, good news - so now I am able to see the mutation when I tried "-D 1000" (max depth coverage option in vcftools).

But I don't understand what this means biologically and I am exploring. Any suggestions will be appreciated. I saw the mutation at "coverage" = 513 as outputted by snpeff software.

This is the reason I didn't see it as I was using -D 200 and -D 500.

I am also now getting the results from aligning to entire hg19 and using the same parameters. Please comment on why is there a need to use such a high "D" option and what is the biological significance of it.

ADD REPLY • link 12.8 years ago by Angel ▴ 220

0

Entering edit mode

Hi Daniel Swan, I was trying to upload an IGV snapshot but was confused how to do it on biostar. Sorry could you suggest how to upload attachments?

ADD REPLY • link 12.8 years ago by Angel ▴ 220