Are these false somatic variants? Visual inspection with IGV
7.6 years ago
andreslanzos ▴ 60

Hi all,

I have three somatic variants (mutations) detected in a single tumour sample. They come from whole-genome sequencing and were called by comparing the tumour sample against its matched normal sample.

The mutations are supported by the same reads. It's a bit suspicious to me because this could indicate that the reads are from another part of the genome.

I have searched a lot for online tutorials on manual inspection of mutations with IGV, but I only found a couple of useful ones. Even with those, I could not work out whether these three mutations are real or not:

Dropbox link to IGV screenshot of variants.

TinyPic link to IGV screenshot of variants.

a) The mapping quality of the reads is good: MAPQ between 29 and 60.

b) The variants are supported by reads from both strands.

c) There are no germline mutations annotated in dbSNP, 1000 genomes, etc.

d) BLAT scores for the supporting reads in other regions of the genome are lower than 300, while the scores for this region are higher than 900.

But as I said before, I think that the fact that they are on the same reads is not a good signal.
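One way to make the "same reads" observation concrete is to tabulate, for every pair of variant sites, how the reads spanning both sites split. The sketch below is only an illustration: it assumes the per-read base calls have already been extracted from the tumour BAM into a plain dictionary, and the positions, bases, and read names are hypothetical.

```python
from itertools import combinations


def cooccurrence(read_alleles, variants):
    """Tabulate how reads spanning two variant sites split between alleles.

    read_alleles: dict read_name -> {position: observed base}
                  (hypothetical input, extracted from the BAM beforehand)
    variants:     dict position -> alternate base
    """
    table = {}
    for first, second in combinations(sorted(variants), 2):
        counts = {"both_alt": 0, "only_first": 0, "only_second": 0, "neither": 0}
        for alleles in read_alleles.values():
            if first not in alleles or second not in alleles:
                continue  # this read does not cover both sites
            has_first = alleles[first] == variants[first]
            has_second = alleles[second] == variants[second]
            if has_first and has_second:
                counts["both_alt"] += 1
            elif has_first:
                counts["only_first"] += 1
            elif has_second:
                counts["only_second"] += 1
            else:
                counts["neither"] += 1
        table[(first, second)] = counts
    return table


# Hypothetical toy data: three variant sites and four reads.
variants = {100: "T", 105: "A", 112: "G"}
reads = {
    "r1": {100: "T", 105: "A", 112: "G"},
    "r2": {100: "C", 105: "G", 112: "C"},
    "r3": {100: "T", 105: "A"},
    "r4": {105: "G", 112: "C"},
}
table = cooccurrence(reads, variants)
for pair, counts in table.items():
    print(pair, counts)
```

If the `only_first` and `only_second` counts stay at zero for every pair, the variants are fully phased together, which matches what is described here. That pattern alone does not say whether the reads are correctly placed.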

Do you think that these could be false mutations? Or real mutations from another region of the genome?

Thanks in advance.

variant somatic mutations igv snp • 4.2k views

Likely false. To confirm that, you need to look at the mapping quality of reads with mismatches.


The quality scores of the reads supporting the mutations are good: MAPQ between 60 and 29. That is why I'm not so sure that they are false mutations.


29 is pretty low. A BLAT score of 300 is high enough to make me worry. What are the coordinates of this region? And which genome build?
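For reference, the SAM specification defines MAPQ as -10 * log10 of the probability that the mapping position is wrong, so the two ends of the quoted range can be converted to error probabilities. This is a small worked calculation rather than a statement about any particular aligner; aligners estimate MAPQ differently and many cap it, so the numbers are only indicative.

```python
def mapq_to_error_prob(mapq):
    """Convert a Phred-scaled mapping quality to P(mapping position is wrong)."""
    return 10 ** (-mapq / 10)


for mapq in (29, 60):
    print(f"MAPQ {mapq}: P(wrong position) ~ {mapq_to_error_prob(mapq):.2e}")
```

Under this definition MAPQ 29 corresponds to roughly a 1-in-800 chance of misplacement and MAPQ 60 to 1 in a million, which is why reads at the lower end deserve a closer look.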


I don't understand why you think they are false. Can you explain? I mean, if they are in the tumor but not normal I would say it is definitely real (of course it could be an imperfect replication, but I don't see that as particularly being less real). If they are in both, I'd say it might be real, and might be a pseudogene or something that is not present in the assembly (in which case they'd still be "real" based on the assumption that the assembly is correct, but not "real" in terms of, well, reality. But also in that case you'd see the same variations in tons of people.). But in either case it definitely does not look like a sequencing artifact.


In somatic mutation calling, multiple SNVs on one read are mostly false positives even if you see no evidence in the paired normal. I am not sure what is the exact cause, but most SNVs like these don't get validated experimentally.


Really? I once came across a case like the one shown in the picture: all three mutations were found only on the same reads, that is, they either occurred together or not at all. Yet the mutation frequency is 8%. I don't know whether all three of them are false.

IGVscreenshot


I would suggest you upload the picture to, for example, tinypic or another free image hosting service. You can then also add the image directly to your post. Call me paranoid, but I don't click on random dropbox links :-)


Thanks for the suggestion, I added a link to Tinypic with the picture.


Those look like real variations. If they are not in the normal sample, then yes, they look like tumor-specific somatic variations.


In theory the mutation caller should have checked this fact, but I will download the BAM file for the normal sample and check just in case. Thanks for the suggestion.


Is there any prior support for the idea that because they are on the same reads, these reads might implicate a certain cell population? Has this been used/seen before?


Very interesting, the fact that these mutations are always on the same reads can be explained by:

a) The reads with mutations are from a different region of the genome, or the reads without mutations are from a different region of the genome.

b) As you say: reads with mutations could come from a certain cell population (cancer clone) and the reads without mutations could come from a different population of cells. I'm not 100% sure, but I think in this paper (link) they use this idea to calculate the percentage of cells that have each mutation (Cancer Cell Fraction).
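As a rough illustration of the Cancer Cell Fraction idea, a commonly used simplified relation converts a variant allele frequency into a CCF given tumour purity and local copy number. This is not necessarily the method of the linked paper; the purity of 0.6, the diploid copy numbers, and the multiplicity of 1 below are assumptions chosen only for the example.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn=2, normal_cn=2, multiplicity=1):
    """Simplified CCF estimate (an assumption-laden sketch, not a full model).

    Expected VAF = purity * multiplicity * CCF
                   / (purity * tumour_cn + (1 - purity) * normal_cn)
    Solving for CCF gives the expression returned below.
    """
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


# Hypothetical purity of 0.6 in a diploid region.
print(cancer_cell_fraction(vaf=0.30, purity=0.6))  # mutation in essentially all tumour cells
print(cancer_cell_fraction(vaf=0.08, purity=0.6))  # mutation in a minority subclone
```

Under these assumptions, variants that always travel together on the same reads would be expected to give similar CCF estimates, since they would belong to the same clone.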


But as I said before, I think that the fact that they are on the same reads is not a good signal.

I disagree. You have two chromosomes: either your variants are on the same chromosome and therefore on the same reads, or they are on different chromosomes. It's not that unlikely.

7.6 years ago
d-cameron ★ 2.9k

The mutations are supported by the same reads. It's a bit suspicious to me because this could indicate that the reads are from another part of the genome.

It could also indicate that there are three adjacent mutations, or another sort of mutation. Without additional context, it is difficult to determine whether the reads represent true variants. In addition to the properties you have identified, you should consider:

prior somatic mutation: are any of the SNVs in COSMIC?

strand bias: looks fine as there are reads supporting the variant originating from both strands

allele frequency: the variants appear to have a BAF of somewhere around 20-40%. Is this consistent with the BAF of upstream and downstream SNVs?

sequence context/alignment artifacts: do the reads actually represent three nearby SNVs? Have a look at the sequence context of the mutations. Depending on the flanking sequence there may be a more parsimonious explanation. For example, in some sequence contexts, the 3 SNVs could be explained by a 1 bp insertion that the aligner prefers to align as 3 SNVs. Are the reads containing the variants soft-clipped? Is this actually an STR expansion/contraction?

normal coverage: is there sufficient coverage in the normal for the variants to be reliably called as somatic? (the SNV caller should calculate this)

kataegis: do these mutations occur in a region of kataegis?

variant quality score: what quality score does the variant caller assign to these variants?

SV: is the tumour highly structurally rearranged? Could these be explained by a single DNA repair event? More generally, any explanation that generates the mutations simultaneously increases the plausibility of the SNVs, because a single-event explanation removes your implicit assumption that somatic SNVs should not be clustered so closely.
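The strand-bias item in the checklist above can be checked numerically rather than by eye. A minimal sketch, assuming the forward/reverse read counts for the reference and alternate alleles have already been tallied (the counts below are hypothetical), is a two-sided Fisher's exact test on the 2x2 table:

```python
from math import comb


def fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test on [[ref_fwd, ref_rev], [alt_fwd, alt_rev]]."""
    row_ref = ref_fwd + ref_rev
    row_alt = alt_fwd + alt_rev
    col_fwd = ref_fwd + alt_fwd
    total = row_ref + row_alt

    def prob(x):
        # Hypergeometric probability of x forward reads among the ref reads.
        return comb(row_ref, x) * comb(row_alt, col_fwd - x) / comb(total, col_fwd)

    lowest = max(0, col_fwd - row_alt)
    highest = min(row_ref, col_fwd)
    observed = prob(ref_fwd)
    # Sum all tables that are no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Hypothetical counts: alt reads split about evenly between strands.
print(fisher_exact_two_sided(ref_fwd=20, ref_rev=22, alt_fwd=6, alt_rev=5))
# Hypothetical counts: alt reads seen on the forward strand only.
print(fisher_exact_two_sided(ref_fwd=20, ref_rev=20, alt_fwd=12, alt_rev=0))
```

A large p-value means the alternate-allele reads are distributed across strands much like the reference reads. A very small one would flag the kind of strand bias that often accompanies artefacts.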

But as I said before, I think that the fact that they are on the same reads is not a good signal.

It just means they are phased together and all occurred on the same chromatid. It does not necessarily make them less likely - especially if they can be explained by a single event. Events such as chromothripsis result in hundreds of structural rearrangements from the same chromatid.

TLDR: from the limited information available, they look plausible.


Depending on the flanking sequence there may be a more parsimonious explanation. For example, in some sequencing contexts, the 3 SNVs could be explained by 1bp insertion that the aligner prefers to align as 3 SNVs. Are the reads containing the variants soft clipped?

I think that's pretty unlikely here as they occur in lots of reads with various locations. Typically, an insertion misaligned as a series of substitutions will spawn tons of substitutions upstream or downstream of the event, which does not occur here in any read. As you say, it is context-dependent, but you'd basically have to have a homopolymer on one side or the other to prevent that.
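The point about downstream substitutions can be shown with a toy example. The sketch below compares a read against the reference position by position with no gaps allowed, which is a deliberate simplification of what a real aligner does; the sequences are made up for illustration.

```python
def ungapped_mismatches(ref, read):
    """Positions where read and reference differ when compared without gaps."""
    return [i for i, (r, q) in enumerate(zip(ref, read)) if r != q]


def insert_base(seq, pos, base):
    """Insert one base at pos and trim to the original length."""
    return (seq[:pos] + base + seq[pos:])[: len(seq)]


# Made-up reference with no repeated bases downstream of position 8.
mixed_ref = "ACGTACGGTCATGCTAGCTA"
mixed = ungapped_mismatches(mixed_ref, insert_base(mixed_ref, 8, "A"))

# Made-up reference with a homopolymer downstream of position 8.
homopolymer_ref = "ACGTACGG" + "A" * 12
homopolymer = ungapped_mismatches(homopolymer_ref, insert_base(homopolymer_ref, 8, "T"))

print(len(mixed), mixed)              # every downstream base is shifted and mismatches
print(len(homopolymer), homopolymer)  # the shift is invisible inside the homopolymer
```

In the mixed context the single inserted base produces a mismatch at every downstream position, whereas next to the homopolymer it looks like a single substitution. Only the second situation could plausibly be mistaken for an isolated SNV.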
