Question

SNP/Mbp rate weird, any known bias/artifact?

0

Entering edit mode

6.6 years ago

Matteo Schiavinato ★ 3.6k

Hello guys, I am working with rates of variation (expressed in SNP/Mbp) and I'm observing something which biologically doesn't have much sense.

I have two variant call sets: one contains calls on the whole genome, the other contains an subset of it, made on the coding regions only (CDS lines in GFF format, so to say).

I calculate the density by taking the number of SNPs identified and dividing it by the positions covered by the reads used to call such SNPs: e.g. if I have 100 SNPs identified within coding regions, and I have coverage on 1 Mbp of the coding region positions, I will have 100 SNPs / 1 Mbp.

Biological expectation: higher density of variants when looking at the whole genome, than when looking at the CDS only. Results: the opposite (even though not a big difference).

I thought that this might be due to the fact that I work with a non-model organism for which the genome is not of high quality. So I repeated the pipeline with A. thaliana and there it showed a higher density of variants on the whole genome and a lower one in the CDS (as expected).

Do you know what could have generated this weird result? Is the genome assembly quality your best guess as culprit? I am running out of options to test ^^

SNP Mbp Variant Calling Bias • 1.3k views

ADD COMMENT • link 6.6 years ago by Matteo Schiavinato ★ 3.6k

0

Entering edit mode

Actually, my best guess would be a problem in the annotation (GFF). If you sample CDS and non-CDS and see no big difference, maybe the annotation between the two is not right.

ADD REPLY • link 6.6 years ago by Fabio Marroni ★ 3.0k

0

Entering edit mode

Actually, it is not exactly as you said:

one contains calls on the whole genome, the other contains an subset of it

I don't have a non-CDS group and a CDS group, I have a whole-genome (including CDS) and a CDS only.

Also, I am quite confident on the annotation because it was done by me, so I know what happenened at each stage and I'm quite sure this is not the case. Or so I hope!

ADD REPLY • link 6.6 years ago by Matteo Schiavinato ★ 3.6k