SNP/Mbp rate weird, any known bias/artifact?
6.6 years ago

Hello guys, I am working with rates of variation (expressed in SNP/Mbp) and I'm observing something that doesn't make much sense biologically.

I have two variant call sets: one contains calls on the whole genome, the other contains a subset of it, restricted to the coding regions only (the CDS lines of the GFF annotation).

I calculate the density by taking the number of SNPs identified and dividing it by the number of positions covered by the reads used to call those SNPs: e.g. if I have 100 SNPs identified within coding regions, and I have coverage on 1 Mbp of the coding region positions, I will have 100 SNPs / 1 Mbp.
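To make the calculation explicit, here is a minimal Python sketch of the density as described above. The function name `snp_density_per_mbp` and its arguments are only illustrative, not part of any real pipeline; the numbers are the ones from the example.

```python
def snp_density_per_mbp(n_snps: int, covered_bp: int) -> float:
    """Return SNPs per Mbp, given a SNP count and the number of covered positions."""
    if covered_bp <= 0:
        raise ValueError("covered_bp must be positive")
    # Convert covered positions to Mbp first, then divide the SNP count by it.
    return n_snps / (covered_bp / 1_000_000)


# Example from the post: 100 SNPs over 1 Mbp of covered coding positions.
cds_density = snp_density_per_mbp(100, 1_000_000)
print(cds_density)
```

The denominator here is the number of covered positions within the region of interest, not the total region length, which matches the description above.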

Biological expectation: a higher density of variants across the whole genome than in the CDS only. Result: the opposite (although the difference is not big).

I thought this might be because I work with a non-model organism whose genome assembly is not of high quality. So I repeated the pipeline with A. thaliana, and there it showed a higher density of variants on the whole genome and a lower one in the CDS (as expected).

Do you know what could have generated this weird result? Is genome assembly quality your best guess for the culprit? I am running out of options to test ^^


Actually, my best guess would be a problem in the annotation (GFF). If you sample CDS and non-CDS and see no big difference, maybe the annotation separating the two is not right.


Actually, it is not exactly as you said:

one contains calls on the whole genome, the other contains a subset of it

I don't have a non-CDS group and a CDS group, I have a whole-genome (including CDS) and a CDS only.
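That said, since the CDS set is a subset of the whole-genome set, a non-CDS density could be derived by subtraction, which would give a direct CDS vs non-CDS comparison. A rough sketch, assuming SNP counts and covered positions are available for both sets; the function name and all numbers below are made up for illustration and are not my actual results:

```python
def non_cds_density_per_mbp(total_snps: int, total_covered_bp: int,
                            cds_snps: int, cds_covered_bp: int) -> float:
    """SNPs per Mbp outside the CDS, obtained by subtracting the CDS subset."""
    non_cds_snps = total_snps - cds_snps
    non_cds_bp = total_covered_bp - cds_covered_bp
    if non_cds_snps < 0 or non_cds_bp <= 0:
        raise ValueError("CDS set must be a strict subset of the whole-genome set")
    return non_cds_snps / (non_cds_bp / 1_000_000)


# Hypothetical counts (NOT real data): whole genome vs CDS-only call sets.
whole = 5000 / (100_000_000 / 1_000_000)    # whole-genome density, SNPs/Mbp
cds = 1200 / (20_000_000 / 1_000_000)       # CDS-only density, SNPs/Mbp
non_cds = non_cds_density_per_mbp(5000, 100_000_000, 1200, 20_000_000)
print(whole, cds, non_cds)
```

Because the whole-genome density is a coverage-weighted average of the CDS and non-CDS densities, the non-CDS value always lies on the opposite side of the whole-genome value from the CDS value, so the contrast looks sharper once the CDS is taken out.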

Also, I am quite confident in the annotation because I did it myself, so I know what happened at each stage and I'm quite sure this is not the case. Or so I hope!
