Basics about sequencing/alignment and variant calling
6.8 years ago

I am new to bioinformatics, and I am not a biologist. I have some basic questions about sequencing/alignment and variant calling. For example, in humans, I understand that DNA mutates all the time, so when sequencing one specific locus we expect in theory only two variants (diploid), but there can be more because of sequencing errors, indels, dels etc. (these are discarded during the variant calling process).

Can human DNA contain more than 2 alleles at a locus because some cells carry a mutation/variation and others do not? In a haploid case, can we expect more than one allele?

When we obtain the 'consensus' from an alignment of all the reads from a single individual, does it take the most common variant at each position? So, for example, in a diploid organism, with one single consensus we lose a lot of information (all the alleles, in case they are heterozygous).

Thanks!

sequencing alignment variant calling • 1.8k views

Can human DNA contain more than 2 alleles at a locus because some cells carry a mutation/variation and others do not? In a haploid case, can we expect more than one allele?

Yes, but in most cases this comes from mosaicism: a sub-population of cells carries the variant, so the extra allele is supported by only a small share of the reads (low coverage).
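To make the "small share of the reads" idea concrete, here is a minimal sketch that computes the fraction of reads supporting each allele at one position. The read counts are invented for illustration, and `allele_fractions` is a hypothetical helper, not part of any variant caller.

```python
def allele_fractions(read_counts):
    """Return the fraction of reads supporting each allele at one position."""
    total = sum(read_counts.values())
    return {allele: n / total for allele, n in read_counts.items()}

# Made-up counts: two alleles near 50% each (a heterozygous germline site)
# plus a third allele seen in only a few reads.
counts = {"G": 46, "C": 44, "T": 10}
fractions = allele_fractions(counts)

for allele, fraction in fractions.items():
    print(f"{allele}: {fraction:.2f}")
```

A low-fraction third allele like `T` here could be a mosaic variant or simply sequencing error. Telling the two apart generally needs deeper coverage and a caller designed for low-frequency variants.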

When we obtain the 'consensus' from an alignment of all the reads from a single individual, does it take the most common variant at each position?

In most cases the information is contained in a SNP database for the organism you are studying.

6.8 years ago
d-cameron ★ 2.9k

When we obtain the 'consensus' from an alignment of all the reads from a single individual, does it take the most common variant at each position? So, for example, in a diploid organism, with one single consensus we lose a lot of information (all the alleles, in case they are heterozygous).

Variant callers do not attempt to produce a single consensus sequence; they attempt to report all alleles that are present. In the case of germline diploid samples, a variant caller will report up to two non-reference alleles at a given position (e.g. the reference genome has G at that position and the sample is heterozygous C/G, which gives one non-reference allele; a C/T sample would give two).

Reducing to a single consensus of a diploid organism is generally only ever done when creating a reference genome for that organism.
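The difference between a consensus and a genotype call can be sketched on a single pileup column. This is only a toy illustration: the function names and the `min_fraction` threshold are assumptions made for the example, and real variant callers use probabilistic models that account for base and mapping quality rather than a fixed cutoff.

```python
from collections import Counter

def consensus_base(bases):
    """Return the single most common base observed at one position."""
    return Counter(bases).most_common(1)[0][0]

def naive_diploid_genotype(bases, min_fraction=0.2):
    """Return up to two alleles supported by at least min_fraction of reads.

    min_fraction is a hypothetical cutoff chosen for illustration only.
    """
    counts = Counter(bases)
    total = sum(counts.values())
    kept = [b for b, n in counts.most_common() if n / total >= min_fraction][:2]
    if len(kept) == 1:
        kept = kept * 2  # only one allele passes: treat as homozygous
    return tuple(sorted(kept))

# Made-up pileup column: 6 reads with G, 5 with C, 1 with A (likely an error).
pileup = list("GGGGGGCCCCCA")

print(consensus_base(pileup))          # single base, the C allele is lost
print(naive_diploid_genotype(pileup))  # both alleles of the heterozygous site
```

With this column the consensus keeps only `G`, while the genotype retains both `C` and `G`, which is the information loss the question describes.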

indels, dels etc. (these are discarded during the variant calling process).

Insertions and deletions are valid alleles and (good) variant callers do not discard them.


Thanks for the clarification, d-cameron. So when you said:

is generally only ever done when creating a reference genome for that organism.

you mean when creating a reference assembly like:

Reference Assembly - Mapping Reads To A Reference Genome

So in that case, it just takes the most common variant at each position to create a single consensus, and to do that they first perform variant calling.
