Discosnp
2.7 years ago
Biostar • 0

Hello,

I used DiscoSnp (specifically discoSnpRAD) with a reference genome and I have two questions:

  1. Out of 700k SNPs, only 30% are in genomic regions (there is a scaffold number in the .vcf file). The remaining 70% of SNPs have SNP_higher_path or SNP_lower_path in the chromosome column instead. Does this mean that these SNPs did not map to the genome but were called de novo? Can I still retain them?

  2. Do SNP_higher_path and SNP_lower_path represent two alleles of the same variant? If so, I need to remove one of them, correct?

Thanks.

genome reference SNP_lower_path SNP_higher_path Discosnp • 1.0k views
2.7 years ago

Hello

Does this mean that the variants of these SNPs did not map to the genome but were called de novo?

Not exactly. All variants are called de novo. Once detected, their sequences may be mapped to a reference genome to determine their locus.

In your case, 30% of the called variants mapped to the genome, which is why they are associated with a locus in the VCF (a scaffold and a position in your case).

The remaining 70% could not be mapped to the reference.
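If it helps, here is a minimal sketch of how the two groups could be separated. It assumes, based only on the description in the question, that unmapped records carry a CHROM value starting with SNP_higher_path or SNP_lower_path; the function name and the example records are made up for illustration, so please check them against your own file.

```python
# Prefixes assumed from the question above; adjust them to match your VCF.
UNMAPPED_PREFIXES = ("SNP_higher_path", "SNP_lower_path")


def split_by_mapping(vcf_lines):
    """Return (mapped, unmapped) lists of VCF record lines."""
    mapped, unmapped = [], []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated column
        if chrom.startswith(UNMAPPED_PREFIXES):
            unmapped.append(line)
        else:
            mapped.append(line)
    return mapped, unmapped


# Hypothetical records, only to show the call.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "scaffold_12\t1042\t7\tA\tG",
    "SNP_higher_path_3\t30\t3\tC\tT",
]
mapped, unmapped = split_by_mapping(example)
print(len(mapped), "mapped;", len(unmapped), "unmapped")
```

For a real file, the same function can be fed the lines of an opened VCF (for example `split_by_mapping(open(path))`, where `path` is your file).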

Can I still retain them?

This depends on what you're looking for. You may also filter variants by their minor allele frequency (MAF) or by other criteria (see the COOKBOOK for some examples).
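As an illustration of the MAF idea only (this is not the COOKBOOK's own recipe), the sketch below computes a minor allele frequency from the GT entries of one biallelic VCF record and applies a threshold. It assumes standard VCF genotype encoding (0 for REF, 1 for ALT, "." for missing); the function names, the example record and the 0.05 default are hypothetical.

```python
def minor_allele_frequency(record_line):
    """Compute the MAF of one biallelic VCF record from its GT entries.

    Returns None when no allele is called in any sample.
    """
    fields = record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")      # FORMAT column, e.g. "GT:DP"
    gt_index = format_keys.index("GT")
    ref_count = alt_count = 0
    for sample in fields[9:]:               # one column per sample
        parts = sample.split(":")
        if gt_index >= len(parts):
            continue
        for allele in parts[gt_index].replace("|", "/").split("/"):
            if allele == "0":
                ref_count += 1
            elif allele == "1":
                alt_count += 1
            # "." (missing) and any other value are ignored
    total = ref_count + alt_count
    if total == 0:
        return None
    return min(ref_count, alt_count) / total


def passes_maf(record_line, min_maf=0.05):
    """True when the record has called alleles and MAF >= min_maf (hypothetical default)."""
    maf = minor_allele_frequency(record_line)
    return maf is not None and maf >= min_maf


# Hypothetical record with four samples, one of them missing.
record = "scaffold_12\t1042\t7\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:12\t0/0:9\t0/1:7\t./.:0"
print(minor_allele_frequency(record), passes_maf(record))
```

Whether 0.05 is a sensible cutoff depends on your number of samples and your downstream analysis.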

Do SNP_higher_path and SNP_lower_path represent two alleles of the same variant?

Yes

If so, I need to remove one of them, correct?

You may indeed keep only one of them, depending on your downstream usage.
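A rough sketch of keeping a single record per variant is given below. It rests on an assumption that should be verified on your file: that the two paths of a variant are named SNP_higher_path_<id> and SNP_lower_path_<id> with the same <id>. The function name and the example records are hypothetical.

```python
import re

# Assumed naming: "SNP_higher_path_<id>" / "SNP_lower_path_<id>", same <id> for both paths.
PATH_NAME = re.compile(r"^SNP_(?:higher|lower)_path_(\w+)")


def keep_one_path_per_variant(record_lines):
    """Keep the first record seen for each unmapped variant id; pass other records through."""
    seen = set()
    kept = []
    for line in record_lines:
        chrom = line.split("\t", 1)[0]
        match = PATH_NAME.match(chrom)
        if match is None:
            kept.append(line)  # not a higher/lower path record: nothing to deduplicate
            continue
        variant_id = match.group(1)
        if variant_id not in seen:
            seen.add(variant_id)
            kept.append(line)
    return kept


# Hypothetical records, only to show the call.
records = [
    "SNP_higher_path_3\t30\t3\tC\tT",
    "SNP_lower_path_3\t30\t3\tC\tT",
    "scaffold_12\t1042\t7\tA\tG",
    "SNP_lower_path_8\t14\t8\tG\tA",
]
kept = keep_one_path_per_variant(records)
print(len(kept), "records kept out of", len(records))
```

The sketch simply keeps whichever path appears first in the file; any other consistent rule would do equally well for this purpose.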

I hope this helps.

Best, Pierre

2.7 years ago
Biostar • 0

Hi Pierre,

Thanks for the clarifications. I've been trying out some of the filtering options that you mentioned. Does it matter which of the two alleles I retain, the higher or the lower path? Is there any parameter I should consider when choosing one or the other?

Also, I see that mapped SNPs (those with a scaffold number) don't have higher and lower path flags. Does this mean one allele was already removed when the results were output?

Thanks.

2.7 years ago

Hi

The "higher" and "lower" labels are meaningless. The assignment is arbitrary, although deterministic.

Once a variant is mapped, one knows which of the two nucleotides matches the reference, so that one becomes REF and the other becomes ALT. There is one particular case, however: when the predicted variant is, say, A/T and the reference genome contains, say, G at the mapped position, the VCF still contains REF and ALT alleles (although the assignment is then meaningless), but the 'Genome' field contains the nucleotide 'G'.

Best, Pierre
