Question

DiscoSNP-RAD to Structure

7.4 years ago

Gio12 ▴ 220

Hello all,

I would appreciate some advice. I have generated SNP sequences using DiscoSNP-RAD and worked out a pipeline to import the data into STRUCTURE to look at population structure. I am trying to run an analysis similar to the one Gauthier et al. describe in their paper "DiscoSnp-RAD: de novo detection of small variants for population genomics". The FASTA file contains a higher-path and a lower-path sequence for each SNP, as shown below:

SNP_higher_path_3|P_1:30_C/G|high|nb_pol_1|left_unitig_length_86|right_unitig_length_261|left_contig_length_168|right_contig_length_764|C1_124|C2_0|G1_0/0|G2_1/1|rank_1.00000

SNP_lower_path_3|P_1:30_C/G|high|nb_pol_1|left_unitig_length_86|right_unitig_length_261|left_contig_length_168|right_contig_length_764|C1_0|C2_134|G1_0/0|G2_1/1|rank_1.00000

My question is, should I use only one of the paths or both of the paths for STRUCTURE analysis? If one, which path would be best? I appreciate any feedback!

Thank you in advance

SNP DiscoSNP RAD-Seq Structure • 1.6k views

updated 7.4 years ago by Jeremy Gauthier ▴ 20 • written 7.4 years ago by Gio12 ▴ 220

score 2 · Answer 1 · 2018-03-16

Hi,

The FASTA file is an intermediate DiscoSNP-RAD output. The two paths are the two alleles of the bubble from which the SNP is identified.
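To see how the two paths mirror each other, the headers from the question can be split into fields. This is only a minimal sketch: `parse_disco_header` is a hypothetical helper, not part of DiscoSNP-RAD, and reading `C1`/`C2` as per-sample read coverage of the path and `G1`/`G2` as per-sample genotypes is an interpretation that matches the example values (124 reads on the higher path and `0/0` for sample 1, 134 reads on the lower path and `1/1` for sample 2).

```python
def parse_disco_header(header):
    """Split a DiscoSNP-style FASTA header into a dict of its fields.

    Hypothetical helper for illustration; field meanings are assumptions.
    """
    fields = [f.strip() for f in header.lstrip(">").split("|")]
    info = {"path": fields[0]}
    for field in fields[1:]:
        # Split on the last underscore: "C1_124" -> key "C1", value "124".
        key, _, value = field.rpartition("_")
        if key:
            info[key] = value
        else:
            info[field] = True  # field without a value, such as "high"
    return info


# The two headers quoted in the question.
higher = parse_disco_header(
    "SNP_higher_path_3|P_1:30_C/G|high|nb_pol_1|left_unitig_length_86|"
    "right_unitig_length_261|left_contig_length_168|right_contig_length_764|"
    "C1_124|C2_0|G1_0/0|G2_1/1|rank_1.00000"
)
lower = parse_disco_header(
    "SNP_lower_path_3|P_1:30_C/G|high|nb_pol_1|left_unitig_length_86|"
    "right_unitig_length_261|left_contig_length_168|right_contig_length_764|"
    "C1_0|C2_134|G1_0/0|G2_1/1|rank_1.00000"
)

# Both paths carry the same genotype calls; only the coverage values differ.
same_genotypes = all(higher[k] == lower[k] for k in ("G1", "G2"))
```

Since both records carry identical genotype fields, picking one path over the other would not add information; the pair together describes a single SNP.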

To obtain a usable output, you need to run the provided script, discoRAD_finalization.sh, as follows:

sh discoRAD_finalization.sh -f the_fasta_you_described.fa -r path_to_rconnector

This script keeps the reliable SNPs, clusters those that belong to the same locus, and finally produces a VCF file. VCF is a standard population genetics format that can easily be converted to STRUCTURE format.
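As a rough illustration of the VCF to STRUCTURE conversion, here is a minimal sketch that assumes diploid `GT` calls and writes two rows per individual (one per allele copy), with `-9` for missing data. The function name `vcf_to_structure` and the toy VCF lines are made up for the example; they are not DiscoSNP-RAD output, and a dedicated converter such as PGDSpider is probably the safer route for real data.

```python
import re

MISSING = "-9"  # commonly used missing-data code in STRUCTURE input files


def vcf_to_structure(vcf_lines):
    """Return STRUCTURE-style rows: [sample, allele at locus 1, locus 2, ...].

    Hypothetical helper; assumes diploid genotypes in the GT field.
    """
    samples, calls = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        gt_index = cols[8].split(":").index("GT")
        locus = []
        for sample_field in cols[9:]:
            parts = sample_field.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = re.split(r"[/|]", gt)
            if len(alleles) != 2 or "." in alleles:
                alleles = [MISSING, MISSING]
            locus.append(alleles)
        calls.append(locus)
    rows = []
    for s, name in enumerate(samples):
        for copy in (0, 1):  # one row per allele copy
            rows.append([name] + [locus[s][copy] for locus in calls])
    return rows


# Toy input, invented for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tG1\tG2",
    "c1\t30\t3\tC\tG\t.\tPASS\t.\tGT:DP\t0/0:124\t1/1:134",
    "c2\t12\t7\tA\tT\t.\tPASS\t.\tGT:DP\t0/1:20\t./.:0",
]
structure_rows = vcf_to_structure(toy_vcf)
```

Each row can then be written out tab- or space-separated; the number of loci and individuals goes into STRUCTURE's own parameter files.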

Moreover, STRUCTURE requires unlinked markers, so I recommend randomly keeping only one SNP per cluster from the VCF file.

Best regards,
Jeremy Gauthier
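The advice to keep one random SNP per cluster could be sketched as below. This is a hedged example, not the author's script: `thin_one_per_cluster` is a hypothetical helper, and it assumes the cluster identifier can be read from the first (CHROM) column of each record. If the VCF stores the cluster somewhere else, pass a different `cluster_of` function.

```python
import random
from collections import defaultdict


def thin_one_per_cluster(vcf_lines, cluster_of=lambda cols: cols[0], seed=0):
    """Keep the header plus one randomly chosen record per cluster.

    cluster_of maps the tab-split columns of a record to its cluster ID;
    the default (CHROM column) is an assumption about the VCF layout.
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    header, by_cluster = [], defaultdict(list)
    for index, line in enumerate(vcf_lines):
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            cols = line.rstrip("\n").split("\t")
            by_cluster[cluster_of(cols)].append((index, line))
    # Pick one record per cluster, then restore the original file order.
    kept = sorted(rng.choice(records) for records in by_cluster.values())
    return header + [line for _, line in kept]


# Toy records, invented for illustration: three clusters, six SNPs.
toy_records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "cluster_1\t30\t3\tC\tG",
    "cluster_1\t52\t4\tA\tT",
    "cluster_2\t11\t9\tG\tA",
    "cluster_3\t7\t12\tT\tC",
    "cluster_3\t40\t13\tC\tA",
    "cluster_3\t88\t14\tG\tT",
]
thinned = thin_one_per_cluster(toy_records, seed=42)
```

With the toy input, `thinned` holds the header line and exactly three records, one from each cluster. Changing `seed` changes which SNP is drawn within a cluster.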