Input format for SNP data in Arlequin

8.6 years ago

akuepper • 0

Hi, I have genotyping-by-sequencing (GBS) SNP data for 95 individuals from 8 populations, with 4566 SNPs per individual. I would like to run an AMOVA in Arlequin, but I am having trouble reading the data into the program, and I am running out of ideas about where my input file could be wrong. Below are the first lines of my input file (an *.arp file). If anyone has ideas about what I could try, help is very much appreciated! Thank you!

[Profile]
    Title="8 populations of Palmer amaranth"
    NbSamples=8
    GenotypicData=1 # - {0, 1}
    GameticPhase=0 # - {0, 1}
    RecessiveData=0 # - {0, 1}
    DataType=DNA # - {DNA, RFLP, MICROSAT, STANDARD, FREQUENCY}
    LocusSeparator=TAB # - {TAB, WHITESPACE, NONE}
    MissingData='N' # A single character specifying missing data
[Data]
    [[Samples]]
        SampleName="Arizona resistant"
        SampleSize= 12
        SampleData= {
AZR10 G C C T A A G G T T G A A C A T A G G R Y G T T T A T T C W A Y N C C C G T A Y T C T G T C A N G W A C A A G C N C T C G G C R A A T G N G G A A T T T A G C G Y C C G R T T A T C C C T C A T Y T C T T A G C A G C T C G A G A M A C A R C G K A W C C C C T G T G C A Y C A A C A R T G

1) Did you get an error message? Can you post it?

2) Did you try using a very small set of SNPs and individuals? This could help you find the problem.

3) I am not sure whether Arlequin accepts IUPAC codes. Did you check this?

Fabio Marroni · 8.6 years ago

Hi

Have you solved your problem? I have also been using Arlequin recently and have run into the same problem, but I don't know how to deal with it. Would you like to discuss it?

Thanks

Ling

strive · 3.0 years ago

Answer 1 · 2016-05-11

Thank you very much for your reply.

1) Yes:
#[ERROR # 1] : unable to read genotype frequency
#[ERROR # 2] : unable to read sample data

2) I will try with a smaller SNP set, but I wonder whether the size is really the problem. I would hate to lose information by downsizing.

3) The Arlequin manual says: "The following notation for ambiguous nucleotides are also recognized: R: A/G (purine) Y: C/T (pyrimidine) M: A/C W: A/T S: C/G K: G/T B: C/G/T D: A/G/T H: A/C/T V: A/C/G N: A/C/G/T". That is why I thought I could use my current data format. However, I have not found any example of an input file with similar data, and I am afraid that any other format might make me lose information.
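One way to keep the heterozygote information without relying on IUPAC codes is to split each two-base code into the two alleles it stands for and write them on the two lines of a diploid individual. The Python sketch below does this for one individual. It assumes that every two-base code (R, Y, M, W, S, K) marks a heterozygote at a biallelic SNP and that 'N' marks missing data, as set by MissingData in the profile; `expand_iupac` is a hypothetical helper, not part of Arlequin. Three- and four-base codes (B, D, H, V) cannot describe a diploid genotype, so the sketch rejects them.

```python
# Minimal sketch (hypothetical helper): split IUPAC-coded SNP calls for one
# diploid individual into two allele lines.
# Assumptions: two-base codes mark heterozygotes; "N" marks missing data.

# Two-base ambiguity codes, as listed in the Arlequin manual quote above.
IUPAC_HET = {
    "R": ("A", "G"), "Y": ("C", "T"), "M": ("A", "C"),
    "W": ("A", "T"), "S": ("C", "G"), "K": ("G", "T"),
}
BASES = {"A", "C", "G", "T"}


def expand_iupac(calls, missing="N"):
    """Return (first_line_alleles, second_line_alleles) for one individual."""
    first, second = [], []
    for call in calls:
        call = call.upper()
        if call in BASES:
            pair = (call, call)          # homozygote: same base on both lines
        elif call in IUPAC_HET:
            pair = IUPAC_HET[call]       # heterozygote: one base per line
        elif call == missing:
            pair = (missing, missing)    # missing on both lines
        else:
            # B, D, H, V (or anything else) cannot be a diploid genotype
            raise ValueError(f"cannot split {call!r} into two alleles")
        first.append(pair[0])
        second.append(pair[1])
    return first, second


# The first few calls of individual AZR10 from the question.
calls = "G C C T A A G G T T G A A C A T A G G R Y G".split()
top, bottom = expand_iupac(calls)
print(" ".join(top))
print(" ".join(bottom))
```

Since the profile sets GameticPhase=0 (phase unknown), which base of a heterozygote goes on which line should not matter, so the split should not discard anything the IUPAC call encoded.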

Answer 2 · 2016-05-11

8.5 years ago

akuepper • 0

The smaller data set does not work either. I am wondering about one thing: some of the groups I am comparing contain different numbers of individuals. I don't think this should be a problem for the statistical analysis, but maybe it is in Arlequin?

Answer 3 · 2016-05-13

No, different sample sizes are not an issue.

I think the error is that you didn't write the frequency of the genotype. Imagine you have 3 SNPs. For each diploid sample you have to enter two lines, each reporting one of the alleles at each of your 3 SNPs, and before the first series of SNPs you have to write how many individuals have this genotype (in the example, 1).
Like this:

sample_a 1 A T C 
           C C T

If you work with IUPAC codes (which I never did), you only have one line per sample, but you still have to put the frequency, so in your case:

AZR10 1 G C C T A A
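The layout described above (label, genotype frequency, then alleles, with an optional second allele line for diploids) can be generated rather than typed by hand for 95 individuals. The Python sketch below builds one individual's entry. `format_entry` is a hypothetical helper; it separates loci with single spaces, which would pair with LocusSeparator=WHITESPACE rather than TAB in the profile, and it defaults the frequency to 1 as in the examples.

```python
# Minimal sketch (hypothetical helper): format one individual's entry for an
# Arlequin SampleData block. Loci are separated by single spaces, which
# assumes LocusSeparator=WHITESPACE in the [Profile] section.


def format_entry(label, first, second=None, freq=1):
    """Return the line(s) for one individual: label, frequency, then alleles."""
    prefix = f"{label} {freq} "
    lines = [prefix + " ".join(first)]
    if second is not None:
        if len(second) != len(first):
            raise ValueError("both allele lines need the same number of loci")
        # The second line carries alleles only; pad it so the columns line up.
        lines.append(" " * len(prefix) + " ".join(second))
    return "\n".join(lines)


# Two-line diploid entry and one-line entry, as in the examples above.
print(format_entry("sample_a", ["A", "T", "C"], ["C", "C", "T"]))
print(format_entry("AZR10", list("GCCTAA")))
```

The two calls reproduce the `sample_a` and `AZR10` examples; the second line of a diploid entry carries no label or frequency, only the other allele at each locus.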

Hope this fixes the problem!