Input format for SNP data in Arlequin
8.6 years ago
akuepper • 0

Hi, I have genotyping-by-sequencing (GBS) SNP data for 95 individuals from 8 populations, with 4566 SNPs per individual. I would like to run an AMOVA in Arlequin, but I am having trouble reading the data into the program, and I am running out of ideas about where my input file could be wrong. Below are the first lines of my input file (an *.arp file). If anyone has ideas as to what I could try, help is very much appreciated! Thank you!

[Profile]
Title="8 populations of Palmer amaranth"
NbSamples=8
GenotypicData=1 # - {0, 1}
GameticPhase=0 # - {0, 1}
RecessiveData=0 # - {0, 1}
DataType=DNA # - {DNA, RFLP, MICROSAT, STANDARD, FREQUENCY}
LocusSeparator=TAB # - {TAB, WHITESPACE, NONE}
MissingData='N' # A single character specifying missing data
[Data]
[[Samples]]
SampleName="Arizona resistant"
SampleSize= 12
SampleData= {
AZR10 G C C T A A G G T T G A A C A T A G G R Y G T T T A T T C W A Y N C C C G T A Y T C T G T C A N G W A C A A G C N C T C G G C R A A T G N G G A A T T T A G C G Y C C G R T T A T C C C T C A T Y T C T T A G C A G C T C G A G A M A C A R C G K A W C C C C T G T G C A Y C A A C A R T G

1) Did you get an error message? Can you post it?

2) Did you try using a very small set of SNPs and individuals? This could help you find the problem.

3) I am not sure whether Arlequin accepts IUPAC codes. Did you check this?

Hi

Have you solved your problem? I have also been using Arlequin recently and have run into the same problem, but I don't know how to deal with it. Would you like to talk about it?

Thanks

Ling

8.5 years ago
akuepper • 0

Thank you very much for your reply.

1) The error messages are:
#[ERROR # 1] : unable to read genotype frequency
#[ERROR # 2] : unable to read sample data

2) I will try a smaller SNP set, although I wonder whether size is really the problem. I would hate to throw away information.

3) The Arlequin manual says: "The following notation for ambiguous nucleotides are also recognized:
R: A/G (purine)
Y: C/T (pyrimidine)
M: A/C
W: A/T
S: C/G
K: G/T
B: C/G/T
D: A/G/T
H: A/C/T
V: A/C/G
N: A/C/G/T"
That is why I thought I could use my current data format. However, I have not found any example input file with similar data, and I am afraid that any other format might make me lose information.
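For readers who would rather not rely on the one-letter codes, here is a minimal Python sketch of how the codes quoted above could be split into two explicit alleles per SNP. It assumes that each code in the GBS output stands for a diploid genotype call (for example, R for an A/G heterozygote), which this thread does not confirm, and that codes covering three or four bases are best treated as missing data. The names `IUPAC` and `split_genotype` are made up for illustration and are not part of Arlequin.

```python
# IUPAC codes as listed in the Arlequin manual excerpt quoted above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "W": "AT", "S": "CG", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def split_genotype(code, missing="N"):
    """Return the two alleles of a diploid genotype written as one IUPAC code.

    Assumption: a code covering three or four bases cannot describe a
    diploid genotype, so it is returned as missing data for both alleles.
    """
    bases = IUPAC[code.upper()]
    if len(bases) == 1:   # homozygous call, e.g. "G" -> G/G
        return bases, bases
    if len(bases) == 2:   # heterozygous call, e.g. "R" -> A/G
        return bases[0], bases[1]
    return missing, missing


calls = "G C R Y N".split()
pairs = [split_genotype(c) for c in calls]
print(pairs)
```

Splitting the calls this way loses no information under the stated assumption, since every two-base code maps back to exactly one unordered pair of alleles.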

8.5 years ago
akuepper • 0

The smaller data set does not work either. I am wondering: Some of the groups that I am comparing contain different numbers of individuals. I don't think this should be a problem in statistical analysis but maybe it is in Arlequin?

8.5 years ago
Fabio Marroni ★ 3.0k

No, different sample sizes are not an issue.

I think the error is that you didn't write the frequency of the genotype. Imagine you have 3 SNPs. For each diploid sample you have to enter two lines, each reporting one of the alleles at each of your 3 SNPs, and before the first series of SNPs you have to write how many individuals have this genotype (1 in the example).
Like this:

sample_a 1 A T C 
           C C T

If you work with IUPAC codes (which I never did), you only have one line per sample, but you still have to put the frequency, so in your case:

AZR10 1 G C C T A A
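As a rough sketch, records in the two layouts shown above could be generated with a few lines of Python. The function names `format_diploid` and `format_single` are hypothetical, and the `sep` parameter is an assumption: it should presumably match the `LocusSeparator` declared in the [Profile] section (the examples here use spaces).

```python
def format_diploid(name, first, second, freq=1, sep=" "):
    """Format one diploid individual as two lines.

    The first line carries the sample id, the genotype frequency and one
    allele per locus; the second line carries the other allele per locus,
    indented to line up under the first.
    """
    if len(first) != len(second):
        raise ValueError("both allele lines must cover the same loci")
    head = f"{name} {freq} "
    line1 = head + sep.join(first)
    line2 = " " * len(head) + sep.join(second)
    return line1 + "\n" + line2


def format_single(name, codes, freq=1, sep=" "):
    """Format one individual on a single line: id, frequency, then codes."""
    return f"{name} {freq} " + sep.join(codes)


print(format_diploid("sample_a", ["A", "T", "C"], ["C", "C", "T"]))
print(format_single("AZR10", list("GCCTAA")))
```

With a frequency of 1 for every individual, each genotype is simply counted once, so nothing is lost by adding the column.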

Hope this fixes the problem!
