Question

Format ped and map files for PLINK with bacteria

1

6.7 years ago

Seb_Lopez ▴ 10

Hi,

I am fairly new to the field. I want to know two things.

Has anyone worked with PLINK for association studies in bacteria? My case is that I have a gene presence/absence table and want to assess whether any of those genes is significantly associated with a particular phenotype. Is this possible with PLINK? I actually saw someone do it, and I would like to understand the rationale behind the formatting of the .ped and .map files as well as the analysis.

As far as I remember, the affected (case) and unaffected (control) groups are my bacterial phenotypes, but there's more than that. I think there are some columns to add to those files.

If someone has more experience, please let me know.

Not sure if this is the appropriate place to ask this. If not, my apologies.

PLINK plink • 1.6k views

ADD COMMENT • link updated 6.7 years ago by zx8754 12k • written 6.7 years ago by Seb_Lopez ▴ 10

score 2 · Answer 1 · 2018-05-08

2

6.7 years ago

Kevin Blighe 88k

Bonjour / Bonsoir, in which format is your data currently? While I have not heard of anyone using PLINK for bacteria, I do not doubt its utility in such a situation. PLINK's basic association test is just a χ² (chi-square) test that compares allele counts in cases and controls (or whatever phenotype(s) you're measuring). Other tests, like the family-based tests, would obviously not be suitable. If you look at my recent answer, you will see how you can easily conduct the test yourself: A: SNP dataset and Z Score

Otherwise, here is information on the formatting:

PED
MAP

If you create mydata.ped and mydata.map, you can then load these into PLINK with:

plink --file mydata

[source: http://zzz.bwh.harvard.edu/plink/data.shtml#plink]
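For a gene presence/absence table, the two files could be produced roughly as follows. This is only a sketch under some loud assumptions: the function name, the sample and gene names, and the trick of writing each gene as a homozygous "diploid" genotype (A A for present, C C for absent) are all made up for illustration, and every gene is placed on a dummy chromosome 1 at an arbitrary position. The column order itself follows the PLINK documentation linked above.

```python
def presence_absence_to_ped_map(samples, genes, case_taxon):
    """Build PED and MAP lines from a gene presence/absence table.

    samples: list of (sample_id, taxon, calls), where calls holds one
             0/1 value per gene, in the same order as `genes`.
    """
    ped_lines = []
    for sample_id, taxon, calls in samples:
        # PED phenotype coding: 2 = affected (case), 1 = unaffected (control)
        phenotype = "2" if taxon == case_taxon else "1"
        # First six PED columns: family ID, individual ID, paternal ID,
        # maternal ID, sex (0 = unknown), phenotype
        fields = [sample_id, sample_id, "0", "0", "0", phenotype]
        for present in calls:
            # Assumption: fake a homozygous diploid genotype for each gene
            allele = "A" if present else "C"
            fields += [allele, allele]
        ped_lines.append(" ".join(fields))
    # MAP columns: chromosome, marker ID, genetic distance, bp position
    # (chromosome 1 and the positions are dummy values)
    map_lines = [f"1 {gene} 0 {pos}" for pos, gene in enumerate(genes, start=1)]
    return ped_lines, map_lines


# Hypothetical toy table: three strains, two genes
genes = ["geneX", "geneY"]
samples = [
    ("strain1", "species1", [1, 0]),
    ("strain2", "species2", [0, 0]),
    ("strain3", "species2", [0, 1]),
]
ped_lines, map_lines = presence_absence_to_ped_map(samples, genes, "species1")
print("\n".join(ped_lines))
print("\n".join(map_lines))
```

Writing ped_lines to mydata.ped and map_lines to mydata.map gives files that the command above should be able to read. Because sex is coded as unknown here, I believe PLINK will set the phenotypes to missing unless you also pass --allow-no-sex, so check the log.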

Kevin

ADD COMMENT • link 6.7 years ago by Kevin Blighe 88k

0

Thanks for your reply Kevin. My data is a table with groups of bacteria in the rows and genes or gene families in the columns. When I mentioned phenotypes in the original question, I actually meant "taxa". So my idea is that I can use PLINK to show that certain genes are uniquely present in certain closely related groups of bacteria (say subspecies or strains) or that they are "associated" with a particular taxon. For example: species 1 has gene X, which is not present in species 2, 3, 4 and 5. I am guessing ploidy is a limitation that could be addressed by formatting the data table so that it resembles a diploid organism. Here is an example of the table I have.

https://drive.google.com/open?id=1Hzj26cT3rHHT5zTvegkTHN6dVmB7naVu

Converting to binary is a must, as far as I remember. After that I'm quite lost.

Thanks for your help

ADD REPLY • link 6.7 years ago by Seb_Lopez ▴ 10

0

I see. I am beginning to think that you should do this entirely outside of PLINK, using some of the tests that I mentioned in my other thread. With those, you can see whether a gene is more frequent in a particular bacterium or taxon. What do you think?
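As an illustration of testing one gene outside of PLINK, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (taxon of interest vs. all others, gene present vs. absent). Fisher's exact test is my choice for the example because the counts are small; it is not necessarily one of the tests in the other thread. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: taxon of interest / all other taxa.
    Columns: gene present / gene absent.
    """
    row1 = a + b          # genomes in the taxon of interest
    col1 = a + c          # genomes carrying the gene
    n = a + b + c + d     # all genomes
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x carriers in the taxon
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: gene X in 5 of 5 species-1 genomes, 0 of 15 others
p_value = fisher_exact_two_sided(5, 0, 0, 15)
print(f"p = {p_value:.3g}")
```

With many genes you would run this once per gene and then correct the p-values for multiple testing.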

Another thing that you could do with your data is to define a gene signature that could be used as a sort of 'identifier' of the taxa that you are aiming to distinguish. For example, you could ultimately say that Gene1+Gene4+Gene7+Gene8 can statistically distinguish Taxa1 from Taxa2 (AUC, 0.95; cross-validated r^2, 0.6). If you want to learn more about that, you can take a look here: C: Resources for gene signature creation
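To make the AUC idea concrete, here is a small sketch that scores each genome by how many signature genes it carries and then computes the AUC as the probability that a Taxa1 genome outscores a Taxa2 genome. The profiles, the scoring rule and the function names are invented for illustration. The signature is also fixed in advance rather than learned, so there is no cross-validation here; a real analysis would have to select the genes inside the cross-validation loop.

```python
def signature_score(profile, signature):
    """Number of signature genes present (1) in a genome's profile."""
    return sum(profile[gene] for gene in signature)


def auc(scores_pos, scores_neg):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


signature = ["Gene1", "Gene4", "Gene7", "Gene8"]

# Hypothetical presence/absence profiles (1 = present, 0 = absent)
taxa1 = [
    {"Gene1": 1, "Gene4": 1, "Gene7": 1, "Gene8": 0},
    {"Gene1": 1, "Gene4": 1, "Gene7": 0, "Gene8": 1},
    {"Gene1": 1, "Gene4": 0, "Gene7": 1, "Gene8": 1},
]
taxa2 = [
    {"Gene1": 0, "Gene4": 0, "Gene7": 1, "Gene8": 0},
    {"Gene1": 0, "Gene4": 1, "Gene7": 0, "Gene8": 0},
    {"Gene1": 1, "Gene4": 1, "Gene7": 1, "Gene8": 0},
]

scores1 = [signature_score(p, signature) for p in taxa1]
scores2 = [signature_score(p, signature) for p in taxa2]
signature_auc = auc(scores1, scores2)
print(f"AUC = {signature_auc:.2f}")
```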

Not sure if that helps.

ADD REPLY • link 6.7 years ago by Kevin Blighe 88k

0

I will take a look at that. Maybe you are right and PLINK is not the most straightforward answer to this question. I'll update on progress if necessary. Thanks again.