Question

Allele count from factor variable in R

10.2 years ago

pifferdavide ▴ 110

I am writing code to count alleles from 23andMe genome text files. The code returns a factor whose levels are the genotype symbols. I want to assign a number to each genotype so that each effect allele is scored as 1 and the other allele as 0; in this case AA=2, AG=1, GG=0. If I use the as.integer function, it simply returns the position of the genotype among the levels (see bottom of output), which is not what I want.

The alleles column (V4) has 19 different levels (corresponding to all the genotype calls present in the genome), but I am interested in only 4 of them for each SNP. How do I assign a numeric value to each of the four genotypes?

setwd("~/genomes")

mydata=read.table("genome_003.txt")
View(mydata)

library(Hmisc)
df=as.data.frame(mydata)
rownumber=match('rs9375195', df$V1) #returns the first location of SNP (rsIDs are in column V1)
df[rownumber,] #displays row corresponding to SNP

              V1 V2       V3 V4
224186 rs9375195  6 98562720 AA

genotype=df[rownumber,]$V4
genotype #displays alleles for corresponding SNP

[1] AA
Levels: -- A AA AC AG AT C CC CG CT DD DI G GG GT I II T TT

number=as.integer(genotype)
number

[1] 3

So what you want is for genotype=df[rownumber,]$V4 to return 2 instead of AA?

Reply by PoGibas, 10.2 years ago

Exactly so!

Reply by pifferdavide, 10.2 years ago

10.2 years ago

PoGibas 5.1k

# THIS IS NOT TESTED

# Load libraries
library(data.table)

# Args
SNP <- "rs9375195"
file <- "genome_003.txt"

# Read Data
mydata <- fread(file)

# Add placeholder column (NA marks genotypes that are not scored)
mydata[, V4Numbers := NA_real_]
# Genotype to number
mydata[V4=="GG", V4Numbers := 0]
mydata[V4=="AG", V4Numbers := 1]
mydata[V4=="AA", V4Numbers := 2]

# Get wanted SNP
# The rsID column appears to be V1, judging by the row printed in the question
mydata[V1==SNP, V4Numbers]

I get various error messages. E.g.:

mydata[V4=="GG", V4Numbers := 0]
Error in eval(expr, envir, enclos) : object 'V4' not found

Reply by pifferdavide, 10.2 years ago

Check the class of mydata with class(mydata); it should include data.table.

# Try this
mydata <- as.data.table(mydata)
# Also check that there is a column named V4 (e.g. with names(mydata))
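
The "object 'V4' not found" message is what base R reports when a plain data.frame is subset with an unquoted column name, which would be consistent with mydata having been read by read.table rather than fread. A minimal base-R sketch of that check, using a made-up toy table rather than the actual genome file:

```r
# Toy stand-in for the genome file; values are illustrative only
mydata <- data.frame(V1 = c("rs1", "rs2"), V4 = c("GG", "AA"))

class(mydata)                    # shows the class of the object
inherits(mydata, "data.table")   # TRUE only for a data.table

# On a plain data.frame the unquoted name V4 is looked up outside the
# table, so the subset fails; capture the error message instead of stopping
msg <- tryCatch(
  { mydata[V4 == "GG", ]; "no error" },
  error = function(e) conditionMessage(e)
)
msg

# The data.frame way of writing the same subset
gg_rows <- mydata[mydata$V4 == "GG", ]
gg_rows
```

If inherits() returns FALSE, converting with as.data.table() as suggested above should make the unquoted-column syntax available.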

Reply by PoGibas, 10.2 years ago

Ram · Accepted Answer · 2015-05-14

10.2 years ago

Steven Lakin ★ 1.8k

You can use a named vector in R the same way you would use a dictionary in another language:

myVector <- c(1,2,3)
names(myVector) <- c("vector", "of", "names")
myVector["vector"]  # returns the name and its value (key and value in a dictionary)

as.numeric(myVector["vector"])  # returns the value associated with the name
names(myVector)[myVector == 1]  # returns the name associated with the value of 1

You can also initialize the vector with its key/value pairs directly:

myVector <- c("vector" = 1, "of" = 2, "names" = 3)

So what you want to do is build a dictionary that translates your genotype letters into numbers. Build the dictionary, then pass your values through it.

Check out this thread on stackoverflow for more details: http://stackoverflow.com/a/2865191/4872975
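
Applied to the question, the dictionary idea could look like the sketch below. It assumes A is the effect allele, as in the question's AA=2, AG=1, GG=0 scoring; the names dosage and calls are made up for illustration.

```r
# Named vector used as a lookup table: genotype label -> effect-allele count
# (A is assumed to be the effect allele)
dosage <- c("AA" = 2, "AG" = 1, "GG" = 0)

# Example genotype calls stored as a factor, as read.table may produce
calls <- factor(c("AA", "GG", "AG", "--"))

# Convert to character before indexing so the labels, not the level codes,
# are used as keys; unname() drops the keys from the result
scores <- unname(dosage[as.character(calls)])
scores
```

Genotypes that are missing from the lookup table, such as the no-call "--", come back as NA, which makes unscored calls easy to spot.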

I am not sure. The process you are describing works for assigning letters to numbers, but I do not know how to do it the other way round (assigning numbers to letters, or to the levels of a factor). In this instance, R pre-assigns an integer code to each level according to alphabetical order (the default), so -- is 1, A is 2, AA is 3, AC is 4, etc.

Instead, I want to assign the values to the levels from scratch, ignoring the default R settings.
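
One way to read "assigning values to the levels from scratch" is to compute a score for every level and then index that score vector with the factor's integer codes. A sketch, assuming the score is the number of copies of a chosen effect allele in the genotype string; count_effect is a hypothetical helper, not something from this thread:

```r
# Hypothetical helper: count occurrences of the effect allele in each label
count_effect <- function(labels, effect) {
  vapply(strsplit(labels, "", fixed = TRUE),
         function(chars) sum(chars == effect),
         integer(1))
}

genotype <- factor(c("AA", "AG", "GG", "AG"))

# One score per level, in the same order as levels(genotype)
level_scores <- count_effect(levels(genotype), effect = "A")

# The factor's integer codes pick the matching score for each observation
scores <- level_scores[as.integer(genotype)]
scores
```

With this approach a no-call such as "--" would score 0 rather than NA, so such levels may need to be set to NA separately.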

Reply by pifferdavide, 10.2 years ago

# example data frame
df <- data.frame("Genotype" = c("Genotype1", "Genotype2", "Genotype3"), "Alleles" = c("AA", "AC", "GG"), stringsAsFactors = TRUE)  # factors, as in the question (not the default since R 4.0)

   Genotype Alleles
1 Genotype1      AA
2 Genotype2      AC
3 Genotype3      GG

# set the numbers equal to whatever you want for each genotype
translate <- c("A" = 1, "AA" = 2, "AC" = 1, "AG" = 1, "AT" = 1, "C" = 0, "CC" = 0, "CG" = 0, "CT" = 0, "DD" = 0, "DI" = 0, "G" = 0, "GG" = 0, "GT" = 0, "I" = 0, "II" = 0, "T" = 0, "TT" = 0)

as.numeric(translate[as.character(df[rownumber, columnnumber])]) # get allele value
as.numeric(translate[as.character(df[rownumber, ]$columnname)])  # same thing but with column name

# Example for the above data frame:
df[1, ]$Alleles

[1] AA
Levels: AA AC GG

as.numeric(translate[as.character(df[1, ]$Alleles)])

[1] 2

One tricky thing with factors is that calling as.numeric() on a factor returns the internal integer code of its level, not the label. Convert the factor to character first to avoid this.
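
The pitfall can be reproduced with the levels printed in the question. This is only a sketch: the three-entry translate vector is a shortened, illustrative version of the one above.

```r
# Levels copied from the question's output
lv <- c("--", "A", "AA", "AC", "AG", "AT", "C", "CC", "CG", "CT",
        "DD", "DI", "G", "GG", "GT", "I", "II", "T", "TT")
genotype <- factor("AA", levels = lv)

code  <- as.integer(genotype)     # position of "AA" among the levels
label <- as.character(genotype)   # the genotype string itself

translate <- c("AA" = 2, "AG" = 1, "GG" = 0)

# Indexing by the character label looks up the key "AA"
by_label <- unname(translate[label])

# Indexing by the factor itself uses its integer code as a position,
# so it silently picks a different element of translate
by_factor <- unname(translate[genotype])
```

Here code matches the 3 reported in the question, and by_factor picks up the third entry of translate instead of the entry for "AA", which is why the as.character() step matters.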

Reply by Steven Lakin, 10.2 years ago

Brilliant! It works!

Reply by pifferdavide, 10.2 years ago