Numeric genotype to letters
5.5 years ago

I have an input file like this:

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   0   0   0   1   0
3   11641   snp_3_11641 T   G   0   1   1   2   2
3   14240   snp_3_14240 G   A,T 0   0   0   0   1

From the sixth column onwards (the sample columns after Alt), if the value is 0, replace it with ref+ref; if it is 1, replace it with ref+alt; and if it is 2, replace it with alt+alt, so that the table above becomes:

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   CC  CC  CC  CT  CC
3   11641   snp_3_11641 T   G   TT  TG  TG  GG  GG
3   14240   snp_3_14240 G   A   GG  GG  GG  GG  GA
5.5 years ago
 awk '/^#/{print;next;} {split($5,a,/[,]/);for(i=1;i<6;i++) printf("%s%s",i==1?"":"\t",$i);for(i=6;i<=NF;i++) { printf("\t"); if($i==0) printf("%s%s",$4,$4); else if($i==1) printf("%s%s",$4,a[1]); else if($i==2) printf("%s%s",a[1],a[1]); else printf("??");} printf("\n");} ' input.tsv

#CHROM START ID Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   CC  CC  CC  CT  CC
3   11641   snp_3_11641 T   G   TT  TG  TG  GG  GG
3   14240   snp_3_14240 G   A,T GG  GG  GG  GG  GA
5.5 years ago

I don't know what format your object is in, but here is a possible solution in R.

Assuming you have your input file in R and it looks as follows (you can use read.delim()):

  X.CHROM START          ID Ref Alt X108 X139 X159 X265 X350
1       5  5571  snp_5_5571   C   T    0    0    0    1    0
2       3 11641 snp_3_11641   T   G    0    1    1    2    2
3       3 14240 snp_3_14240   G A,T    0    0    0    0    1

Then you can use the following function on this object (class data.frame) in the following way:

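convertSNP <- function(var.row){
  ref <- var.row['Ref']
  alt <- var.row['Alt']
  ## keep only first variant
  alt <- gsub(",[AGTC]+","",alt)
  ## get rid of whitespace
  var.row <- gsub(" ","",var.row)
  ## only recode the sample columns (6 onwards), so that a
  ## CHROM or START value of 0, 1 or 2 is left untouched
  samples <- seq_along(var.row) >= 6
  var.row[samples & var.row == 0] <- paste0(ref,ref)
  var.row[samples & var.row == 1] <- paste0(ref,alt)
  var.row[samples & var.row == 2] <- paste0(alt,alt)
  return(var.row)
}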

## run function on data.frame by row and transpose the result
t(apply(df,1,convertSNP))

Giving the result:

     X.CHROM START   ID            Ref Alt   X108 X139 X159 X265 X350
[1,] "5"     "5571"  "snp_5_5571"  "C" "T"   "CC" "CC" "CC" "CT" "CC"
[2,] "3"     "11641" "snp_3_11641" "T" "G"   "TT" "TG" "TG" "GG" "GG"
[3,] "3"     "14240" "snp_3_14240" "G" "A,T" "GG" "GG" "GG" "GG" "GA"
5.5 years ago
JC 13k
$ perl -lane '($alt = $F[4]) =~ s/,.*//; for ($i=5;$i<=$#F;$i++) { if ($F[$i] eq "0") { $F[$i] = $F[3] x 2 } elsif ($F[$i] eq "1") { $F[$i] = $F[3].$alt } elsif ($F[$i] eq "2") { $F[$i] = $alt x 2 } } print join "\t", @F' < in
#CHROM  START   ID      Ref     Alt     108     139     159     265     350
5       5571    snp_5_5571      C       T       CC      CC      CC      CT      CC
3       11641   snp_3_11641     T       G       TT      TG      TG      GG      GG
3       14240   snp_3_14240     G       A,T     GG      GG      GG      GG      GA
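
For readers who prefer something easier to extend than a one-liner, the same mapping can be sketched in Python. This is only an illustration, not one of the posted answers: the function name `convert_line` is made up, it assumes whitespace-separated columns laid out as in the question, it assumes only the first Alt allele is used for multi-allelic sites, and it writes "??" for any code other than 0, 1 or 2.

```python
def convert_line(line):
    """Recode the 0/1/2 genotype codes on one line as allele pairs."""
    # Header and blank lines are passed through unchanged.
    if not line.strip() or line.startswith("#"):
        return line
    fields = line.split()
    ref = fields[3]
    # Assumption: for multi-allelic sites (e.g. "A,T") only the first Alt is used.
    alt = fields[4].split(",")[0]
    codes = {"0": ref + ref, "1": ref + alt, "2": alt + alt}
    # Sample columns start at the sixth field; unexpected codes become "??".
    samples = [codes.get(genotype, "??") for genotype in fields[5:]]
    return "\t".join(fields[:5] + samples)


# The three data rows from the question, used as an in-memory example.
example = [
    "#CHROM\tSTART\tID\tRef\tAlt\t108\t139\t159\t265\t350",
    "5\t5571\tsnp_5_5571\tC\tT\t0\t0\t0\t1\t0",
    "3\t11641\tsnp_3_11641\tT\tG\t0\t1\t1\t2\t2",
    "3\t14240\tsnp_3_14240\tG\tA,T\t0\t0\t0\t0\t1",
]
for row in example:
    print(convert_line(row))
```

To run it on a real file, loop over the open file handle instead of the `example` list and strip the trailing newline from each line before calling `convert_line`.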