Question

Convert rows to columns delimited by each SNP ID

2.6 years ago

selplat21

I'm working with a file in which each row begins with a SNP ID followed by 510 mean genotypes.

These are space-delimited, like so:

Chr1_25 0 0 0 0.1 
Chr1_33 0.2 0 0 1
Chr5_44 1.2 2 0 2 
Chr7_87 0 0 0 0

I want to turn each row into a tab-delimited column, like so:

Chr1_25             Chr1_33       ....
0                   0.2
0                   0
0                   0
0.1                 1

vcf linux bimbam

updated 2.6 years ago by GenoMax • written 2.6 years ago by selplat21

2.6 years ago

Trivas

You're looking to transpose your file. This is straightforward in R using t() and likely easy in Python, but on the command line it's a little more convoluted. Here's a Stack Overflow thread that goes through how to do it using awk: https://stackoverflow.com/questions/1729824/an-efficient-way-to-transpose-a-file-in-bash
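As a rough sketch of the awk approach (written for this thread, not copied from the linked answer): store every field keyed by row and column number, then print each stored column as a row in the END block. The function name transpose is hypothetical; it assumes whitespace-delimited input, writes tab-delimited output, and holds the whole matrix in memory, which should be fine for a few hundred SNPs by 510 individuals.

```shell
# transpose (hypothetical helper): read a whitespace-delimited matrix on
# stdin and print its transpose on stdout, tab-delimited.
# The whole matrix is kept in memory.
transpose() {
    awk '
    {
        # remember every field of every row, keyed by (row, column)
        for (i = 1; i <= NF; i++) cell[NR, i] = $i
        if (NF > maxnf) maxnf = NF
    }
    END {
        # each original column becomes one output row
        for (j = 1; j <= maxnf; j++) {
            line = cell[1, j]
            for (i = 2; i <= NR; i++) line = line "\t" cell[i, j]
            print line
        }
    }'
}
```

Run as transpose < file.txt on the four-row example in the question, this would print Chr1_25, Chr1_33, Chr5_44 and Chr7_87 as a tab-delimited header line, followed by one line per individual.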

2.6 years ago by Trivas

I was able to find posts like this, but I consistently get the output as a single vector:

2.6 years ago by selplat21

while read i; do grep -w $i Test.bimbam | cut -d, --output-delimiter $'\t' -f1,4- \
| awk '
{ 
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {    
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' ; done < candidate.snps

2.6 years ago by selplat21

I have also tried transposing with R, but I get the same result.

I made an Rscript, transpose.R:

#!/usr/bin/env Rscript
input <- file("stdin", "r")
x <- readLines(input)
x <- as.data.frame(x)
x1 <- t(x)
write(x1, "")

I then run the following on my bimbam file:

while read i; do grep -w $i Test.bimbam | cut -d, --output-delimiter $'\t' -f1,4- | transpose.R ; done < candidate.snps

I get the same result:

Chr1_25
0
0
0
0.1
Chr1_33
...
...
...

The original bimbam file is comma-delimited, in this format:

Chr1_25,A,T,0,0,0,0.1
Chr1_33,C,G,......
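Both attempts most likely return a single column for the same reason: the while loop runs grep, and then the transpose step, once per SNP ID, so each awk or Rscript call only ever sees one row, and transposing one row just turns it into one column. The R script has an extra problem: readLines() returns each line as a single string, so the data frame has one column and t() has no fields to swap. A sketch of the fix, assuming the comma-delimited layout above with the alleles in fields 2 and 3, is to hand the whole ID list to grep once with -f and transpose the combined output. The function name transpose_candidates is hypothetical.

```shell
# transpose_candidates SNP_LIST BIMBAM (hypothetical helper)
# Pull all candidate rows out of the comma-delimited bimbam file in ONE grep
# call (instead of one grep per SNP), drop the two allele fields, and
# transpose so each SNP becomes one tab-delimited column.
transpose_candidates() {
    snps=$1
    bimbam=$2
    grep -w -f "$snps" "$bimbam" \
      | cut -d, -f1,4- \
      | awk -F, '
        {
            # store every field of every selected row
            for (i = 1; i <= NF; i++) cell[NR, i] = $i
            if (NF > maxnf) maxnf = NF
        }
        END {
            # print each stored column as a tab-delimited row
            for (j = 1; j <= maxnf; j++) {
                line = cell[1, j]
                for (i = 2; i <= NR; i++) line = line "\t" cell[i, j]
                print line
            }
        }'
}
```

Columns come out in the order the SNPs appear in the bimbam file, not the order of candidate.snps. Using plain cut -d, with awk -F, instead of GNU cut's --output-delimiter should also keep this working with BSD tools.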

2.6 years ago by selplat21

GenoMax · Accepted Answer · 2022-07-21

2.6 years ago

selplat21

I was able to solve this with the following:

conda install -c bioconda csvtk
grep -w -f candidate.snps Test.bimbam | csvtk transpose | sed -e '2,3d' >> candidate.list

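candidate.snps is a list of SNP IDs (rs......), one per line. This produces a transposed matrix with the SNP IDs as the header and the dosages for all individuals as the rows; sed -e '2,3d' removes the two allele rows (fields 2 and 3 of the original file).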
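If installing csvtk is not an option, a single awk process can probably do the same job. In this sketch (the function name transpose_candidates_awk is hypothetical), awk reads the ID list first, keeps only bimbam rows whose first field equals one of those IDs exactly, skips fields 2 and 3 (the alleles) so no sed step is needed, and prints tab-delimited output rather than csvtk's default CSV.

```shell
# transpose_candidates_awk SNP_LIST BIMBAM (hypothetical helper)
# First file: candidate IDs, one per line.
# Second file: comma-delimited bimbam (ID, allele1, allele2, dosages...).
transpose_candidates_awk() {
    awk -F, '
    NR == FNR { want[$1] = 1; next }   # first file: remember each ID
    ($1 in want) {                      # second file: exact ID match on field 1
        n++
        cell[n, 1] = $1
        # fields 4..NF (dosages) become columns 2..NF-2; alleles are skipped
        for (i = 4; i <= NF; i++) cell[n, i - 2] = $i
        if (NF - 2 > maxnf) maxnf = NF - 2
    }
    END {
        # one output row per stored column, SNPs tab-separated
        for (j = 1; j <= maxnf; j++) {
            line = cell[1, j]
            for (r = 2; r <= n; r++) line = line "\t" cell[r, j]
            print line
        }
    }' "$1" "$2"
}
```

Matching on field 1 is a little stricter than grep -w, which accepts the ID as a whole word anywhere in the line.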

updated 2.6 years ago by GenoMax • written 2.6 years ago by selplat21