Convert rows to columns delimited by each SNP ID
2.4 years ago
selplat21 ▴ 20

I have a file I'm working with where each row begins with a SNP ID and is followed by 510 mean genotypes.

These are space delimited like so:

Chr1_25 0 0 0 0.1 
Chr1_33 0.2 0 0 1
Chr5_44 1.2 2 0 2 
Chr7_87 0 0 0 0 

I want to convert each row into tab-delimited columns like so:

Chr1_25             Chr1_33       ....
0                   0.2
0                   0
0                   0
0.1                 1
Tags: vcf, linux, bimbam • 1.2k views
2.4 years ago
selplat21 ▴ 20

I was able to solve this with the following:

conda install -c bioconda csvtk
grep -w -f candidate.snps Test.bimbam | csvtk transpose | sed -e '2,3d' >> candidate.list

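candidate.snps is a list of SNP IDs (rs......), one per line. grep -w -f keeps only the bimbam rows for those SNPs, csvtk transpose flips the matrix, and sed -e '2,3d' deletes lines 2 and 3 of the transposed output, which hold the two allele columns of the original file. The result is a transposed matrix with the SNP IDs as the header and the dosages for all individuals as the rows.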
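If installing csvtk isn't an option, a rough awk-only equivalent of the pipeline above is sketched below. It assumes the comma-delimited bimbam layout described further down the thread (SNP ID, two allele columns, then dosages). The files `toy_cand.bimbam` and `toy_cand.snps` are hypothetical stand-ins, and the Chr5_44 alleles are made up for illustration.

```shell
#!/bin/sh
# Sketch: filter candidate SNPs, drop the two allele columns and transpose,
# writing tab-delimited output. Toy input files are hypothetical.
set -eu

printf 'Chr1_25,A,T,0,0,0,0.1\nChr1_33,C,G,0.2,0,0,1\nChr5_44,G,A,1.2,2,0,2\n' > toy_cand.bimbam
printf 'Chr1_25\nChr1_33\n' > toy_cand.snps

grep -w -f toy_cand.snps toy_cand.bimbam |
awk -F',' '
{
    # $1 = SNP ID, $2-$3 = alleles (skipped), $4..$NF = dosages
    cell[NR, 1] = $1
    for (i = 4; i <= NF; i++) cell[NR, i - 2] = $i
    if (NF - 2 > ncol) ncol = NF - 2
}
END {
    # Input column j becomes output row j; each SNP becomes one column
    for (j = 1; j <= ncol; j++) {
        row = cell[1, j]
        for (i = 2; i <= NR; i++) row = row "\t" cell[i, j]
        print row
    }
}' > toy_cand.tsv

cat toy_cand.tsv
```

Columns come out in the order the SNPs appear in the bimbam file, not the order of the ID list, and the whole filtered matrix is held in memory.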

2.4 years ago
Trivas ★ 1.8k

You're looking to transpose your file. This is straightforward in R using t() and likely easy in Python, but on the command line it's a little more convoluted. Here's a Stack Overflow thread that goes through how to do it using awk: https://stackoverflow.com/questions/1729824/an-efficient-way-to-transpose-a-file-in-bash
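For reference, here is a minimal sketch of that awk approach applied to the four example rows from the question, writing tab-delimited output. `example.txt` is a hypothetical file name, and the script keeps the whole matrix in memory before printing it column by column.

```shell
#!/bin/sh
# Sketch: transpose a whitespace-delimited matrix with awk, output tab-delimited.
# example.txt is a hypothetical file holding the sample rows from the question.
set -eu

cat > example.txt <<'EOF'
Chr1_25 0 0 0 0.1
Chr1_33 0.2 0 0 1
Chr5_44 1.2 2 0 2
Chr7_87 0 0 0 0
EOF

awk '
{
    # Store every field of every row: cell[row, column]
    for (i = 1; i <= NF; i++) cell[NR, i] = $i
    if (NF > maxnf) maxnf = NF
}
END {
    # Emit input column j as output row j, joining rows with tabs
    for (j = 1; j <= maxnf; j++) {
        line = cell[1, j]
        for (i = 2; i <= NR; i++) line = line "\t" cell[i, j]
        print line
    }
}' example.txt > example.transposed.tsv

cat example.transposed.tsv
```

The first output line is the header `Chr1_25 Chr1_33 Chr5_44 Chr7_87` (tab-separated), followed by one line per sample.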


So I was able to find posts like this, but I consistently get the output as a single column:

Chr1_25
0                   
0                   
0                
0.1       
Chr1_33
0.2
0
0
1
while read i; do grep -w $i Test.bimbam | cut -d, --output-delimiter $'\t' -f1,4- \
| awk '
{ 
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {    
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' ; done < candidate.snps 
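The single-column output comes from the `while read` loop rather than from the awk program itself: each grep/awk call sees only one line, so each SNP is transposed on its own into a one-column block, and those blocks are printed one after another. The sketch below reproduces this on hypothetical toy files (`toy.txt`, `toy_ids.txt`) and compares it with a single pass using `grep -w -f`. The `transpose` function is a hypothetical wrapper around essentially the same awk program as above, with a tab instead of a space as the separator.

```shell
#!/bin/sh
# Sketch: per-ID loop vs. single pass. Toy files are hypothetical stand-ins
# for the cut output of Test.bimbam and for candidate.snps.
set -eu

printf 'Chr1_25 0 0 0 0.1\nChr1_33 0.2 0 0 1\nChr5_44 1.2 2 0 2\n' > toy.txt
printf 'Chr1_25\nChr1_33\n' > toy_ids.txt

# Same transpose logic as the awk program above, reading stdin
transpose() {
    awk '
    {
        for (i = 1; i <= NF; i++) a[NR, i] = $i
        if (NF > p) p = NF
    }
    END {
        for (j = 1; j <= p; j++) {
            str = a[1, j]
            for (i = 2; i <= NR; i++) str = str "\t" a[i, j]
            print str
        }
    }'
}

# Per-ID loop: each awk call sees ONE line, so every SNP becomes its own
# 5-line column and the columns are stacked end to end (10 lines total)
while read -r id; do
    grep -w "$id" toy.txt | transpose
done < toy_ids.txt > looped.txt

# Single pass: awk sees all matching lines at once (5 lines, 2 columns)
grep -w -f toy_ids.txt toy.txt | transpose > single.txt
```

The transpose.R attempt is also called once per SNP inside the same kind of loop, so it shows the same symptom. Separately, readLines() returns each line as one unsplit string, so the fields would need to be split (for example with read.table()) before t() can transpose anything.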

I have tried transposing with R but get the same result:

I made an R script, transpose.R:

#!/usr/bin/env Rscript
input <- file("stdin", "r")
x <- readLines(input)
x <- as.data.frame(x)
x1 <- t(x)
write(x1, "")

I then run the following on my bimbam file:

while read i; do grep -w $i Test.bimbam | cut -d, --output-delimiter $'\t' -f1,4- | transpose.R ; done < candidate.snps 

I get the same result:

Chr1_25
0
0
0
0.1
Chr1_33
...
...
...

The original bimbam file is in this format:

Chr1_25,A,T,0,0,0,0.1
Chr1_33,C,G,......
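To see what the `cut -d, --output-delimiter $'\t' -f1,4-` step in the commands above does to this layout, here is a small sketch. It assumes GNU coreutils cut (`--output-delimiter` is a GNU option). `toy.bimbam` is a hypothetical two-row file whose Chr1_33 row borrows the dosages from the space-delimited example in the question, so the data are illustrative only.

```shell
#!/bin/sh
# Sketch, assuming GNU cut. toy.bimbam is a hypothetical two-row example.
set -eu

printf 'Chr1_25,A,T,0,0,0,0.1\nChr1_33,C,G,0.2,0,0,1\n' > toy.bimbam

# $(printf '\t') yields a literal tab in any POSIX shell ($'\t' needs bash).
# -f1,4- keeps the SNP ID and every dosage field, dropping the two allele
# fields, so each output line is "ID<TAB>dosage<TAB>dosage..."
cut -d, --output-delimiter "$(printf '\t')" -f1,4- toy.bimbam > toy.tsv
cat toy.tsv
```

Each line is still one SNP; it is this ID-plus-dosages matrix that then needs to be transposed in a single pass.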