Split single column into multiple columns
7.7 years ago
jezielbqi • 0

Hi everyone, I am working on SNP analysis/filtering and came across a challenging task. After filtering for the alterations of interest, I got a file with the following layout:

01_126729_C
01_85829_G
02_11867_A
02_171665_C
02_183470_A,T
03_225197_G
03_360822_T
03_364428_T
03_51665_C
03_66720_A

Here 01_, 02_, ... represent chromosomes. I need to split this single column into multiple columns, each containing only the SNPs from one chromosome. I don't have a clue how to do this! Could anyone please help me? That would save me a lot of boring work in Excel!

Cheers!

snp next-gen genome gene • 1.3k views
7.7 years ago

First split on "_" and then sort on the "chromosome" column. Both can be done in many different ways, including Excel, R and the command line (Google it). In Excel, the splitting function is in the "Data" tab and is named "Text to Columns".
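
On the command line, the split-and-sort step could look roughly like the sketch below. The file names infile.txt and split_sorted.txt are placeholders, and the sample lines are taken from the question; it sorts by chromosome and then numerically by position before turning the underscores into tabs.

```shell
# Sample input in the question's format (file name is a placeholder).
printf '%s\n' 03_51665_C 01_85829_G 02_11867_A 01_126729_C > infile.txt

# Sort on field 1 (chromosome), then numerically on field 2 (position),
# using "_" as the field separator; finally turn "_" into tabs.
sort -t_ -k1,1 -k2,2n infile.txt | tr '_' '\t' > split_sorted.txt
```

The resulting tab-separated file opens directly in Excel or R with chromosome, position and allele in separate columns.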

7.7 years ago
st.ph.n ★ 2.7k

Here's a multi-step process so you can see how it's done:

Pull out the chromosome ids from your file

  cut -f 1 -d '_' infile.txt | sort | uniq > all_chrom_ids.txt

Put this into a bash file (get_ids.sh):

 #!/usr/bin/env bash
 # $1 is a chromosome id (e.g. 01); pull its SNP lines from the original file
 grep "^${1}_" infile.txt > "${1}_snps.txt"

Use your ids to make a file, one for each chromosome, containing the snps for that chromosome:

cat all_chrom_ids.txt | xargs -n 1 bash get_ids.sh

Paste each file by tab:

paste *_snps.txt > snps_by_chrom_col.txt
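
If you would rather skip the intermediate files, the same result can probably be had in one awk pass. This is only a sketch under the assumption that the chromosome id is everything before the first "_"; infile.txt and snps_by_chrom_col.txt are the same placeholder names used above, and the sample data is copied from the question.

```shell
# Sample input in the question's format (file name is a placeholder).
printf '%s\n' 01_126729_C 01_85829_G 02_11867_A 02_171665_C 02_183470_A,T 03_225197_G > infile.txt

# Group SNPs by chromosome prefix and print one tab-separated column per chromosome.
awk -F'_' '
{
    if (!($1 in n)) order[++nchr] = $1    # remember chromosomes in first-seen order
    col[$1, ++n[$1]] = $0                 # store the SNP in the next row of its column
    if (n[$1] > maxrows) maxrows = n[$1]  # track the longest column
}
END {
    for (r = 1; r <= maxrows; r++) {
        line = ""
        for (c = 1; c <= nchr; c++)
            line = line (c > 1 ? "\t" : "") col[order[c], r]
        print line                        # shorter columns are padded with empty fields
    }
}' infile.txt > snps_by_chrom_col.txt
```

Columns come out in the order the chromosomes first appear in the input, so sort the file first if it is not already grouped by chromosome.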