Desegregation of animals and locations
6.2 years ago
hosin • 0

Hi. I have a .txt file (660 rows) like this:

Locations                                   Animals
chr1:10934871-10991498          MAZ4 DLS5 AFS33
chr1:113630180-113698152    AL108 BDP358 MFW157 MFW160 MAZ4
chr1:131662885-131770719    MAZ4 
chr1:133496547-133547227    FINN306  MAZ4
chr1:134599444-134663260    MAZ4 DLS5 AFS33 AL108 BDP358 MFW157 MFW160 FINN306 
chr1:135686897-135790910    FINN306 
chr1:145754786-147013267    MFW157 MFW160
chr1:147927373-148035506    MAZ4

How can I change it to look like this:

MAZ4 
chr1:131662885-131770719
chr1:134599444-134663260
chr1:147927373-148035506
chr1:113630180-113698152
chr1:10934871-10991498

DLS5
chr1:10934871-10991498
chr1:134599444-134663260

AFS33
chr1:134599444-134663260
chr1:10934871-10991498

AL108
chr1:113630180-113698152
chr1:134599444-134663260

BDP358
chr1:134599444-134663260
chr1:113630180-113698152

MFW157
chr1:113630180-113698152
chr1:134599444-134663260
chr1:145754786-147013267

MFW160 
chr1:134599444-134663260
chr1:113630180-113698152
chr1:145754786-147013267

FINN306
chr1:133496547-133547227
chr1:135686897-135790910
chr1:134599444-134663260

Actually, I want to classify animals based on their locations. Please help me. Thanks.

genome • 1.3k views

We would highly appreciate it if you tried something yourself first before we help.

One solution: use any text-processing language you like, such as Perl or Python.

  • Create a dictionary (key, value).

  • For every line of your .txt file, save each new animal as a key.

  • For each new key, create an array as its value; then append the line's location to the array of every animal on that line.

At the end you will have the animals as keys in your dictionary and their associated locations as values.

Print the dictionary however you want. Good luck.

Note: this is probably possible as a single awk command line, for those who like a challenge.
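
The steps above could be sketched in Python roughly as follows. This is only a minimal illustration, not a tested solution for the real 660-row file: the function names `group_locations_by_animal` and `format_groups` are made up here, the first line is assumed to be the header, and fields are split on any whitespace (which works for tab-separated columns too, since a location contains no spaces).

```python
from collections import defaultdict


def group_locations_by_animal(lines):
    """Map each animal ID to the list of locations it appears in."""
    locations_by_animal = defaultdict(list)
    for line_number, line in enumerate(lines):
        if line_number == 0:
            continue  # assumption: the first line is the header
        fields = line.split()  # splits on tabs and on runs of spaces
        if not fields:
            continue  # skip blank lines
        location, animals = fields[0], fields[1:]
        for animal in animals:
            locations_by_animal[animal].append(location)
    return dict(locations_by_animal)


def format_groups(groups):
    """One block per animal: the ID, then its locations, blank line between blocks."""
    return "\n\n".join(
        "\n".join([animal, *locations]) for animal, locations in groups.items()
    )


# A few rows taken from the question, tab-separated as confirmed in the thread.
sample = [
    "Locations\tAnimals",
    "chr1:10934871-10991498\tMAZ4 DLS5 AFS33",
    "chr1:131662885-131770719\tMAZ4 ",
    "chr1:133496547-133547227\tFINN306  MAZ4",
]
groups = group_locations_by_animal(sample)
print(format_groups(groups))
```

To run it on the real file, the list `sample` could be replaced by an open file handle, for example `with open("your_list.txt") as handle: groups = group_locations_by_animal(handle)`. Animals come out in the order they are first seen, because Python dictionaries preserve insertion order.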


Thanks for your attention


What are the delimiters between the columns (I assume tab) and between the animals (I assume whitespace)?


Yes, of course. You are right.

6.2 years ago
ATpoint 85k

Assuming "\t" between columns and whitespace between animals, do:

while read -r p; do grep -w -- "$p" your_list.txt | cut -f1 | cat <(echo "$p") <(cat /dev/stdin); done < <(cut -f2 your_list.txt | awk 'NR > 1 {gsub(" ", "\n"); print $0}' | awk NF | sort -k1,1 -u)

It first takes the unique IDs of the animals with cut -f2 your_list.txt | awk 'NR > 1 {gsub(" ", "\n"); print $0}' | awk NF | sort -k1,1 -u. The NR > 1 skips the header line, the gsub puts each animal on its own line, the awk NF removes empty lines, and the sort -u keeps only unique IDs.

It then greps for the lines that contain that animal ID and isolates the column with the coordinates: grep -w -- "$p" your_list.txt | cut -f1. The -w restricts matches to whole words, so an ID cannot match inside a longer ID, and quoting "$p" protects it from word splitting.

Finally, it cats the animal ID (which is $p within the while loop) together with the grepped coordinates. All of that is squeezed into the ugly piece of code above.

The <(...) parts (process substitution) are a convenient way to pass the results of a command into a new command, similar to piping from/to STDIN/STDOUT.

I strongly recommend spending some time learning how to solve these things yourself. You'll need these simple bash scripting skills for filtering and subsetting all the time.
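
One thing to watch with any grep-style lookup is substring matching: a pattern such as MAZ4 would also hit a longer ID that contains it. The sketch below mirrors the two-pass idea of the pipeline in Python and compares whole tokens instead. The helper names `parse_rows` and `locations_for`, the ID MAZ40, and the coordinates are all invented for illustration and do not come from the original file.

```python
def parse_rows(lines):
    """Turn raw lines into (location, [animals]) pairs, skipping the header."""
    rows = []
    for line in lines[1:]:  # assumption: the first line is the header
        fields = line.split()
        if fields:
            rows.append((fields[0], fields[1:]))
    return rows


def locations_for(animal, rows):
    """Locations whose animal list contains `animal` as a whole token."""
    return [location for location, animals in rows if animal in animals]


lines = [
    "Locations\tAnimals",
    "chr1:100-200\tMAZ4 DLS5",  # made-up coordinates
    "chr1:300-400\tMAZ40",      # MAZ40 is a hypothetical ID
]
rows = parse_rows(lines)

# Whole-token comparison: only the row that really lists MAZ4.
whole_token = locations_for("MAZ4", rows)

# Substring comparison, like a plain pattern search: also picks up MAZ40.
substring = [loc for loc, animals in rows if "MAZ4" in " ".join(animals)]

print(whole_token)
print(substring)
```

With these made-up rows, the whole-token lookup returns one location while the substring lookup returns two, which is the kind of silent over-matching worth checking for when IDs share prefixes.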


Thanks for your attention.
