How to retrieve rows from OTU table
7.1 years ago
mollysil ▴ 40

I have a text file with OTU names in the first column and the occurrence in each treatment in the following columns (34 columns in total). A sample of the table is below. There are ~3000 OTUs in this file (therefore, ~3000 rows).

CM2_9   0   0   0
AF141_14    22  25  23
AF171_13    13  0   0
LIPB162_1   0   0   0

I have a separate text file with all the OTU names of interest (~500 OTUs), which looks something like this:

WSF3_2
WSF1_2
AF172_15
IO2_57

Is there a simple way to retrieve just the rows of my table that match the OTUs of interest? As output, I want a new table containing only those rows. Help please! I'm working in PuTTY (Linux). Also, do the files need to be converted to comma-delimited format? Both are tab-delimited .txt files.


Take a look at the Unix join command if you don't want to use external programs.


Could you try the following solution:

$ grep -f test2.txt test1.txt

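Here test2.txt contains the OTU names of interest (~500 OTUs) and test1.txt is the complete OTU table (~3000 OTUs). Note that grep -f matches each line of test2.txt anywhere in a row, so an ID such as AF141_1 would also pull in the row for AF141_14. Using grep -wFf test2.txt test1.txt treats the IDs as fixed strings and only matches them as whole words, which avoids that.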

Input:

$ cat test1.txt 
CM2_9   0   0   0
AF141_14    22  25  23
AF171_13    13  0   0
LIPB162_1   0   0   0


$ cat test2.txt 
LIPB162_1
CM2_9

Output:

$ grep -f test2.txt test1.txt 
CM2_9   0   0   0
LIPB162_1   0   0   0
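A small Python sketch of the difference between matching an ID anywhere in a row (roughly what plain grep -f does for IDs without regex metacharacters) and matching it exactly against the first column. The AF141_1 row is hypothetical, made up only to create an ID that is a prefix of another one.

```python
# Hypothetical rows: AF141_1 is a prefix of AF141_14
table = [
    "AF141_1\t5\t0\t2",
    "AF141_14\t22\t25\t23",
    "CM2_9\t0\t0\t0",
]
wanted = ["AF141_1"]

def substring_match(rows, ids):
    """Keep a row if any ID occurs anywhere in it (like plain grep -f)."""
    return [row for row in rows if any(i in row for i in ids)]

def first_column_match(rows, ids):
    """Keep a row only if its first tab-separated field equals one of the IDs."""
    id_set = set(ids)
    return [row for row in rows if row.split("\t", 1)[0] in id_set]

print(substring_match(table, wanted))     # both the AF141_1 and AF141_14 rows
print(first_column_match(table, wanted))  # only the AF141_1 row
```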
7.1 years ago
5heikki 11k

Assuming tab-separated files:

join -1 1 -2 1 -t $'\t' <(sort -t $'\t' -k1,1 otutable) <(sort -t $'\t' -k1,1 listfile)
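join walks both inputs in sorted order, which is why each file is piped through sort first, and why the output comes out sorted by OTU name rather than in the original order. The Python sketch below is a simplified model of that merge step, not join itself: it assumes each key appears at most once per input and compares keys by plain code point (the shell pipeline works because sort and join use the same locale ordering).

```python
def first_field(line):
    """Return the text before the first tab (the join key)."""
    return line.split("\t", 1)[0]

def merge_join(left, right):
    """Join two lists of tab-separated lines on their first field.

    Both inputs must already be sorted by that field; keys are assumed
    unique within each input (a simplification of what join supports).
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, _, lrest = left[i].partition("\t")
        rkey, _, rrest = right[j].partition("\t")
        if lkey < rkey:
            i += 1  # left key cannot match anything further on the right
        elif lkey > rkey:
            j += 1  # right key cannot match anything further on the left
        else:
            # Same key: emit the key, then the remaining fields of each side
            fields = [lkey] + [rest for rest in (lrest, rrest) if rest]
            out.append("\t".join(fields))
            i += 1
            j += 1
    return out

otu_table = sorted(
    ["CM2_9\t0\t0\t0", "AF141_14\t22\t25\t23", "AF171_13\t13\t0\t0", "LIPB162_1\t0\t0\t0"],
    key=first_field,
)
id_list = sorted(["LIPB162_1", "CM2_9"], key=first_field)

for line in merge_join(otu_table, id_list):
    print(line)
```

Because each input is read only once, the merge is linear after sorting; the price is that any order other than sorted order is lost.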
7.1 years ago
st.ph.n ★ 2.7k

Here's a quick Python solution, where ids.txt contains the OTUs of interest and otus.txt is your original table.

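#!/usr/bin/env python
import sys

# Read the OTU IDs of interest, one per line, skipping blank lines
with open('ids.txt', 'r') as f:
    ids = [line.strip() for line in f if line.strip()]

# Index every row of the OTU table by its first (tab-separated) field
otu = {}
with open('otus.txt', 'r') as f2:
    for line in f2:
        fields = line.rstrip('\r\n').split('\t')
        otu[fields[0]] = fields

# Print matching rows in ids.txt order; report IDs missing from the table
for i in ids:
    if i in otu:
        print('\t'.join(otu[i]))
    else:
        sys.stderr.write('Not found in otus.txt: %s\n' % i)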

Save it as get_otus.py and run it with: python get_otus.py > my_otus.txt
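On the delimiter question: no conversion to comma-delimited format should be needed, since the script above splits on tabs. If you would rather keep rows in the order of the original table (instead of the order of ids.txt) and avoid holding the whole table in memory, one variant is to load only the IDs into a set and stream the table once. A Python 3 sketch using the standard csv module with delimiter="\t"; the function name filter_otus is hypothetical.

```python
import csv
import io

def filter_otus(id_lines, table_lines):
    """Yield rows of a tab-delimited table whose first field is a wanted OTU ID.

    Only the IDs are kept in memory (in a set, for constant-time lookups);
    the table is read once, top to bottom, so rows keep the table's order.
    """
    wanted = {line.strip() for line in id_lines if line.strip()}
    # QUOTE_NONE: treat quote characters as ordinary text, not CSV quoting
    for row in csv.reader(table_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row and row[0] in wanted:  # skip blank lines, keep matching IDs
            yield row

# Demo with in-memory data; for real files you could pass
# open("ids.txt") and open("otus.txt", newline="") instead.
ids = io.StringIO("LIPB162_1\nCM2_9\n")
table = io.StringIO("CM2_9\t0\t0\t0\nAF141_14\t22\t25\t23\nLIPB162_1\t0\t0\t0\n")
for row in filter_otus(ids, table):
    print("\t".join(row))
```

Here CM2_9 is printed before LIPB162_1 because that is their order in the table, even though ids.txt lists them the other way round.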


Magical! Thanks so much!!!
