how to write a script to grep the lowest e-value for each gene?
2
0
Entering edit mode
7.3 years ago
Anny ▴ 30

Hi all,

I have a file containing blast hits for each gene, one hit per line, as following

ID  E-value

g1 0.0

g1 1e-20

g1 1e-5

g2 1e-5

g2 1e-13

g3 0.01

I would like to grep the lowest e-value for each gene, how to write a script to make it?

Thank you!

Alexie

linux script • 2.0k views
ADD COMMENT
1
Entering edit mode

Why not just do the blast over with -max_target_seqs 1?

ADD REPLY
0
Entering edit mode

Actually, they are a merged file of results generated by different methods, I want to check which method generally give the best results or more reliable.

ADD REPLY
0
Entering edit mode

my data: blast_scores.txt

ID  E-value
g1  0.0
g1  1e-20
g1  1e-5
g2  1e-5
g2  1e-13
g3  0.01

code:

$ datamash -H -g 1 min 2 < blast_scores.txt

output:

$ datamash -H -g 1 min 2 < blast_scores.txt 
GroupBy(ID) min(E-value)
g1  0
g2  1e-13
g3  0.01

datamash is command line tool from GNU

ADD REPLY
1
Entering edit mode
7.3 years ago
st.ph.n ★ 2.7k

Assuming input file is tab-delimited. If not, reformat, or change the code below. Save code as sort_eval.py and run as python sort_eval.py file.txt

#!/usr/bin/env python
import sys
from collections import defaultdict

evalue = defaultdict(list)

with open(sys.argv[1], 'r') as f:
        next(f)
        for line in f:
                evalue[line.strip().split('\t')[0]].append(line.strip().split('\t')[1])

for i in sorted(evalue):
        print i, '\t', sorted(evalue[i])[0]

Output:

g1      0.0
g2      1e-13
g3      0.01
ADD COMMENT
1
Entering edit mode
7.3 years ago
venu 7.1k

Something like this

show contents | remove header | sort based on first column followed by second | keep rows based on first occurrence of first column

code

cat test.test | grep -v ID | sort -k1,1 -k2,2n | awk '!a[$1]++'

result

g1 0.0
g2 1e-13
g3 0.01

P.S. You can directly pass input file to grep i.e. grep -v ID text.txt | ...

ADD COMMENT

Login before adding your answer.

Traffic: 1877 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6