5.9 years ago
qinglong
Hi there,
I am seeking help from the community with parsing blast2lca output from MEGAN version 6.
blast2lca output (semicolon-separated; I have millions of genes in a txt file; 1001, 10010, 10011, etc. are gene IDs):
1001; ;g__Bacteroides; 100;s__Bacteroides caccae; 21;
10010; ;g__Clostridium; 100;s__Clostridium butyricum; 50;
10011; ;g__Clostridium; 100;s__Clostridium butyricum; 75;
...
...
Here is what I want (set cut-off to 50; stored in a tab-delimited file):
GeneID	Genus	Species
1001	Bacteroides	
10010	Clostridium	Clostridium butyricum
10011	Clostridium	Clostridium butyricum
...
...
Could you please provide a command or a simple script to do this?
Much appreciated!
Qinglong
If the input and output are exactly the same as in the OP, you can do it this way:
test.txt is input and output is:
input:
Thanks!!! But I also need to have a filtering based on the confidence score (cut-off: 50), do you have any other command to do that?
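For the cut-off filtering, here is a minimal Python sketch rather than a tested solution for every blast2lca file. It assumes each line looks like the three samples in the question: gene ID first, one blank field second, then alternating taxon/score fields, with `g__` marking genus and `s__` marking species. It also assumes a score equal to the cut-off is kept, since gene 10010 (score 50) keeps its species in the desired output. The names `parse_line`, `convert` and `CUTOFF` are made up for this example.

```python
import sys

CUTOFF = 50.0  # assumed: scores >= CUTOFF are kept, as for gene 10010 in the question


def parse_line(line, cutoff=CUTOFF):
    """Return (gene_id, genus, species) for one blast2lca line.

    Assumed layout: 'ID; ;taxon; score;taxon; score;' as in the question.
    A rank that is absent or scores below the cut-off is returned as ''.
    """
    fields = [f.strip() for f in line.rstrip("\n").split(";")]
    gene_id = fields[0]
    genus = species = ""
    # fields[1] is blank in the samples; taxon/score pairs start at index 2
    for name, score in zip(fields[2::2], fields[3::2]):
        try:
            value = float(score)
        except ValueError:
            continue  # skip pairs without a numeric score
        if value < cutoff:
            continue
        if name.startswith("g__"):
            genus = name[3:]
        elif name.startswith("s__"):
            species = name[3:]
    return gene_id, genus, species


def convert(in_path, out_path, cutoff=CUTOFF):
    """Stream the input line by line, so millions of genes fit in memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write("GeneID\tGenus\tSpecies\n")
        for line in src:
            if line.strip():  # ignore empty lines
                dst.write("\t".join(parse_line(line, cutoff)) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

If this is saved as, say, `lca_filter.py` (a hypothetical name), it could be run as `python lca_filter.py test.txt filtered.tsv`. Lines with extra ranks should still work as long as the taxon/score pairing holds, but that is worth checking against your real file first.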