Question

Convert coverage values to number of occurrences

4.7 years ago

genomes_and_MGEs ▴ 10

Hi everyone,

I have a summary report like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   .   100.00;99.86;99.86;100.00   .
file2.tab   2   .   .   .
file3.tab   18  .   .   98.29;98.29;98.29

The values are coverage values, one per hit. For example, in file1.tab the abaR gene is present in 4 copies. I'm not interested in the coverage itself; I want to convert each cell into the number of occurrences of that gene, so that the table looks like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   0   4   0
file2.tab   2   0   0   0
file3.tab   18  0   0   3

It's easy to replace the "." with 0, but I'm not sure how to replace the coverage values with the number of occurrences.
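The conversion for a single cell is simple once the format is spelled out: a non-empty cell holds one coverage value per hit, separated by ";", so the copy number is the number of ";"-separated fields, and "." means zero. A minimal sketch in shell/awk, assuming exactly that cell format; `count_copies` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper (not part of any tool): turn one cell of the summary
# table into a copy number. Assumes "." marks an absent gene and that the
# coverage of each hit is separated by ";".
count_copies() {
  awk -v cell="$1" 'BEGIN {
    # split() returns how many ";"-separated pieces it found, i.e. the number of hits
    n = (cell == ".") ? 0 : split(cell, parts, ";")
    print n
  }'
}

count_copies '100.00;99.86;99.86;100.00'   # prints 4
count_copies '98.29;98.29;98.29'           # prints 3
count_copies '.'                           # prints 0
```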

Thanks a lot!

score 1 · Answer 1 · 2020-11-26

4.7 years ago

Pierre Lindenbaum 166k

awk '/^#/ {print;next} {printf("%s\t%s",$1,$2);for(i=3;i<=NF;i++) {printf("\t%d",$i=="."?0:split($i,a,/[;]/));} printf("\n");}'  input.txt
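The one-liner above prints header lines unchanged, keeps the first two columns, and for every gene column prints 0 for "." or otherwise the return value of awk's split(), which is the number of ";"-separated coverage values. The same logic written out with comments, as a sketch: `coverage_to_counts` is a hypothetical function name, and it assumes the summary is tab-separated (setting FS and OFS to a tab keeps the output tab-delimited, whereas the original one-liner relies on awk's default whitespace splitting).

```shell
# Hypothetical wrapper around the awk approach from the answer.
# Assumption: the summary file is tab-separated.
coverage_to_counts() {
  awk 'BEGIN { FS = OFS = "\t" }
    # Header line(s) start with "#": copy them unchanged
    /^#/ { print; next }
    {
      # Columns 1 and 2 (file name, NUM_FOUND) stay as they are;
      # each later column is one gene: "." becomes 0, otherwise count the hits
      for (i = 3; i <= NF; i++)
        $i = ($i == ".") ? 0 : split($i, hits, ";")
      # Assigning to $i rebuilt the record with OFS, so a plain print is enough
      print
    }' "$@"
}
```

Usage, with a hypothetical file name: coverage_to_counts summary.tab > counts.tab. Assigning to the fields instead of building the line with printf lets awk rejoin them with OFS, so the output keeps the same column layout as the input.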
