Apologies in advance if this is a trivial question; I am new to this. I want to build a reference table so that I can annotate antibiotic resistance gene hits with their antibiotic resistance gene category names, using the following commands:
1) Make a reference database for annotating the ARO numbers with the antibiotic resistance groups:
cat ./aro.obo | tr "\n" "@" | sed 's/@@/\n/g' | grep -v format-version | grep -v Typedef | sed 's/\[Term\]@id\:\s//g' | sed 's/@.*@is_a/\tis_a/' | grep is_a | sed 's/@relationship.*//' | sed 's/is_a.*\!\s//' | sed 's/ /_/g' > ./ARO_numbers_and_AR_groups.tsv
2) Get a list of ARO numbers with their corresponding gene ID numbers and taxonomic associations from the FASTA file.
3) The FASTA is annotated as a hierarchy, so all ARO numbers should be taken:
grep '>' AR-polypeptides.fa | sed 's/>//' | sed 's/ARO:1000001//g' |sed 's/\s.*ARO/\tARO/' | sed 's/\ .*\[/\t[/' | sed 's/ /_/g' > ./gene_IDs_and_ARO_numbers_and_AR_groups.tsv
4) Next, merge the files (using awk) into a single reference database:
awk 'FNR==NR { a[$1]=$2; next } $2 in a { print a[$2]"\t"$1"\t"$2"\t"$3 }' ./ARO_numbers_and_AR_groups.tsv ./gene_IDs_and_ARO_numbers_and_AR_groups.tsv > ./CARD_annotation_reference.tsv
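For reference, this is how I understand the two-file awk idiom in step 4 is supposed to behave. It is only a sketch on made-up toy data: the file names `toy_groups.tsv` and `toy_genes.tsv` and their contents are hypothetical, not taken from CARD, and both files are assumed to be tab-separated with the ARO number in column 1 of the first file and column 2 of the second.

```shell
# Hypothetical toy inputs (not real CARD data), written to a temp directory.
tmp=$(mktemp -d)
printf 'ARO:1\tgroup_A\nARO:2\tgroup_B\n' > "$tmp/toy_groups.tsv"
printf 'gene1\tARO:1\t[Species_x]\ngene2\tARO:9\t[Species_y]\n' > "$tmp/toy_genes.tsv"

# While reading the first file (FNR==NR), store the group name keyed by ARO number.
# While reading the second file, print only rows whose 2nd field is a stored ARO number.
merged=$(awk -F'\t' 'FNR==NR { a[$1]=$2; next } $2 in a { print a[$2]"\t"$1"\t"$2"\t"$3 }' \
  "$tmp/toy_groups.tsv" "$tmp/toy_genes.tsv")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

In this toy case only `gene1` is printed, because `ARO:9` has no entry in the first file. So the merge prints nothing at all whenever field 2 of the second file never matches a key from the first file.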
The first and second commands produce their output files, but the awk step does not produce any output. Here are some lines from the first output (the lines starting with ARO:), followed by some lines from the second output (the lines starting with gi|):
ARO:0000000 antibiotic_molecule
ARO:0000001 antibiotic_molecule@synonym:_"quinolone"_EXACT_[]
ARO:0000002 tetracycline_resistance_gene
ARO:0000003 aminoglycoside@synonym:_"Astromicina"_EXACT_[]@synonym:_"Astromicine
gi|AAA76822.1|ARO:3002654|APH(3')-VIIa [Campylobacter_jejuni]
gi|ABC26006.1|ARO:3001624|OXA-84 [Acinetobacter_baumannii]
gi|AAF86691.1|ARO:3001816|ACC-2 [Hafnia_alvei]
gi|AFU35065.1|ARO:3003206|lsaE [Staphylococcus_aureus]
gi|AFM38048.1|ARO:3003206|lsaE [Staphylococcus_aureus]
Could it be due to the different structure of the two files, i.e. one being tab-separated and the other still separated by | characters?
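A quick check of this suspicion, sketched on one header line copied from the sample above. It compares what awk sees as a field with its default whitespace separator against a separator that also splits on `|`. The separator `[| \t]+` is my own guess at what might be needed, not something from the CARD documentation.

```shell
# One line copied from the second output shown above.
line="gi|AAA76822.1|ARO:3002654|APH(3')-VIIa [Campylobacter_jejuni]"

# Default separator (whitespace): everything up to the space is field 1,
# so field 2 is the bracketed taxon, not the ARO number.
default_f2=$(printf '%s\n' "$line" | awk '{ print $2 }')

# Splitting on runs of "|", space, or tab: the ARO number becomes field 3.
pipe_f3=$(printf '%s\n' "$line" | awk -F'[| \t]+' '{ print $3 }')

echo "default \$2:    $default_f2"
echo "pipe-aware \$3: $pipe_f3"
```

If the full file behaves like this one line, then `$2 in a` in the merge step compares the taxon name against ARO numbers. That would never match and would explain the empty output.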
I would appreciate any help getting this to work.
Regards, Mahdi