Hi
I am new to this, so apologies in advance if the question is trivial. I want to build a reference table so that I can annotate antibiotic resistance gene hits with the name of their antibiotic resistance gene category, using the following commands:
1) Make a reference database that maps the ARO numbers to the antibiotic resistance groups:
cat ./aro.obo | tr "\n" "@" | sed 's/@@/\n/g' | grep -v format-version | grep -v Typedef | sed 's/\[Term\]@id\:\s//g' | sed 's/@.*@is_a/\tis_a/' | grep is_a | sed 's/@relationship.*//' | sed 's/is_a.*\!\s//' | sed 's/ /_/g' > ./ARO_numbers_and_AR_groups.tsv
2) Get a list of ARO numbers with their corresponding gene IDs and taxonomic associations from the FASTA file.
3) The FASTA is annotated as a hierarchy, so all ARO numbers should be taken:
grep '>' AR-polypeptides.fa | sed 's/>//' | sed 's/ARO:1000001//g' |sed 's/\s.*ARO/\tARO/' | sed 's/\ .*\[/\t[/' | sed 's/ /_/g' > ./gene_IDs_and_ARO_numbers_and_AR_groups.tsv
4) Next, merge the two files (using awk) into a single reference database:
awk 'FNR==NR { a[$1]=$2; next } $2 in a { print a[$2]"\t"$1"\t"$2"\t"$3 }' ./ARO_numbers_and_AR_groups.tsv ./gene_IDs_and_ARO_numbers_and_AR_groups.tsv > ./CARD_annotation_reference.tsv
The first and second commands produce their output files, but the awk step produces no output at all. Here are some lines from the first and the second output files.
./ARO_numbers_and_AR_groups.tsv
ARO:0000000 antibiotic_molecule
ARO:0000001 antibiotic_molecule@synonym:_"quinolone"_EXACT_[]
ARO:0000002 tetracycline_resistance_gene
ARO:0000003 aminoglycoside@synonym:_"Astromicina"_EXACT_[]@synonym:_"Astromicine
./gene_IDs_and_ARO_numbers_and_AR_groups.tsv
gi|AAA76822.1|ARO:3002654|APH(3')-VIIa [Campylobacter_jejuni]
gi|ABC26006.1|ARO:3001624|OXA-84 [Acinetobacter_baumannii]
gi|AAF86691.1|ARO:3001816|ACC-2 [Hafnia_alvei]
gi|AFU35065.1|ARO:3003206|lsaE [Staphylococcus_aureus]
gi|AFM38048.1|ARO:3003206|lsaE [Staphylococcus_aureus]
Could this be due to the different structure of the two files, i.e. one is tab-separated while the other uses | between the gene ID and the ARO number?
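To test this idea, here is a minimal sketch that shows which fields awk sees in the second file. It uses only two of the sample lines above; `sample_headers.txt` is a hypothetical file name chosen for this check, and the gap before the bracketed taxon is assumed to be a single space or tab, as displayed.

```shell
# Two header lines copied from the second output file shown above.
# sample_headers.txt is a hypothetical scratch file used only for this check.
printf '%s\n' \
  "gi|AAA76822.1|ARO:3002654|APH(3')-VIIa [Campylobacter_jejuni]" \
  "gi|ABC26006.1|ARO:3001624|OXA-84 [Acinetobacter_baumannii]" > sample_headers.txt

# Default splitting is on whitespace (spaces or tabs), so $2 is the taxon.
# This is the value the merge command tests with "$2 in a".
awk '{ print $2 }' sample_headers.txt

# Splitting on "|" instead: the third field is the ARO number.
awk -F'|' '{ print $3 }' sample_headers.txt
```

If the first command prints the bracketed taxon names rather than ARO numbers, then `$2 in a` in the merge step can never match a key from the first file, which would explain the empty output.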
I would appreciate it if someone could help me get this working.
Regards, Mahdi