Hi,
I am new to antiSMASH analysis and am using the latest version, 5.0, so I cannot find answers to my questions in the older threads.
Here is the detail:
I ran antiSMASH and obtained .gbk files as well as a new folder called "region1" (I believe this is new in the latest version). This folder contains several .html files with names like "ctg3_14_mibig_hits.html".
When I open this .html file, it contains the following eight columns:
- MIBiG Protein
- Description
- MIBiG Cluster
- MiBiG Product
- % ID
- % Coverage
- BLAST Score
- E-value
The fourth column (MiBiG Product) contains the name of the product, e.g. NRP, polyketide, terpene, other, etc. I am interested in counting the number of BGC types in each sample (maybe from this column?).
Q1. I am confused about which file I should use to count the BGC types: this .html file (I have several) under the "region1" folder, or the .gbk file?
Q2. In either case, I need a method/script to do so. I would really appreciate it if someone could share code for counting the BGCs in each sample, since I have several such files and tens or hundreds of MIBiG products in each file.
Please help this newbie.
Many thanks,
It looks like it would be that 4th column. See the answer in this SO thread.
You may also be able to simply cut/sort/uniq/count that column.
The antiSMASH HTML output is thoroughly described in their help page.
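If the shell route is awkward because the column sits inside HTML markup, a rough Python equivalent of the cut/sort/uniq/count idea might look like the sketch below. It uses only the standard library. It assumes the hits table is built from ordinary `<tr>`/`<td>` rows and that the product is the 4th `<td>` in each row. I have not checked this against real antiSMASH output, so look at one file first. The names `TableCellParser` and `count_products`, and the sample rows, are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def count_products(html_text, column=3):
    """Tally the values in one column (0-based, so 3 is the 4th column)."""
    parser = TableCellParser()
    parser.feed(html_text)
    parser.close()
    return Counter(row[column] for row in parser.rows if len(row) > column)


# Made-up rows in the eight-column layout described in the question.
sample = """
<table>
<tr><th>MIBiG Protein</th><th>Description</th><th>MIBiG Cluster</th>
<th>MiBiG Product</th><th>% ID</th><th>% Coverage</th>
<th>BLAST Score</th><th>E-value</th></tr>
<tr><td>protA</td><td>x</td><td>clusterA</td><td>NRP</td>
<td>50</td><td>90</td><td>300</td><td>1e-50</td></tr>
<tr><td>protB</td><td>y</td><td>clusterB</td><td>Polyketide</td>
<td>40</td><td>80</td><td>200</td><td>1e-30</td></tr>
<tr><td>protC</td><td>z</td><td>clusterC</td><td>NRP</td>
<td>45</td><td>85</td><td>250</td><td>1e-40</td></tr>
</table>
"""

for product, n in count_products(sample).most_common():
    print(product, n)
```

For a real file you would read its text with something like `open(path, encoding="utf-8").read()` and pass that to `count_products`. Note that this counts table rows (hits), so the same product can appear many times in one file.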
Thank you for your response. I have read about the antiSMASH output, but I am still confused about the output files and am struggling to understand them. Sorry for my lack of knowledge.
For example, I obtained 116 html files from a single run, and each html file contains tens of MiBiG Product entries (4th column). If each html file has 10 MiBiG Product entries on average, that is 1160 entries in total for each run. Should I count "MiBiG Product" (4th column) across all of these entries and then add them up to obtain the totals (NRP, polyketide, etc.)?
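For the "add them up" step, here is a minimal sketch that assumes the 4th-column values have already been pulled out of each html file into a list. It only shows the summing. The function name `total_product_counts`, the second file name, and the values are hypothetical. Whether a sum of table rows is the right measure of BGC types per sample is a separate question that this code does not settle.

```python
from collections import Counter


def total_product_counts(products_by_file):
    """Sum per-file product tallies into one tally for the whole run.

    products_by_file maps a file name to the list of values found in
    that file's "MiBiG Product" column.
    """
    total = Counter()
    for products in products_by_file.values():
        total.update(products)  # adds one count per listed value
    return total


# Made-up example: two files from one run.
products_by_file = {
    "ctg3_14_mibig_hits.html": ["NRP", "Polyketide", "NRP"],
    "ctg3_15_mibig_hits.html": ["Terpene", "NRP"],  # hypothetical file name
}

totals = total_product_counts(products_by_file)
for product, n in totals.most_common():
    print(product, n)
```

Keeping the per-file lists in a dict also makes it easy to report counts per file rather than only the grand total.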