Classify Phage Taxonomy Scaffold Data (Genome Fractions)

15 months ago

Bioinformatics_begginner ▴ 20

Hi,

I am classifying phages by taxonomy using https://github.com/KennthShang/PhaGCN, but my FASTA files contain fragments (contigs) of each genome rather than the whole genome:

My FASTA files look like this (e.g. Genome1.fasta):

>k141_291006
TCG...
>k141_386008
TCG....
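PhaGCN reports one row per contig, not per genome. So any per-genome summary first needs a record of which contig came from which file. Below is a minimal Python sketch. It assumes each input file holds the contigs of one genome, and that assembler-generated contig IDs such as `k141_291006` are unique across files. `contig_lengths` and `contig_to_genome` are hypothetical helpers, not part of PhaGCN.

```python
from pathlib import Path
from typing import Dict, Iterable


def contig_lengths(fasta_text: str) -> Dict[str, int]:
    """Return {contig_id: sequence_length} for a multi-FASTA string."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig ID is the first whitespace-delimited token after '>'.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            # Sequence lines may be wrapped, so accumulate their lengths.
            lengths[current] += len(line)
    return lengths


def contig_to_genome(fasta_paths: Iterable[str]) -> Dict[str, str]:
    """Map every contig ID to the genome (file stem) it came from.

    ASSUMPTION: one genome per file, e.g. "Genome1.fasta" -> "Genome1".
    """
    mapping: Dict[str, str] = {}
    for path in fasta_paths:
        genome = Path(path).stem
        for contig in contig_lengths(Path(path).read_text()):
            if mapping.get(contig, genome) != genome:
                # Same ID in two files: grouping by genome would be wrong.
                raise ValueError(f"contig {contig!r} appears in more than one file")
            mapping[contig] = genome
    return mapping
```

For example, `contig_to_genome(["Genome1.fasta"])` would map both `k141_291006` and `k141_386008` to `"Genome1"`. The lengths are also useful later if you want to weight each contig's prediction by how much of the genome it covers.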

PhaGCN classifies this phage genome as:

k141_291006,19687,Casjensviridae,0.17071722 
k141_386008,108404,Herelleviridae,1.0
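To handle these predictions programmatically, the rows can be parsed into records. This sketch assumes the four comma-separated fields are contig ID, contig length in bp, predicted family and score. That matches the two rows above, but it is an assumption and does not come from the PhaGCN documentation, so check the header of your own output file.

```python
import csv
import io
from typing import List, NamedTuple


class Prediction(NamedTuple):
    contig: str
    length: int   # ASSUMPTION: second column is the contig length in bp
    family: str
    score: float  # ASSUMPTION: PhaGCN's confidence for this family, 0..1


def parse_predictions(text: str) -> List[Prediction]:
    """Parse 'contig,length,family,score' rows, skipping blanks and headers."""
    rows: List[Prediction] = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 4:
            continue  # blank or malformed line
        contig, length, family, score = (f.strip() for f in fields[:4])
        if not length.isdigit():
            continue  # e.g. a header row, whose second field is not a number
        rows.append(Prediction(contig, int(length), family, float(score)))
    return rows
```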

So it gives two different classifications for the same genome. Here one contig has a score of 1.0, but in other cases none of the contigs has a clearly higher score. The examples for this tool use whole genomes and classify each genome as a unit.

Can I just concatenate the sequences and classify them as an entire genome?
Should I align them against each other using multiple sequence alignment? I thought about aligning against a reference, but since I do not know which family they belong to, finding an accurate reference genome is hard and not a robust method.
Should I classify according to the most probable prediction and, if the scores are all very similar (e.g. 0.5 vs 0.4), take a consensus?
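For the third option, per-contig calls can be turned into a single per-genome call with a weighted vote. Each contig contributes its length times its score, so a long, confidently classified contig outweighs a short, uncertain one. This is a sketch of that idea, not a method PhaGCN provides. The length × score weighting and the `min_margin` cut-off for reporting a genome as ambiguous are hypothetical choices that would need validating, e.g. on assemblies of phages whose family is already known.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def consensus_family(
    contigs: Iterable[Tuple[str, int, float]],
    min_margin: float = 0.2,  # HYPOTHETICAL threshold, tune before use
) -> Tuple[Optional[str], float]:
    """Length-x-score weighted vote over the contigs of ONE genome.

    contigs: (family, contig_length_bp, score) for each contig.
    Returns (family, support), where support is the winning family's share
    of the total weight. family is None when the top two families are
    closer than min_margin, i.e. there is no clear consensus.
    """
    weights: Dict[str, float] = defaultdict(float)
    for family, length, score in contigs:
        weights[family] += length * score
    total = sum(weights.values())
    if total <= 0:
        return None, 0.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    best_family, best_weight = ranked[0]
    support = best_weight / total
    runner_up = ranked[1][1] / total if len(ranked) > 1 else 0.0
    if support - runner_up < min_margin:
        return None, support
    return best_family, support
```

With the two contigs above, `consensus_family([("Casjensviridae", 19687, 0.17071722), ("Herelleviridae", 108404, 1.0)])` returns `Herelleviridae` with a support of about 0.97. Two equal-length contigs scored 0.5 and 0.4 for different families fall inside the margin and come back as `None`, flagging the genome for manual inspection rather than forcing a call.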

Best regards, and thank you!

Phages PhaBOX Bacteria Genome PhaGCN