10 months ago
Bioinformatics_begginner
Hi,
I am classifying phages according to their taxonomy using https://github.com/KennthShang/PhaGCN. My issue is that my FASTA files contain fragments (contigs) of the genome instead of the whole genome.
My FASTA files have the following format (Genome1.fasta):
>k141_291006
TCG...
>k141_386008
TCG....
PhaGCN classifies this phage genome as:
k141_291006,19687,Casjensviridae,0.17071722
k141_386008,108404,Herelleviridae,1.0
So it gives two different classifications for the same genome. (Here one of them has a probability of 1.0, but in other cases no classification has a clearly higher probability.) The examples for this tool use whole genomes and classify each genome as a whole.
- Can I just concatenate the sequences and classify the result as an entire genome?
- Should I align the contigs against each other using multiple sequence alignment? I thought about aligning against a reference, but since I do not know which family they belong to, finding an accurate reference genome is hard and would not be a robust method.
- Should I classify according to the most probable classification and, if the probabilities are all very similar (e.g. 0.5 and 0.4), take a consensus?
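To make the consensus idea concrete, this is roughly what I have in mind. It is only a sketch of one possible rule, not something PhaGCN does itself. I am assuming the second column of the output is the contig length, and the function name `consensus_family` and the `min_share` cutoff are my own made-up choices. Each contig votes for its predicted family, weighted by length times score, and the genome gets a label only if one family holds a large enough share of the total weight.

```python
from collections import defaultdict


def consensus_family(predictions, min_share=0.6):
    """Pick one family per genome from per-contig predictions.

    predictions: iterable of (contig_id, length, family, score) tuples.
    Returns (family or None, share of the total weight held by the top family).
    The length * score weighting and the min_share cutoff are assumptions.
    """
    weights = defaultdict(float)
    for _contig_id, length, family, score in predictions:
        # Longer contigs with higher scores contribute more to the vote.
        weights[family] += length * score

    total = sum(weights.values())
    if total == 0:
        return None, 0.0

    best = max(weights, key=weights.get)
    share = weights[best] / total
    # Leave the genome unclassified when no family clearly dominates.
    return (best if share >= min_share else None), share


# The two contigs from Genome1.fasta shown above.
preds = [
    ("k141_291006", 19687, "Casjensviridae", 0.17071722),
    ("k141_386008", 108404, "Herelleviridae", 1.0),
]
family, share = consensus_family(preds)
print(family, round(share, 3))
```

For this example the long, high-confidence contig dominates, so the genome would be called Herelleviridae. With two contigs of similar length and scores like 0.5 and 0.4, the top share would fall below the cutoff and the genome would stay unclassified.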
Best Regards and Thank You