Question

Allele Calling with chewBBACA

0

2.7 years ago

davidmaimoun ▴ 50

Hi dears,

I have 5 strains of Listeria. I need to do allele calling and draw a tree (phylogenetic and minimum spanning). For the calling I used chewBBACA: I ran it on the contigs.fasta files I got from SPAdes, but the output doesn't look right.

see the results below

What input does chewBBACA need? Do I need to do annotation first? And could I generate a tree with Phyloviz from the output?

Thank you very much!

wgmlst cgmlst mlst chewbbaca allele calling • 1.3k views

ADD COMMENT • link updated 2.7 years ago by patrickdm ▴ 240 • written 2.7 years ago by davidmaimoun ▴ 50

0

2.7 years ago

davidmaimoun ▴ 50

Thanks a lot Patrick, it's very helpful!!

Have an excellent day!

ADD COMMENT • link 2.7 years ago by davidmaimoun ▴ 50

score 2 · Accepted Answer · 2022-04-14

2

2.7 years ago

patrickdm ▴ 240

Your output looks good to me. You will find many allele calls that are simple integer values, others given as INF-X (which stands for inferred allele X), and further calls such as LNF, NIPH, NIPHEM, PLOT3, PLOT5, ALM, ASM, LOTSC. More info on those in the docs about allele calling.

By using the command (chewBBACA.py) ExtractCgMLST on your results_alleles.tsv, with parameter --t 0 (t: Maximum exclusion threshold), you should obtain a transformed table, with all the missing data turned into 0, and all the INF-X into the corresponding X allele. Then you could use Phyloviz to produce a tree from that.
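To make the transformation concrete, here is a minimal Python sketch of what "missing data turned into 0, INF-X into X" means for a single row of the results table. It is not chewBBACA's own code: the function names `normalize_call` and `normalize_row` are hypothetical, and it assumes a tab-separated row with the sample name in the first column and that every call which is neither an integer nor INF-X counts as missing.

```python
from typing import List, Tuple


def normalize_call(call: str) -> int:
    """Map one raw allele call to an integer allele ID (0 = missing).

    Assumption: only plain integers and INF-X carry an allele ID;
    everything else (LNF, NIPH, PLOT3, ...) is treated as missing.
    """
    call = call.strip()
    if call.startswith("INF-"):
        call = call[len("INF-"):]  # keep only the X of INF-X
    return int(call) if call.isdigit() else 0


def normalize_row(line: str) -> Tuple[str, List[int]]:
    """Split one tab-separated row into (sample name, normalized calls)."""
    sample, *calls = line.rstrip("\n").split("\t")
    return sample, [normalize_call(c) for c in calls]


# Hypothetical example row, not taken from real output.
row = "strain1.fasta\t4\tINF-12\tLNF\tNIPH\tPLOT3"
print(normalize_row(row))  # ('strain1.fasta', [4, 12, 0, 0, 0])
```

In practice ExtractCgMLST does this for you; the sketch is only meant to show what the transformed table contains.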

I wrote a little Python script, mlst2dist, which performs the call transformations and computes pairwise Hamming distances (modified with a correction for missing data) to produce dissimilarity matrices in PHYLIP and MEGA formats.
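For illustration, a rough sketch of a Hamming distance that ignores missing calls could look like the code below. This is an assumption about one reasonable correction (compare only loci called in both profiles and report the proportion that differ); it is not necessarily what mlst2dist does, and the names `hamming_with_missing` and `distance_matrix` are hypothetical. Profiles are lists of integer allele IDs with 0 meaning missing.

```python
import math
from typing import Dict, List


def hamming_with_missing(a: List[int], b: List[int]) -> float:
    """Proportion of differing loci among loci called in both profiles.

    Assumption: 0 marks a missing call and such loci are skipped.
    Returns NaN when the two profiles share no called locus.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    shared = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    if not shared:
        return math.nan
    differing = sum(1 for x, y in shared if x != y)
    return differing / len(shared)


def distance_matrix(profiles: Dict[str, List[int]]) -> List[List[float]]:
    """Symmetric pairwise matrix, rows and columns in dict order."""
    names = list(profiles)
    return [
        [hamming_with_missing(profiles[p], profiles[q]) for q in names]
        for p in names
    ]


# Hypothetical profiles: locus 3 is missing in A, locus 4 in B.
profiles = {"A": [1, 2, 0, 4], "B": [1, 3, 5, 0]}
print(distance_matrix(profiles))  # [[0.0, 0.5], [0.5, 0.0]]
```

Dividing by the number of shared loci keeps strains with many missing calls from looking artificially close to everything else.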

Hth.

ADD COMMENT • link 2.7 years ago by patrickdm ▴ 240

0

Is it fine to use the contigs as input (the ones I got from SPAdes, i.e. assembly-based allele calling)? And if I also want to run chewBBACA on raw reads (assembly-free allele calling), how can I get FASTA files from the trimmed end 1 and end 2 FASTQ files? (What I did: I split the paired-end FASTQ file with sratools and trimmed the reads, which gave me 2 trimmed FASTQ files, one for each end. From there I can't go on, because chewBBACA needs a FASTA file for each strain.)

I hope this is not too confusing. I am so confused myself that it is difficult for me to express things well.

Thank you for your help

ADD REPLY • link 2.7 years ago by davidmaimoun ▴ 50

1

(quoting the wiki docs) "In chewBBACA, schemas are composed of loci defined by CDSs and all the called alleles of a given locus are CDSs as defined by Prodigal". So it is fine to use a set of assembled genomes as input; I can't see how Prodigal could identify complete CDSs on unassembled reads. I'd also suggest taking a look at the docs about using a Prodigal training file for your dataset, and using it in the downstream steps of schema creation and allele calling. Because you are working with Listeria, I'd also point you to the existing L.monocytogenes.trn file in the prodigal_training_files repo dir. Hth