Mitochondrial contig identification in fungal genome assemblies
3
1
Entering edit mode
16 months ago

Dear Community,

I would appreciate your comments and suggestions on steps to follow for identification of mitochondrial genome in draft genome assemblies.

I have an assembly in which I observed a circularized contig. Then I did an online blast and found a mitochondrial hit for the same 62 % query coverage and 96% identity. Shall I assume this contig could be a mitochondrial genome ? But at the same time I couldn't use online blast for my other contigs as they were too big to accept by online server. Also I observed that local blast with ncbi nt was not an appropriate option which takes huge amount of time. Is there any alternate methods to explore the draft assembly and its identities ?

Appreciate your comments on the steps to follow.

genome assembly fungi mitochondria • 1.3k views
ADD COMMENT
0
Entering edit mode

Dear Philipp Bayer (Philipp Bayer), Mikhail Schelkunov and Brian Bushnell. All solutions certainly helped a lot ! Thank you for your time and consideration.

ADD REPLY
2
Entering edit mode
14 months ago

Along with the solutions tried here, you might also want to give these tools a try:

https://github.com/ibe-uw/tiara - Tiara will tell you whether contigs are archaea, bacteria, prokarya, eukarya, organelle (mito or plastid)

You could run the contig through a mitochondrial genome annotation pipeline and see whether the contig contains all mitochondrial genes. Try uploading the contig to https://mitos.bioinf.uni-leipzig.de/

ADD COMMENT
1
Entering edit mode
14 months ago

Typically mito contigs can be detected via coverage. Starting with adapter-trimmed Illumina reads, we use this specific script to assemble fungal mitochondria, using BBTools:

#First link reads as reads.fq.gz

kmercountexact.sh in=reads.fq.gz khist=khist_raw.txt peaks=peaks_raw.txt

primary=`grep "haploid_fold_coverage" peaks_raw.txt | sed "s/^.*\t//g"`
cutoff=$(( $primary * 3 ))

bbnorm.sh in=reads.fq.gz out=highpass.fq.gz pigz passes=1 bits=16 min=$cutoff target=9999999
reformat.sh in=highpass.fq.gz out=highpass_gc.fq.gz maxgc=0.45

kmercountexact.sh in=highpass_gc.fq.gz khist=khist_100.txt k=100 peaks=peaks_100.txt smooth ow smoothradius=1 maxradius=1000 progressivemult=1.06 maxpeaks=16 prefilter=2

mitopeak=`grep "main_peak" peaks_100.txt | sed "s/^.*\t//g"`

upper=$((mitopeak * 6 / 3))
lower=$((mitopeak * 3 / 7))
mcs=$((mitopeak * 3 / 4))
mincov=$((mitopeak * 2 / 3))

tadwrapper.sh in=highpass_gc.fq.gz out=contigs_intermediate_%.fa k=78,100,120 outfinal=contigs_intermediate.fa prefilter=2 mincr=$lower maxcr=$upper mcs=$mcs mincov=$mincov

bbduk.sh in=highpass.fq.gz ref=contigs_intermediate.fa outm=bbd005.fq.gz k=31 mm=f mkf=0.05

tadpole.sh in=bbd005.fq.gz out=contigs_bbd.fa prefilter=2 mincr=$((mitopeak * 3 / 8)) maxcr=$((upper * 2)) mcs=$mcs mincov=$mincov k=100 bm1=6

ln -s contigs_bbd.fa contigs.fa

Once it's assembled, if it came out in multiple contigs, you might get a better assembly by using Spades on the recruited reads.

ADD COMMENT
0
Entering edit mode
16 months ago
shelkmike ★ 1.4k

To find mitochondrial contigs in an assembly, I do the following:
1) I align several mitochondrial genomes of close species to the assembly using Blastn.
2) Using a custom script I find contigs with more than 50% of sequence (in total) covered by matches to the reference mitochondrial genomes. I consider these contigs belonging to the mitochondrial genome.

ADD COMMENT

Login before adding your answer.

Traffic: 1761 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6