Genome Contamination Analysis
4 days ago
Umer ▴ 160

Hello,

I have 12 de novo assembled fungal genomes.

Background:

  1. Assemblies were generated from quality-controlled Nanopore data.
  2. They were then polished with Racon and Pilon using Illumina data.
  3. Contigs < 2000 bp were removed and the assemblies sorted by size using funannotate clean and sort.

I want to perform a contamination analysis, so I looked at the NCBI Foreign Contamination Screen (FCS) tool, but it requires a lot of computational resources.

My questions:

  1. Is contamination analysis necessary? (Kindly share your thoughts.)
  2. Which tools or pipelines are good for contamination analysis? (Pointers to tutorials would be appreciated.)

Thank you. Happy New Year.

contamination genome assembly analysis • 802 views

Is contamination analysis necessary?

I would say no. There should be no contamination if the experimental component of the study was done rigorously. There is little chance you added contamination during the analysis phase. If there is a possibility of contamination in your data, then that should have been addressed before you started assembling it.


Thank you for the clarification.

One other, slightly unrelated question. Every time I post a question I wait for your response. I personally believe that you have more knowledge and experience in genome assembly. Is there any way I can follow and look at your work, maybe on other platforms like Google Scholar or anywhere else? Only if you feel OK sharing.


I agree that generally you shouldn't need to do a contamination analysis. I added one to a pipeline for my last lab because the quality of the samples we received from partners varied a lot. So, if you notice your assemblies are more fragmented than anticipated, or you find unexplained patterns in the data, then it can be useful to try to clean up that mess.

I liked using metagenomic tools like Kraken2 for contamination.

3 days ago

Fungi can be prone to misidentification.

I have found sourmash useful and easy for fast genomic comparison of bacteria and fungi.

https://sourmash.readthedocs.io/en/latest/tutorials-lca.html

If you are concerned about internal contamination (by what? E. coli? Other fungi?), then an ORF-level approach might be most effective.

  • Call all ORFs using e.g. augustus
  • Run blastn of the ORFs against a large downloaded NCBI dataset (all fungi, for example)
  • Check that the hits are as expected and not from vastly different species or kingdoms
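
The last step of that list could be sketched roughly as below. This is only an illustration under assumptions that are not from the thread: the BLAST hits are assumed to be in a tab-separated file with three columns (query ORF id, bitscore, taxonomic label), for example from a custom tabular output such as `-outfmt "6 qseqid bitscore sskingdoms"`, which needs taxonomy information available to BLAST. The function names and the "expected" label set are hypothetical. Note that a super-kingdom label can only separate e.g. Bacteria from Eukaryota; telling fungal from human hits would need a finer label.

```python
from collections import Counter

def best_hits(lines):
    """Keep the highest-bitscore hit per query ORF.

    Each line is assumed to be: query <TAB> bitscore <TAB> label.
    Returns {query: (bitscore, label)}.
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, bitscore, label = line.split("\t")[:3]
        score = float(bitscore)
        if query not in best or score > best[query][0]:
            best[query] = (score, label)
    return best

def label_counts(best):
    """Count how many ORFs have their best hit in each label."""
    return Counter(label for _, label in best.values())

def flag_unexpected(best, expected=("Eukaryota",)):
    """Return sorted ORF ids whose best hit is outside the expected labels."""
    return sorted(q for q, (_, label) in best.items() if label not in expected)
```

With a hits file opened as `handle`, `flag_unexpected(best_hits(handle))` would give the ORFs worth a closer look; a contig where most ORFs are flagged is a more convincing contamination candidate than a single odd hit.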
4 days ago
Mensur Dlakic ★ 28k

What you are asking is similar to "is car insurance necessary?" The answer is the same: it is not necessary most of the time, but people still pay for it. We don't know how you isolated DNA for these analyses or what the contamination potential is, but at the very least there is a possibility of human DNA contamination.

In my book, skipping this because it "required a lot of computational resources" is not a good enough reason. But if you absolutely can't do the proper contamination analysis, you can at least try to bin the total DNA using tetranucleotide frequencies. This won't separate very similar genomes, but it will work in a pinch if you have contamination with prokaryotes, and it should be able to separate human and fungal DNA as well. I did an exercise a couple of years ago by adding 2-3 bacterial genomes to human chromosomal DNA. They separated cleanly into distinct groups when I ran t-SNE on the tetranucleotide frequencies, and I am sure UMAP or other dimensionality reduction methods would work as well. I am suggesting this only as a low-resource substitute for proper analysis if you absolutely can't gather the resources for FCS.
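
For anyone who wants to try this, a minimal sketch of the frequency calculation follows. It is not the code used in the exercise mentioned above; the function name is made up, and counting both strands and skipping windows with ambiguous bases are my own assumptions.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, plus a lookup from k-mer to index.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def tetranucleotide_frequencies(seq):
    """Return a 256-element list of tetranucleotide frequencies for one sequence.

    Each window is counted on both strands (the k-mer and its reverse
    complement). Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(KMERS)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        idx = INDEX.get(kmer)
        if idx is None:
            continue  # window contains N or another ambiguous base
        counts[idx] += 1
        reverse_complement = kmer.translate(COMPLEMENT)[::-1]
        counts[INDEX[reverse_complement]] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]
```

One vector per contig (or per fixed-size fragment of long contigs) can then be stacked into a matrix and passed to a dimensionality reduction method such as scikit-learn's `TSNE` or UMAP for plotting.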


Along the same lines as what Mensur Dlakic proposes, you can look at the average %GC content per sequence. Plotting that will also allow you to quickly inspect whether there might be contamination (separating eukaryotic from prokaryotic sequences works especially well). Keep in mind, though, that this is also quite a crude (but easy and quick) approach that in no way competes with a real FCS analysis.

EDIT: the above is simply an approach you could use. Please don't take it as an obligation to run such an analysis! In most cases, unless you have complicated samples, this will not be necessary, as contamination will be low to non-existent.
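
A rough sketch of the per-sequence GC calculation, in case it helps. The function names are made up for illustration, and the choice to ignore N and other ambiguous bases in the denominator is an assumption.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def gc_percent(seq):
    """GC percentage over unambiguous bases; N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0

def gc_per_contig(lines):
    """Map each contig name to its GC percentage."""
    return {name: gc_percent(seq) for name, seq in read_fasta(lines)}
```

Calling `gc_per_contig(open("assembly.fasta"))` (the file name is a placeholder) gives one value per contig; a histogram or a GC versus contig length scatter plot of those values is usually enough to spot a second cluster.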
