I am wondering about a suitable approach to filter "contamination" from a genome that is known to carry HGT. Any ideas and approaches are welcome.
Thanks for your help.
The discussion about contamination vs. HGT is still in flux. You might want to have a look at the following interesting case in PNAS on real or false HGT in tardigrades:
In my opinion the evidence has shifted towards no HGT and towards contamination in this case.
In the following commentary, some guidelines for distinguishing contamination from HGT have been given:
I suppose you are talking about DNA-seq data, right?
In the case of contamination, you can expect the contaminant reads to be distributed more or less uniformly across the contaminant's genome, whereas after a true HGT event only one or a few extra genes will be represented.
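As a rough illustration of that idea, here is a minimal sketch that computes the breadth of coverage of the suspected donor genome from read alignment intervals. The function names, the half-open 0-based interval convention, and the `min_breadth` cutoff of 0.5 are assumptions made for the example, not established thresholds; in practice the intervals would come from aligning your reads to the candidate contaminant genome.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of genome positions covered by at least one read.

    intervals: iterable of (start, end) alignment coordinates,
    0-based and half-open (an assumed convention for this sketch).
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        # Skip the part already counted and clip to the genome end.
        start = max(start, current_end)
        end = min(end, genome_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def looks_like_contamination(intervals, genome_length, min_breadth=0.5):
    """Hypothetical rule of thumb: broad coverage of the donor genome
    points to contamination, narrow coverage to a possible HGT."""
    return covered_fraction(intervals, genome_length) >= min_breadth


# Reads tiling the whole donor genome (contamination-like pattern).
tiled = [(i, i + 100) for i in range(0, 1000, 50)]
# Reads piling up on a single small region (HGT-like pattern).
local = [(200, 300), (250, 350)]

print(covered_fraction(tiled, 1000), looks_like_contamination(tiled, 1000))
print(covered_fraction(local, 1000), looks_like_contamination(local, 1000))
```

Breadth alone is not conclusive: a low-level contaminant sequenced at low depth can also cover only a small part of its genome, so the cutoff should be judged against the expected depth.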
A possible approach to distinguish both cases is to:
First of all, it is more likely to see contamination than true HGT.
If it is contamination, you have to distinguish two cases: 1) contamination happened before preparation of the sequencing library, and 2) contamination happened after library preparation (your library was cross-contaminated with another library, most likely during loading of the sequencer). Both cases have characteristic features. If the contaminant is a fast-growing bacterium, you should mainly see its high-copy-number genes, which are rRNA genes and plasmid-borne genes. If the contaminant is another library, that library may have an insert size distinct from your main library, see Plant viruses sequences are found in human brain Rna-seq sample: how to evaluate it?
You can identify true HGT if you find a chimeric contig in which a stretch of foreign DNA sequence is inserted into your target genome and is flanked by sequences of your target organism at BOTH ends. The whole contig should show rather uniform read coverage, especially at the insertion sites.
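The two criteria above can be sketched in a few lines. This is only an illustration under stated assumptions: the contig is assumed to be already split into segments labelled `"host"` or `"foreign"` (for example from a similarity search), `depth` is assumed to be a per-base read depth list for the contig, and the `window` and `tolerance` values are arbitrary placeholders.

```python
from statistics import mean


def flanked_foreign_segments(segments):
    """Return foreign segments that have host sequence on BOTH sides.

    segments: list of (start, end, origin) along one contig, where
    origin is 'host' or 'foreign' (hypothetical labels for this sketch).
    """
    ordered = sorted(segments)
    flanked = []
    for i, (start, end, origin) in enumerate(ordered):
        if origin != "foreign":
            continue
        host_left = any(o == "host" for _, _, o in ordered[:i])
        host_right = any(o == "host" for _, _, o in ordered[i + 1:])
        if host_left and host_right:
            flanked.append((start, end))
    return flanked


def junction_coverage_ok(depth, junction, window=50, tolerance=0.5):
    """Check that mean depth around a junction stays close to the
    contig-wide mean (within tolerance * contig mean)."""
    contig_mean = mean(depth)
    lo = max(0, junction - window)
    hi = min(len(depth), junction + window)
    local_mean = mean(depth[lo:hi])
    return abs(local_mean - contig_mean) <= tolerance * contig_mean


# A foreign stretch embedded in host sequence, with even coverage.
segments = [(0, 5000, "host"), (5000, 7000, "foreign"), (7000, 12000, "host")]
even_depth = [30] * 12000

# Same layout, but coverage collapses at the left junction,
# which would suggest a misassembly rather than a real insertion.
dropped_depth = list(even_depth)
dropped_depth[4950:5050] = [2] * 100

print(flanked_foreign_segments(segments))
print(junction_coverage_ok(even_depth, 5000))
print(junction_coverage_ok(dropped_depth, 5000))
```

A foreign segment sitting at the very end of a contig is not reported, since it is flanked on one side only and could equally be a contaminant contig joined by a misassembly.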