Question

Removal of host sequences without reference genome

3.4 years ago

emiliomastriani ▴ 40

Dear all, Suppose I have a collection of viral reads from Illumina NGS sequencing in FASTQ format. After the usual pre-processing step (done with fastp), I need to remove the host sequences (contaminants), but no reference genome is available for the host, so I cannot map the reads with bowtie2 and filter them with samtools. I have read about some approaches, but I am still not sure which to use. Can someone please suggest an appropriate strategy or starting point? Thanks for your support.

meta-genomic contaminants missing host reference genome viral • 1.5k views

score 0 · Answer 1 · 2021-08-20

3.4 years ago

Mensur Dlakic ★ 28k

You would help everyone by providing more details: what is the host, how big is its genome compared with the viral genome, which approaches are you considering, etc.

I have answered this question in several different contexts, so I will just give you links. I think it is worth reading all the posts in those pages.

Hello, The goal of my project is to identify the correct taxonomy of the viral reads I have. We always know the host of each sample (for example bat, rodent, human, or mosquito), even when its reference genome is not available.

3.4 years ago by emiliomastriani ▴ 40