Hi,
I have paired-end (PE) sequenced a batch of ticks that were attached to and had fed on mammals. I am trying to assemble and classify virus genomes from these data and also to detect non-viral pathogens, mainly bacteria. As a first step, after trimming and deduplication, I would like to filter out reads from the tick genomes before de novo assembly.
Because my dataset contains several tick species (the hosts), my main question is how to automate host read removal. Should I compile a list of host genomes and map every sample against the whole list, or should the samples be grouped by tick species and each group mapped only to its own host genome?
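For illustration, the two options could be sketched roughly as below. This is only a sketch under assumptions: the sample sheet, species names, index paths, read file naming, and the choice of Bowtie2 as the mapper are all hypothetical placeholders, not part of my actual setup. The script only builds the commands and does not run them.

```python
# Hypothetical sample sheet: sample ID -> tick species (placeholder values).
SAMPLE_SPECIES = {
    "S01": "Ixodes_ricinus",
    "S02": "Ixodes_ricinus",
    "S03": "Dermacentor_reticulatus",
}

# Hypothetical Bowtie2 index prefixes, one per host genome.
HOST_INDEX = {
    "Ixodes_ricinus": "indexes/Ixodes_ricinus",
    "Dermacentor_reticulatus": "indexes/Dermacentor_reticulatus",
}

# Hypothetical index built from all host genomes concatenated together.
COMBINED_INDEX = "indexes/all_ticks"


def plan_host_removal(sample_species, host_index, combined_index=None):
    """Return {sample: index prefix}.

    With combined_index set, every sample is mapped against the combined
    index (option 1); otherwise each sample uses the index of its own
    tick species (option 2).
    """
    return {
        sample: combined_index or host_index[species]
        for sample, species in sample_species.items()
    }


def bowtie2_command(sample, index):
    """Build a Bowtie2 command that keeps read pairs NOT aligning concordantly.

    The input/output file names are assumed naming conventions.
    """
    return [
        "bowtie2",
        "-x", index,
        "-1", f"{sample}_R1.fastq.gz",
        "-2", f"{sample}_R2.fastq.gz",
        "--un-conc-gz", f"{sample}_nonhost_R%.fastq.gz",  # % becomes 1 or 2
        "-S", "/dev/null",  # discard the alignments themselves
    ]


per_species = plan_host_removal(SAMPLE_SPECIES, HOST_INDEX)
combined = plan_host_removal(SAMPLE_SPECIES, HOST_INDEX, COMBINED_INDEX)

for sample, index in per_species.items():
    print(" ".join(bowtie2_command(sample, index)))
```

Switching between the two strategies is then a one-argument change, which would make it easy to compare how many reads each one removes per sample.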
My other question is how to set parameters for detecting virus reads derived from endogenous viral elements. For example, >50% query cover and >80% identity are commonly used thresholds for filtering out reads associated with viral elements integrated into host genomes.
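To make the thresholds concrete, here is a minimal sketch of the kind of filter I mean. It assumes the reads were searched against the host genome with BLAST and written as tabular output with the columns `qseqid sseqid pident length qstart qend qlen` (an assumed column layout). The function name, the example hits, and the scaffold names are made up for illustration, and query cover is computed per hit rather than summed over all hits for a read.

```python
import csv
import io

# Assumed column order of the tabular BLAST output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend", "qlen"]


def reads_matching_host(handle, min_identity=80.0, min_query_cover=50.0):
    """Return IDs of reads with at least one hit above both thresholds."""
    flagged = set()
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        pident = float(row["pident"])
        # Aligned span on the query as a percentage of the full read length.
        span = abs(int(row["qend"]) - int(row["qstart"])) + 1
        query_cover = 100.0 * span / int(row["qlen"])
        if pident > min_identity and query_cover > min_query_cover:
            flagged.add(row["qseqid"])
    return flagged


# Made-up hits: only read1 exceeds both thresholds.
example = (
    "read1\ttick_scaffold_7\t92.5\t120\t1\t120\t150\n"   # 80% cover, 92.5% identity
    "read2\ttick_scaffold_3\t95.0\t40\t1\t40\t150\n"     # ~27% cover, 95% identity
    "read3\ttick_scaffold_9\t75.0\t140\t5\t144\t150\n"   # ~93% cover, 75% identity
)

print(sorted(reads_matching_host(io.StringIO(example))))  # ['read1']
```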
Many thanks
Thank you very much for your reply