We have performed target capture/enrichment for gene regions of interest (baits) and I am now using HybPiper in order to extract target sequences from high-throughput DNA sequencing reads.
Very briefly, after quality trimming and adapter removal with Trimmomatic, reads are first mapped to the reference genes (bwa mem). The mapped reads are then used to build a separate assembly for each gene (SPAdes), and Exonerate is used to find the coding sequences in the resulting contigs. For paralog detection, the program produces a warning if it detects multiple contigs containing long coding sequences (by default, at least 75% of the length of the reference sequence).
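To make the 75% rule concrete, here is a minimal sketch of the check as described above. It is not HybPiper's actual code: the function names, the inputs (a reference length plus the coding-sequence length found on each contig) and the toy numbers are hypothetical, chosen only to illustrate the logic.

```python
def long_hits(ref_len, hit_lens, min_frac=0.75):
    """Return the hit lengths that cover at least min_frac of the reference."""
    return [n for n in hit_lens if n >= min_frac * ref_len]


def paralog_warning(ref_len, hit_lens, min_frac=0.75):
    """True when more than one contig carries a long coding sequence."""
    return len(long_hits(ref_len, hit_lens, min_frac)) > 1


# Hypothetical gene with a 1000 bp reference sequence.
print(paralog_warning(1000, [900, 800]))       # two long hits
print(paralog_warning(1000, [900, 400]))       # only one long hit
print(paralog_warning(1000, [500, 400, 300]))  # several contigs, all short
```

Under this rule, a gene whose contigs are all short never triggers a warning, no matter how many contigs there are, so the absence of a warning says little when assemblies are fragmented.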
For bait design, we did not filter for single-copy loci, since copy number is not known for our non-model species. We were therefore expecting a high enrichment of paralogous sequences in our data. However, after the first run of HybPiper, I did NOT find any paralog warnings at all for the sample that I tried. I have noticed that SPAdes produces short contigs for many genes (shorter than 75% of the reference sequence). This might be a consequence of low coverage, which we thought was itself caused by over-enrichment of paralogous sequences.
Does that mean that there are no paralogs in that sample?
Do you think that is a good way of detecting paralogs in enriched data?
My thoughts - please take them with a grain of salt. I cannot really say whether this means there are no paralogs in that sample, because I do not know which target regions were used to design the baits. I would expect a de novo assembler such as SPAdes to potentially collapse paralogs unless a high enough (but not too high) value of k is used (http://www.rebeccatarvin.com/single-post/2015/06/26/Increase-kmer-size-to-improve-paralog-assembly). Granted, the example there is from RNAseq, not target enrichment.
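As a rough illustration of why k matters, the toy sketch below counts how many k-mers two similar sequences share at different values of k. It is not how SPAdes works internally, and the data are made up: a random 300 bp "gene" and a "paralog" with one substitution every 30 bp are assumptions, as are all function names. The idea is only that k-mers shared between two copies are where a de Bruijn graph assembler can merge them.

```python
import random


def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(a, b, k):
    """Jaccard-style fraction of k-mers shared between two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


# Toy data (assumption): a random 300 bp sequence and a copy that
# differs by one substitution every 30 bp.
rng = random.Random(1)
copy_a = "".join(rng.choice("ACGT") for _ in range(300))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
copy_b = "".join(swap[c] if i % 30 == 29 else c for i, c in enumerate(copy_a))

for k in (21, 33, 55):
    print(k, round(shared_fraction(copy_a, copy_b, k), 3))
```

With differences spaced 30 bp apart, some 21-mers fall entirely between two substitutions and are identical in both copies, whereas any window of 30 bp or longer spans at least one substitution. Real paralogs will not diverge this evenly, so treat this as intuition for the effect of k rather than a guide to choosing it.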