Hi, community. I have been working on a transcriptome for my species of interest, which has an available genome. To expand my transcriptomic database, I decided, after assembling a genome-guided transcriptome, to assemble a de novo transcriptome from the reads that did not map (around 10% of my data). However, I suspected contamination. I mapped the non-aligning reads to several reference sequences (human, fungal, viral and bacterial), and around 30% of the reads that did not map to my genome mapped to the bacterial genome (E. coli). Now that I know the contamination is probably bacterial, is there a database I can map against to remove the contaminant reads?
Thank you in advance
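For the removal step itself, here is a minimal sketch in Python, not a definitive pipeline. It assumes the unmapped reads have already been aligned to one or more contaminant references and that the result is a SAM file; the function names `contaminant_read_ids` and `filter_fastq` are hypothetical, and the FASTQ is assumed to be well-formed with four lines per record. A read is treated as a contaminant if any of its alignments lacks the "unmapped" flag (0x4), so for paired data both mates are dropped when either one aligns.

```python
from typing import Iterable, Iterator, Set, Tuple


def contaminant_read_ids(sam_lines: Iterable[str]) -> Set[str]:
    """Collect names of reads that aligned to the contaminant reference."""
    ids: Set[str] = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name, flag = fields[0], int(fields[1])
        # Bit 0x4 set means "segment unmapped"; anything else aligned.
        if not flag & 0x4:
            ids.add(name)
    return ids


def filter_fastq(
    fastq_lines: Iterable[str], drop_ids: Set[str]
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield 4-line FASTQ records whose read name is not in drop_ids."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Read name: header without '@', up to the first whitespace.
            name = record[0][1:].split()[0]
            # Drop a trailing /1 or /2 mate suffix, if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            if name not in drop_ids:
                yield (record[0], record[1], record[2], record[3])
            record = []


if __name__ == "__main__":
    # Tiny in-memory example: r1 aligned to the contaminant, r2 did not.
    sam = [
        "@HD\tVN:1.0",
        "r1\t0\tecoli\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII",
    ]
    fastq = ["@r1", "ACGTA", "+", "IIIII", "@r2", "TTTTT", "+", "IIIII"]
    for rec in filter_fastq(fastq, contaminant_read_ids(sam)):
        print("\n".join(rec))
```

Many aligners can also write the unaligned reads out directly, which avoids this extra step; HISAT2, for example, has `--un` and `--un-conc` options for that.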
So it's enough to use just one bacterial genome and relax my aligner's parameters? I've been using HISAT2 with default parameters like so:
Since a lot of my reads mapped to the genome, I'm sure plenty of interesting results will come up. However, we want to build a more complete transcriptome for use in future studies.
It is easy for me to say this, so apologies in advance, but you would be best served by making additional libraries (perhaps from different life-cycle stages, organs, etc.) rather than going after the small fraction of reads that did not map to your genome in the first place.