I have 88 metagenomes, each assembled individually, and I have dereplicated the resulting contigs. I now want to map 91 metatranscriptomes to this very large reference to see what doesn't map, and save the unmapped reads into separate FASTQ files. I'm working with bacteria, so there are no splicing events to account for.
What tool would you suggest to do the following:
Build the index
Map the metaT to the large metaG reference
Convert the SAM output to FASTQ, keeping only paired reads that do not map to the reference
My original plan was to use BWA for the mapping, pipe to bbmap's (or bbsuite? or bbtools?) reformat.sh program to convert from sam, only get unmapped reads, and output fastq. The BWA index took about 7 hours to make last time I tried this (before I realized there were a lot of duplicate sequences). I'm going to try this again later today and I'm wondering if I should try BWA or Bowtie2 or maybe something else.
Any help or feedback is appreciated.
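For reference, here is a minimal sketch of how the BWA route could look. It swaps samtools fastq in for reformat.sh (an assumption, not the plan as stated above), and it interprets "paired sequences that do not map" as pairs where neither mate maps, i.e. SAM flag bits 0x4 and 0x8 both set. The function names, file names, and the thread count are all hypothetical.

```shell
#!/usr/bin/env bash
# SAM flag bits: 0x4 = read unmapped, 0x8 = mate unmapped.
# Requiring both selects pairs in which neither mate aligned.
BOTH_UNMAPPED=$(( 0x4 | 0x8 ))

# Succeeds if the given SAM flag has both "unmapped" bits set.
both_unmapped() {
  [ $(( $1 & BOTH_UNMAPPED )) -eq "$BOTH_UNMAPPED" ]
}

# Hypothetical wrapper: map one paired sample with BWA-MEM and stream the
# alignments straight into samtools fastq, keeping only fully unmapped pairs.
# Assumes `bwa index "$ref"` has already been run once for the reference.
extract_unmapped_pairs() {
  local ref=$1 r1=$2 r2=$3 prefix=$4
  bwa mem -t 8 "$ref" "$r1" "$r2" \
    | samtools fastq -f "$BOTH_UNMAPPED" -n \
        -1 "${prefix}_unmapped_R1.fastq" \
        -2 "${prefix}_unmapped_R2.fastq" \
        -0 /dev/null -s /dev/null -
}

# Example call (not executed here):
#   extract_unmapped_pairs contigs.fasta s1_R1.fastq s1_R2.fastq s1
```

Piping avoids writing 91 large SAM files to disk. The both_unmapped helper is only there to make the flag arithmetic explicit: a flag of 77 or 141 passes, while 73 (read mapped, mate unmapped) does not.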
I'll give this a try. A few questions regarding bbmap:
BBTools refers to the entire suite. bbmap [...] then specify that top-level folder (which should contain the ref folder) using path=. If you are starting with a FASTA reference file, then ref= is the correct option, which builds the index on the fly.
Got it! That's what I was missing, as I wasn't using the ref= and path= options correctly.
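To make the ref= versus path= distinction concrete, here is a rough sketch of the two-step pattern: build the index once with ref= and path=, then point each mapping run at the same folder with path= only. The helper names, file names, and the DRY_RUN switch are hypothetical; outu= is used on the understanding, from the BBMap help text, that it writes only unmapped reads and leaves out unmapped reads whose mate mapped.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the index once from a FASTA file.
# BBMap writes it into a ref/ folder underneath the directory given by path=.
build_index() {
  run bbmap.sh ref="$1" path="$2"
}

# Map one paired sample against the prebuilt index.
# No ref= here: path= points at the folder that contains ref/.
# With paired input, the single outu= file is interleaved FASTQ.
map_sample() {
  run bbmap.sh path="$1" in="$2" in2="$3" outu="$4"
}

# Example (dry run, only prints the commands):
#   DRY_RUN=1
#   build_index contigs.fasta bbmap_index
#   map_sample bbmap_index s1_R1.fastq s1_R2.fastq s1_unmapped.fastq
```

Building the index separately means the expensive step runs once rather than 91 times. Passing ref= on every mapping run would work for a small reference but is wasteful for one this large.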