What is the best short-read mapper for a very large reference fasta?
3.5 years ago
O.rka ▴ 740

I have 88 metagenomes that I assembled individually and then dereplicated at the contig level. I now want to map 91 metatranscriptomes to this very large reference to see what doesn't map, and save the reads that don't map into separate fastq files. I'm dealing with bacteria, so there are no splicing events.

What tool would you suggest to do the following:

  • Build the index

  • Map the metaT to the large metaG reference

  • Convert the sam to fastq, keeping only paired reads that do not map to the reference

My original plan was to use BWA for the mapping, pipe to bbmap's (or bbsuite? or bbtools?) reformat.sh program to convert from sam, only get unmapped reads, and output fastq. The BWA index took about 7 hours to make last time I tried this (before I realized there were a lot of duplicate sequences). I'm going to try this again later today and I'm wondering if I should try BWA or Bowtie2 or maybe something else.

Any help or feedback is appreciated.

mapping alignment fastq bowtie2 bwa read • 812 views
3.5 years ago
GenoMax 148k

You can simply use bbmap.sh to do the alignments (it is as capable as any aligner out there) and save the reads that don't map using outu=. If you don't want to keep the alignments themselves, leave out out= and they won't be written. No need to do any conversions. I think outu1= and outu2= should keep the reads in separate files if you have paired-end data.
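
As a rough sketch of the suggestion above, the snippet below only assembles the bbmap.sh argument list for one paired sample and prints it; it does not run anything. The file names and the helper name bbmap_unmapped_cmd are made up for illustration, in= and in2= are assumed to be the paired-input options, and outu1=/outu2= are used as suggested in the answer, so check them against your installed bbmap.sh help before relying on this.

```python
import shlex


def bbmap_unmapped_cmd(ref_fasta, reads_r1, reads_r2, sample):
    """Build a bbmap.sh argument list that writes unmapped reads only.

    Hypothetical helper: no out= is passed, so alignments are not saved.
    """
    return [
        "bbmap.sh",
        f"ref={ref_fasta}",      # fasta reference, indexed on the fly
        f"in={reads_r1}",        # assumed option for read 1
        f"in2={reads_r2}",       # assumed option for read 2
        f"outu1={sample}.unmapped_R1.fastq.gz",  # per the answer above
        f"outu2={sample}.unmapped_R2.fastq.gz",
    ]


# Placeholder file names, not from the original post.
cmd = bbmap_unmapped_cmd(
    "metaG_contigs.fasta",
    "metaT_01_R1.fastq.gz",
    "metaT_01_R2.fastq.gz",
    "metaT_01",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to subprocess.run, once per metatranscriptome.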


I'll give this a try. A few questions regarding bbmap:

  1. How should I refer to the collection of programs that are installed (I never know)? Is it bbtools?
  2. I get a little confused with specifying a prebuilt reference index. Other aligners usually add a suffix to the input reference while bbmap uses a separate folder. Do I just specify that folder?
  1. BBTools refers to the entire suite.
  2. If you have a pre-built reference for bbmap, specify the top-level folder (the one that contains the ref folder) using path=. If you are starting from a fasta reference file, ref= is the correct option; it builds the index on the fly.
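
To make the ref= versus path= choice concrete, here is a small sketch that picks one of the two arguments depending on whether a pre-built index is present. The helper name reference_args and the folder and file names are hypothetical; the only thing taken from the reply is that a pre-built index folder contains a ref subfolder.

```python
from pathlib import Path


def reference_args(index_dir, fasta):
    """Return the bbmap.sh reference argument for this run.

    Hypothetical helper: uses path= when index_dir already holds a
    pre-built index (a 'ref' subfolder), otherwise falls back to ref=
    so the index is built from the fasta on the fly.
    """
    index_dir = Path(index_dir)
    if (index_dir / "ref").is_dir():
        return [f"path={index_dir}"]
    return [f"ref={fasta}"]


# Placeholder names; the result depends on what exists on disk.
print(reference_args("bbmap_index", "metaG_contigs.fasta"))
```

The returned list can be appended to the rest of the bbmap.sh arguments for each sample.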

Got it! That's what I was missing; I wasn't using the ref= and path= options correctly.
