I have two simple questions about mapping metagenomics samples to multiple references. Sorry if they are too basic.
1) From what I have seen in different alignment tools (bwa, soap, mosaik ...), the reference argument asks for a single FASTA file. How should one feed these tools when the reference consists of multiple organisms? (BTW, which aligner do you recommend for bacterial genomes?)
2) For each sample, I have a set of paired-end reads as well as single reads from different sequencing runs. Again, since the above-mentioned tools ask for either one file of single-end reads or two files of mate pairs, what should my input be? Should I a) pool all reads into one huge FASTA file? b) pool all forward and all reverse reads into two big files, forward.fq and reverse.fq, and then map (and what about the single reads)? c) run each pair of read files separately and then combine the BAM files afterward?
Thanks, and sorry for the trivial questions.
If I concatenate the references into one big file, would I have this information in my final alignment? Can you also point me to the BBSplit publication? Thanks.
BBSplit is not published, but the usage is described in this thread.
If you merge references together, no information is lost as long as all of the sequences have unique names.
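To make the "unique names" condition concrete, here is a minimal Python sketch of one way to merge several reference FASTA files. The function name `merge_references` and the `stem|name` header scheme are made up for illustration and are not part of BBSplit or any aligner. It prefixes every sequence name with the name of the file it came from, so identically named contigs from different organisms stay distinguishable, and it fails if a name still collides.

```python
from pathlib import Path


def merge_references(fasta_paths, out_path):
    """Concatenate FASTA files into one, making sequence names unique.

    Each header becomes ">{file stem}|{original name}" (an illustrative
    convention, not a standard). Returns the sorted list of new names.
    """
    seen = set()
    with open(out_path, "w") as out:
        for path in map(Path, fasta_paths):
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line.strip():
                        continue  # skip blank lines
                    if line.startswith(">"):
                        fields = line[1:].split()
                        if not fields:
                            raise ValueError(f"empty header in {path}")
                        # Keep only the first word; drop the description.
                        name = f"{path.stem}|{fields[0]}"
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        out.write(f">{name}\n")
                    else:
                        # Always end with a newline, so a file without a
                        # trailing newline cannot glue onto the next header.
                        out.write(line + "\n")
    return sorted(seen)
```

With names built this way, the reference name column of each alignment record tells you which input file, and hence which organism, a read mapped to.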
Thanks. Does it use global alignment (Needleman-Wunsch) or local alignment? Do you think it would make a difference in the case of bacterial genomes?
BBSplit uses global alignments, and yes, it does make a difference - but as for which is better, that depends on the specifics of the situation. I generally favor global alignments but neither is universally better.
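For anyone unsure what the distinction means in practice, here is a small textbook-style Python sketch that scores the same pair of sequences both ways. It is not BBSplit's implementation; the function name `align_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration. Global (Needleman-Wunsch) scoring charges for every unaligned base end to end, while local (Smith-Waterman) scoring keeps only the best-matching stretch.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Return the optimal alignment score of sequences a and b.

    local=False: global alignment (Needleman-Wunsch), end to end.
    local=True:  local alignment (Smith-Waterman), best sub-alignment.
    Scoring values are illustrative defaults, not those of any aligner.
    """
    # First row: leading gaps cost nothing in local mode.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    read, ref = "ACGT", "TTTTACGT"
    print("global:", align_score(read, ref))
    print("local: ", align_score(read, ref, local=True))
```

Under these assumed scores, the four extra bases in `ref` cancel out the four matches in the global score, whereas the local score only counts the matching `ACGT` stretch. That is the kind of difference that matters when reads overhang contig ends or only partly match a reference.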