Hi, guys
I'm processing NGS data and have a question.
I need each of my samples to have over 100,000,000 reads, so once the first round of processing is done, I check whether each one is good to go. When a BAM file does not have over 100,000,000 reads, I sequence that library again to get the reads it still needs.
Here are my questions. 1. Assuming the library, sample, sequencing machine and everything else are exactly the same, is a BAM file merged after mapping the same as a BAM file made by merging the FASTQ files first and then mapping?
- And if they are the same, can I merge the BAM files using samtools or sambamba?
Thank you very much.
Woongjae
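The depth check described in the question could be sketched roughly as below. This is only an illustration: the function names and the `TARGET_READS` constant are made up for the example, and counting with `samtools view -c -F 0x900` is an assumption about what "reads" means here (it counts primary records, mapped or not, with each mate of a pair counted separately).

```python
import subprocess

# Threshold taken from the question; the constant name is hypothetical.
TARGET_READS = 100_000_000


def count_reads(bam_path):
    """Count primary records in a BAM file with `samtools view -c`.

    -F 0x900 skips secondary (0x100) and supplementary (0x800) records,
    so a read that aligns in several pieces is only counted once.
    Requires samtools on PATH.
    """
    result = subprocess.run(
        ["samtools", "view", "-c", "-F", "0x900", bam_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def needs_more_sequencing(n_reads, target=TARGET_READS):
    """Return True when the sample does not have over `target` reads."""
    return n_reads <= target
```

For example, `needs_more_sequencing(count_reads("sample.bam"))` would flag a sample that still needs a top-up run, where `sample.bam` is a placeholder path.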
I guess this does not hold true in an RNA-seq setting, since some aligners have a threshold for junction detection. If you map the split files separately, you may miss junctions. The merged BAM will still be missing those junctions, whereas mapping the complete set of reads would find them and store them in the BAM file.
For DNAseq, I agree.
If you're doing something with 2-pass alignment then yes, you could theoretically miss something. Given the read counts OP is talking about, I suspect that's not the case.
Thank you for the replies guys.
So you mean I can either merge the FASTQ files first and then run the mapping, or map the additional FASTQ file first and then merge the resulting BAM file with the existing BAM file, right?
Right. Some things, like looking for novel splice junctions, work better if you align everything in a single go (so merge the fastq files). For most other things it doesn't much matter if you merge fastq or BAM files, you get more or less the same result either way.
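To make the two routes concrete, here is a minimal sketch. The helper names are hypothetical. It assumes the FASTQ files are gzipped (concatenating gzip files byte-for-byte gives a valid gzip file, the same as `cat a.fastq.gz b.fastq.gz > merged.fastq.gz`) and that the BAM files are already sorted the same way before merging. For paired-end data, the R1 files and the R2 files would need to be concatenated in the same order.

```python
import shutil


def concat_fastq_gz(inputs, output):
    """Route 1: merge FASTQ files before mapping.

    Appends each gzipped input to `output` byte-for-byte, without
    decompressing anything.
    """
    with open(output, "wb") as dst:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)


def merge_bam_command(output_bam, input_bams, tool="samtools"):
    """Route 2: build the command that merges already-mapped BAM files.

    Both `samtools merge` and `sambamba merge` accept the form
    `<tool> merge out.bam in1.bam in2.bam ...`.
    """
    if tool not in ("samtools", "sambamba"):
        raise ValueError(f"unsupported tool: {tool}")
    return [tool, "merge", output_bam, *input_bams]
```

The list returned by `merge_bam_command("merged.bam", ["run1.bam", "run2.bam"])` can be passed to `subprocess.run(..., check=True)`; the file names here are placeholders.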
Thank you Ryan.
P.S.: I'm learning a lot from your replies on other issues!