Pretty basic question.
I have bam files in which the only information that indicates whether an alignment is unique is it's NH field. I'm interested in keeping only unique alignments, hence only alignments with NH:i:1 field. Can samtools do this? If not is there any other efficient way other than using grep?
Thanks
Grep is pretty darn efficient. I can't think of a faster way to look at each read than to look at each read. Fundamentally any alternative would do the same.
Would grep really be as efficient as using samtools view if that was an option?
I meant to run
samtools view
and pipe throughgrep
. If samtools has a filter flag for your tag of interest that would be a tiny bit faster. If your tag of interest is obscure then you've got to grep it. grep has a non-regex fixed mode that is very fast, but nothing is going to read a BAM faster thansamtools view
.let's assume my aligner writes the sam output to standard output. Is there a one-liner that would pipe and grep and this directly into a bam file (including the header)?
For example, if to pipe my alignment output to bam I use:
How can I change it to pipe only alignments with NH = 1?
This should work, but there are probably faster ways...
Thanks thackl. It is indeed pretty slow. If anyone is following and knows a samtools command which is faster for achieving this that'll be great.
If you want bam2bam the following grep is about 10x faster than my previous idea. Roughly 1 minute / GB bam on my system. Depending on what type of alignments are in your bam, the samtools filter flags (no unmapped, only primary, no supplement - these cannot be unique mappings anyway) might speed up things as well.
Pardon my ignorance in linux. Is the full command you're suggesting this: