2.4 years ago
kalavattam
To get the QNAME field from the last read in a bam file, I do the following:
samtools view "${bam}" | tail -n 1 | cut -f1
However, this takes quite a long time for larger bam files (for example, I am working with bam files in the range of 40–50 GB). Does anyone know of a faster and perhaps less resource-intensive way to do this?
Thank you. Do you know of any strategies for bam files that are not coordinate-sorted?
If these are unsorted files, why would you need the last read? Couldn't you pick a random read or the first one?
Thanks, they are queryname-sorted files.
How is this related to your original question?
Furthermore, you have reinvented Picard FilterSamReads with READ_LIST_FILE: https://gatk.broadinstitute.org/hc/en-us/articles/360036882611-FilterSamReads-Picard-
Thank you and apologies. To keep the discussion on point, I removed the non-relevant information.
Answering the question from Friederike: I use the queryname of the last read to break a while loop in a script that filters a bam file to exclude reads with querynames that match those in a user-supplied list of querynames.
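For what it's worth, this kind of filtering may not need the last QNAME at all if the records are streamed through a filter that simply stops when the input ends. Below is a minimal sketch, not the script discussed above: `exclude_qnames` is a hypothetical helper name, and it assumes the list file holds one queryname per line. It reads SAM text on stdin, passes header lines (which start with `@`) through, and drops any record whose first field is in the list.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name and interface are assumptions for illustration).
# Usage: exclude_qnames LIST_FILE < in.sam > out.sam
# LIST_FILE is assumed to contain one QNAME per line.
exclude_qnames() {
    local list="$1"
    awk -F'\t' -v list="${list}" '
        # Load the querynames to exclude before reading any SAM text.
        BEGIN { while ((getline name < list) > 0) skip[name] = 1 }
        # Header lines start with "@"; always keep them.
        /^@/ { print; next }
        # Keep a record only if its QNAME (field 1) is not in the list.
        !($1 in skip)
    '
}
```

With a bam file it could be used roughly like this, where `qnames.txt` and `filtered.bam` are placeholder names:

samtools view -h "${bam}" | exclude_qnames qnames.txt | samtools view -b -o filtered.bam -

The whole file is still read once, but there is no separate pass just to find the last read, and the result does not depend on how the bam file is sorted.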