Quick way to get a field such as QNAME from the last read in a bam file?
1
0
Entering edit mode
2.4 years ago
kalavattam ▴ 280

To get the QNAME field from the last read in a bam file, I do the following:

samtools view "${bam}" | tail -1 | cut -f1

However, this takes quite a long time for larger bam files (for example, I am working with bam files in the range of 40–50 GB). Does anyone know of a faster and perhaps less resource intensive way to do this?

bam samtools • 1.6k views
ADD COMMENT
2
Entering edit mode
2.4 years ago

get the last chromosome of the bam using samtools idxstat in.bam and then use the index:

samtools view in.bam <NAME_OF_LAST_CHROM>| tail -1

ADD COMMENT
0
Entering edit mode

Thank you. Do you know of any strategies for non-coordinated-sorted bam files?

ADD REPLY
0
Entering edit mode

If these are unsorted files, why would you need the last read? Couldn't you pick a random read or the first one?

ADD REPLY
0
Entering edit mode

Thanks, they are queryname-sorted files.

ADD REPLY
0
Entering edit mode

I wrote a small Python script that uses pysam to write a bam outfile from a bam infile (queryname-sorted) filtered to exclude all reads in a txt list of querynames (reverse queryname-sorted). That is, if a given queryname is in the list, then it is not written to the bam outfile.

how is it related to your original question ?

Quick way to get a field such as QNAME from the last read in a bam file?

furthermore you re-invented https://gatk.broadinstitute.org/hc/en-us/articles/360036882611-FilterSamReads-Picard- + READ_LIST_FILE

ADD REPLY
0
Entering edit mode

Thank you and apologies. To keep the discussion on point, I removed the non-relevant information.

Answering the question from Friederike: I use the queryname of the last read to break a while loop in a script that filters a bam file to exclude reads with querynames that match those in a user-supplied list of querynames.

ADD REPLY

Login before adding your answer.

Traffic: 2515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6