I've read the SAM/BAM specification, but I wonder what the purpose of the RNEXT and PNEXT fields is.
Are they only used with paired-end data to check the size of the fragment? If not, are those fields updated when a BAM file is sorted, and when are they used in samtools or other programs?
One use of RNEXT and PNEXT is to know the reference and position of a paired end read's partner for visualisation tools. This has now been generalised in the spec to cope with multiple reads per template (e.g. strobe reads).
Sorting should not affect the values of RNEXT and PNEXT. Note that as a hack, when only one of a pair of reads is mapped, I believe the partner is given the same reference name (and position?) but marked as unmapped. This is to ensure if sorted by position, both reads are located next to each other in the file.
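To illustrate the point about sorting, here is a minimal sketch in plain Python. The records are made-up tuples, not real data, and the sort key is a simplification: a real coordinate sort follows the reference order in the BAM header rather than alphabetical order. It assumes the convention described above, where the unmapped read carries its mapped partner's reference name and position.

```python
# Hypothetical records: (QNAME, FLAG, RNAME, POS, RNEXT, PNEXT)
records = [
    ("pairB", 99, "chr1", 500, "=", 700),   # mapped, first in pair
    ("pairA", 73, "chr1", 100, "=", 100),   # mapped, mate unmapped (0x8)
    ("pairB", 147, "chr1", 700, "=", 500),  # mapped, second in pair
    ("pairA", 133, "chr1", 100, "=", 100),  # unmapped (0x4), placed at its mate's RNAME/POS
]

# Simplified coordinate sort: order by (RNAME, POS) only.
# Sorting reorders whole records; it never rewrites RNEXT or PNEXT.
by_position = sorted(records, key=lambda r: (r[2], r[3]))

names_in_order = [r[0] for r in by_position]
print(names_in_order)
```

Because the unmapped read of pairA shares its partner's coordinates, the two pairA records end up next to each other after sorting, and every tuple is otherwise unchanged.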
Isn't this a perfect question for the samtools-help mailing list?
For single-end reads, could it mean the next segment if the original read is in a split alignment, i.e. different segments of the read map to different locations on the same chromosome?
Given a sorted and indexed BAM file, RNEXT/PNEXT allow a program to find the mate of a given read (or, more generally since version 1.4 of the SAM specification, other "fragments" of the same "template").
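As a rough sketch of how a program might read these fields, the snippet below parses a SAM text line and resolves where the mate should be. The helper names (`SamRecord`, `parse_sam_line`, `mate_location`) and the example lines are hypothetical. It relies only on the SAM conventions that RNEXT `=` means "same as RNAME", and that RNEXT `*` or PNEXT `0` means the information is unavailable.

```python
from typing import NamedTuple, Optional, Tuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    rnext: str
    pnext: int


def parse_sam_line(line: str) -> SamRecord:
    """Pick out the fields of interest from one tab-separated SAM line."""
    f = line.rstrip("\n").split("\t")
    # SAM columns: 1 QNAME, 2 FLAG, 3 RNAME, 4 POS, ..., 7 RNEXT, 8 PNEXT
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), f[6], int(f[7]))


def mate_location(rec: SamRecord) -> Optional[Tuple[str, int]]:
    """Return (reference, 1-based position) of the mate, or None if unavailable."""
    if rec.rnext == "*" or rec.pnext == 0:
        return None
    ref = rec.rname if rec.rnext == "=" else rec.rnext
    return ref, rec.pnext


# Made-up example lines: one paired read and one single-end read.
paired = parse_sam_line("read1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*")
single = parse_sam_line("read2\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*")

print(mate_location(paired))
print(mate_location(single))
```

With an indexed BAM, the returned reference and position are what a tool would use to jump to the region containing the mate, then match on the read name.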
So those fields are only useful for paired-end/mate reads, aren't they?
Yes - for single end reads (or whatever you call non-paired end), RNEXT and PNEXT are irrelevant.