9.4 years ago
bdeonovic
It is always confusing to deal with the BLAT/PSL format with regard to coordinate conventions. I find the following links helpful:
- Blat Start And End Position Conventions
- Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems
- http://genome.ucsc.edu/FAQ/FAQformat.html#format2
It is now clear to me that when dealing with the query coordinates, one has to be careful about the strand.
What about the target coordinates? Are they always given with respect to the forward strand?
EXAMPLE: I made this up to try to understand how BLAT/PSL works:
target: ATCGATCGATCGATCGATCGATCGA
query1: ATGATCCAT
query2: GATCGTTGATCGAG -> CTCGATCAACGATC (rev comp)
The forward strand of query1 aligns to the target:
0123456789012345678901234
ATCGATCGATCGATCGATCGATCGA
___GAT______________AT___
012345678
ATGATCCAT
__+++__++
5 0 0 0 0 0 0 0 + query1 9 2 9 target 25 3 22 2 3,2 2,7 3,20
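One way to sanity-check a hand-written plus-strand line is to split it into its 21 PSL columns and slice both sequences with each block's 0-based, half-open coordinates. This is a minimal sketch, not BLAT or UCSC code: `parse_psl` and `plus_strand_blocks` are hypothetical helpers, and the column names follow the order given in the UCSC PSL description linked above.

```python
# Column order as listed in the UCSC PSL format description.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}


def parse_psl(line):
    """Split one whitespace-separated PSL line into a dict (hypothetical helper)."""
    values = line.split()
    if len(values) != len(PSL_FIELDS):
        raise ValueError(f"expected {len(PSL_FIELDS)} columns, got {len(values)}")
    rec = {}
    for name, value in zip(PSL_FIELDS, values):
        if name in TEXT_FIELDS:
            rec[name] = value
        elif name in LIST_FIELDS:
            # Real PSL lists usually end with a trailing comma ("3,2,"); skip empty items.
            rec[name] = [int(x) for x in value.split(",") if x]
        else:
            rec[name] = int(value)
    return rec


def plus_strand_blocks(rec, query, target):
    """Pair each query block with its target block (0-based, half-open slices)."""
    return [
        (query[qs:qs + size], target[ts:ts + size])
        for size, qs, ts in zip(rec["blockSizes"], rec["qStarts"], rec["tStarts"])
    ]


target = "ATCGATCGATCGATCGATCGATCGA"
query1 = "ATGATCCAT"
rec = parse_psl("5 0 0 0 0 0 0 0 + query1 9 2 9 target 25 3 22 2 3,2 2,7 3,20")

for q_block, t_block in plus_strand_blocks(rec, query1, target):
    print(q_block, t_block)  # prints "GAT GAT", then "AT AT"

# On the plus strand, qStart/qEnd (and tStart/tEnd) are simply the first
# block start and the last block end.
print(rec["qStarts"][0], rec["qStarts"][-1] + rec["blockSizes"][-1])  # 2 9
print(rec["tStarts"][0], rec["tStarts"][-1] + rec["blockSizes"][-1])  # 3 22
```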
The reverse strand of query2 aligns to the target:
0123456789012345678901234
ATCGATCGATCGATCGATCGATCGA
__CGATC___CGATC__________
Forward:
01234567890123
GATCGTTGATCGAG
-----__-----__
Reverse complement:
01234567890123
CTCGATCAACGATC
__+++++__+++++
10 0 0 0 0 0 0 0 - query2 14 2 14 seq 25 2 15 2 5,5 0,7 2,10
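For the minus-strand case, the sketch below applies the convention from the UCSC FAQ quoted further down: blocks are checked against the reverse complement of query2, and qStart/qEnd are converted back to forward-strand coordinates as qStart = qSize - (end on the reverse complement) and qEnd = qSize - (start on the reverse complement). The block layout (two 5-base blocks at reverse-complement positions 2 and 9, target positions 2 and 10) is read off the diagram above rather than taken from a real BLAT run, and `revcomp` and `forward_span` are hypothetical helpers. Under that convention the line would carry qStart=0, qEnd=12 and qStarts=2,9, so the qStart/qEnd and qStarts values in the hand-written line above appear to be swapped.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement an uppercase DNA string (hypothetical helper)."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_span(q_size, block_sizes, q_starts_rc):
    """Forward-strand (qStart, qEnd) of a minus-strand hit whose qStarts are
    given on the reverse complement of the query."""
    rc_start = q_starts_rc[0]
    rc_end = q_starts_rc[-1] + block_sizes[-1]
    return q_size - rc_end, q_size - rc_start


target = "ATCGATCGATCGATCGATCGATCGA"
query2 = "GATCGTTGATCGAG"
rc = revcomp(query2)  # "CTCGATCAACGATC"

# Assumed block layout, read off the diagram above.
block_sizes = [5, 5]
q_starts_rc = [2, 9]
t_starts = [2, 10]

# On the minus strand, qStarts index into the reverse complement, while
# tStarts still index into the forward target sequence.
for size, qs, ts in zip(block_sizes, q_starts_rc, t_starts):
    print(rc[qs:qs + size], target[ts:ts + size])  # "CGATC CGATC" for both blocks

print(forward_span(len(query2), block_sizes, q_starts_rc))  # (0, 12)
```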
Copied from http://genome.ucsc.edu/FAQ/FAQformat.html#format2:
Be aware that the coordinates for a negative strand in a dna query PSL line are handled in a special way. In the qStart and qEnd fields, the coordinates indicate the position where the query matches from the point of view of the forward strand, even when the match is on the reverse strand. However, in the qStarts list, the coordinates are reversed. Example: Here is a 61-mer containing 2 blocks that align on the minus strand and 2 blocks that align on the plus strand (this sometimes happens due to assembly errors):
0         1         2         3         4         5         6    tens position in query
0123456789012345678901234567890123456789012345678901234567890    ones position in query
                      ++++++++++++++                    +++++    plus strand alignment on query
    ------------------              --------------------         minus strand alignment on query
0987654321098765432109876543210987654321098765432109876543210    ones position in query negative strand coordinates
6         5         4         3         2         1         0    tens position in query negative strand coordinates
Plus strand:
qStart=22 qEnd=61 blockSizes=14,5 qStarts=22,56
Minus strand:
qStart=4 qEnd=56 blockSizes=20,18 qStarts=5,39
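Both lines can be rebuilt from the diagram. Read as 0-based, half-open forward-strand intervals, the plus-strand blocks are 22-36 and 56-61 and the minus-strand blocks are 4-22 and 36-56. The sketch below (with a hypothetical helper `psl_query_fields`) keeps qStart/qEnd on the forward strand but, for the minus strand, re-expresses each block on the reverse complement before listing blockSizes and qStarts, which reproduces the FAQ values.

```python
def psl_query_fields(q_size, strand, forward_blocks):
    """Build qStart, qEnd, blockSizes and qStarts for one PSL line from
    half-open forward-strand query intervals (hypothetical helper)."""
    # qStart/qEnd always stay in forward-strand coordinates.
    q_start = min(s for s, _ in forward_blocks)
    q_end = max(e for _, e in forward_blocks)
    if strand == "+":
        blocks = sorted(forward_blocks)
    else:
        # Minus strand: forward [s, e) becomes [q_size - e, q_size - s)
        # on the reverse complement; blocks are listed in that order.
        blocks = sorted((q_size - e, q_size - s) for s, e in forward_blocks)
    return {
        "qStart": q_start,
        "qEnd": q_end,
        "blockSizes": [e - s for s, e in blocks],
        "qStarts": [s for s, _ in blocks],
    }


Q_SIZE = 61
plus = psl_query_fields(Q_SIZE, "+", [(22, 36), (56, 61)])
minus = psl_query_fields(Q_SIZE, "-", [(4, 22), (36, 56)])
print(plus)   # {'qStart': 22, 'qEnd': 61, 'blockSizes': [14, 5], 'qStarts': [22, 56]}
print(minus)  # {'qStart': 4, 'qEnd': 56, 'blockSizes': [20, 18], 'qStarts': [5, 39]}
```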
Essentially, the minus strand blockSizes and qStarts are what you would get if you reverse-complemented the query. However, the qStart and qEnd are not reversed. Use the following formulas to convert one to the other:
Negative-strand-coordinate-qStart = qSize - qEnd = 61 - 56 = 5
Negative-strand-coordinate-qEnd = qSize - qStart = 61 - 4 = 57
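The same subtraction can be applied block by block. This is a restatement of the formulas above rather than an extra rule from the FAQ: writing $b_i$ for the $i$-th blockSizes entry and $s_i$ for the matching minus-strand qStarts entry, each block maps to a half-open forward-strand interval. For the 61-mer, the first listed minus-strand block lands second on the forward strand, and together the two blocks span 4 to 56, matching qStart and qEnd.

```latex
\begin{align*}
[\,s_i,\; s_i + b_i\,) &\;\mapsto\; [\,\mathrm{qSize} - s_i - b_i,\; \mathrm{qSize} - s_i\,) \\
b_1 = 20,\ s_1 = 5:  &\quad [\,61 - 5 - 20,\; 61 - 5\,) = [\,36,\; 56\,) \\
b_2 = 18,\ s_2 = 39: &\quad [\,61 - 39 - 18,\; 61 - 39\,) = [\,4,\; 22\,)
\end{align*}
```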