Can someone point me to a detailed explanation of how the UCSC Genome Browser interprets coordinates for, and displays, minus-strand sequences? It's very confusing. For example, this genomic GFF coordinate from the mm9 genome:
chr1:3206080-3206102:-
pulls out the sequence:
>seq1
AATACAAGGAGCCGCAATGTCCA
which is correct. However, in the Genome Browser it appears as the reverse complement, reading from right to left:
>seq2
TGGACATTGCGGCTCCTTGTATT
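To make the relationship between the two sequences concrete, here is a minimal Python sketch that checks that seq2 is the reverse complement of seq1. The helper name `reverse_complement` is just illustrative, and it only handles plain A/C/G/T bases.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


seq1 = "AATACAAGGAGCCGCAATGTCCA"  # pulled for chr1:3206080-3206102:-
seq2 = "TGGACATTGCGGCTCCTTGTATT"  # as it appears in the browser

print(reverse_complement(seq1) == seq2)  # prints True
```

So the two are the same stretch of DNA read on opposite strands, not two different regions.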
Questions are:
- Why does UCSC reverse complement it and display it right to left in this way? What is the logic behind this? Is there a way to reverse it, such that the sequence displayed is the sequence pulled from the genome (i.e. seq1)?
- I thought UCSC is always BED-coordinate based, which is a 0-based coordinate system, not 1-based. In that case I would have expected the sequence determined by chr1:3206080-3206102 (meant to be a GFF coordinate) to be one base shorter than the sequence I get from UCSC, i.e. chr1:3206080-3206102:- in BED should have a minus-1 start, yielding chr1:3206079-3206102:-
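For reference, this is how I understand the two conventions, written as a small Python sketch. It assumes GFF intervals are 1-based with both ends included and BED intervals are 0-based with the end excluded; the function names are made up for illustration.

```python
def gff_to_bed(start, end):
    """Convert a 1-based, inclusive interval to 0-based, half-open."""
    return start - 1, end


def bed_to_gff(start, end):
    """Convert a 0-based, half-open interval to 1-based, inclusive."""
    return start + 1, end


gff_start, gff_end = 3206080, 3206102
bed_start, bed_end = gff_to_bed(gff_start, gff_end)

gff_length = gff_end - gff_start + 1  # both ends are counted
bed_length = bed_end - bed_start      # end is not counted

print(bed_start, bed_end)      # prints 3206079 3206102
print(gff_length, bed_length)  # prints 23 23
```

If those assumptions hold, shifting the start by one only changes the notation: both intervals cover the same 23 bases, and the strand does not enter into the conversion at all.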
Any intuitive explanation of how to think about UCSC's orientation display choices, how they connect to sequences and their reverse complements on the minus strand, and where and when BED versus GFF conventions are used would be very helpful.
I found the button, but it's still confusing for the task I'm trying to accomplish. I want to look for the presence of a motif in a genomic region (say, a 5' UTR). For the minus strand, I want to detect the motif in the correct orientation and make a BED track, loaded into the UCSC browser, that shows where the motif is. Normally, when I pull minus-strand sequences from the genome I use bedtools and tell it to reverse complement them; then I look for the motif. I want that to match up with what UCSC shows. So I guess I should load the BED track and then reverse the display?
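Here is a rough Python sketch of the mapping I have in mind, in case it helps pin down the question. It assumes the region is given in BED coordinates (0-based start, end excluded) and that the sequence has already been reverse complemented for minus-strand regions, as bedtools does when asked. The function name `motif_hits_to_bed` and the motif `GGAGCC` are hypothetical, chosen only for the example.

```python
def motif_hits_to_bed(chrom, bed_start, bed_end, strand, seq, motif, name="motif"):
    """Return BED6 lines for every (possibly overlapping) exact motif hit.

    `seq` is the region's sequence as read on `strand`, i.e. already
    reverse complemented when strand is '-'.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    if len(seq) != bed_end - bed_start:
        raise ValueError("sequence length does not match the BED interval")

    hits = []
    offset = seq.find(motif)
    while offset != -1:
        if strand == "-":
            # Offset 0 of a minus-strand sequence is the last genomic base.
            hit_end = bed_end - offset
            hit_start = hit_end - len(motif)
        else:
            hit_start = bed_start + offset
            hit_end = hit_start + len(motif)
        hits.append(f"{chrom}\t{hit_start}\t{hit_end}\t{name}\t0\t{strand}")
        offset = seq.find(motif, offset + 1)  # step by 1 to allow overlaps
    return hits


# The region from the question, in BED coordinates, with its minus-strand sequence.
hits = motif_hits_to_bed(
    "chr1", 3206079, 3206102, "-", "AATACAAGGAGCCGCAATGTCCA", "GGAGCC"
)
print("\n".join(hits))
```

For this example the single hit comes out as chr1, start 3206089, end 3206095, strand -. Because BED start and end are always forward-strand positions and the strand goes in its own column, my understanding is that the track should land on the same bases whether or not the display is reversed.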