It is possible to generate those alignments with the pslPretty utility, available from our list of utilities:
http://hgdownload.soe.ucsc.edu/admin/exe
Here is an example that also illustrates two other useful commands, faSomeRecords and pslSomeRecords, which are available from the same directory:
# download everything
$ wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
$ wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/all_est.txt.gz
$ wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/est.fa.gz
# format
$ gzip -cd all_est.txt.gz | cut -f2- > all_est.psl
$ gzip -d est.fa.gz
# small example psl:
$ echo "BX437773" | pslSomeRecords all_est.psl stdin onePsl.psl
$ echo "BX437773" | faSomeRecords est.fa stdin out.fa
# now run pslPretty
$ pslPretty onePsl.psl hg38.2bit out.fa pretty.out
$ cat pretty.out
>BX437773:0-883 of 897 chr1:11130551+11145019 of 248956422
gcgat-gggt-gggctgttctcgg.....75......cNNtggtggcgttgttctgttgN
||||| |||| ||||||||||||| | ||||||||||| | |
GCGATGGGGTGGGGCTGTTCTCGG.....75......cagtggtggcgTTGGTGATGTTG
cccNgaaNggcctNccgccNatacttcttctc-NttNgcgggcttgNttctgatNtttNt
|| ||| | ||| | ||||||||| | ||||||||| ||||||| ||| |
GCCCCGCTGGCATGACGCAGTTTCTTCTTCTCA--TCGCGGGCTTGGTTCTGATGTTTGT
NgtgtNgccccgattcgaagttcatcactgcccacgcatgccagNc-----2302-----
|||| || | | ||||||||||||||||||||||||||||||| |
AGTGTAGCACAGCTTCGAAGTTCATCACTGCCCACGCATGCCAGGCCTGGTT...GATCA
...
...
...
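As a side note on the "format" step above: the cut -f2- is there because the database dump carries one extra leading column in front of the PSL fields (the bin index column, as far as I know), and pslPretty expects plain PSL. Here is a minimal sketch of that step using only gzip and cut. The file names and the two shortened rows are made up for illustration and are not real all_est records:

```shell
# Two made-up, shortened rows shaped like a table dump:
# a leading bin column followed by a few PSL-style fields (hypothetical data).
printf '585\t100\t0\t+\testA\n586\t90\t2\t-\testB\n' | gzip -c > demo_table.txt.gz

# Same pattern as the all_est.txt.gz command: decompress to stdout
# and keep every tab-separated column from the second one onward.
gzip -cd demo_table.txt.gz | cut -f2- > demo_nobin.txt

# Inspect the result; the first column (585/586) should be gone.
cat demo_nobin.txt
```

The original compressed file is left untouched because gzip -c and -d with -c write to stdout.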
All three utilities can be run with no arguments to print a usage message:
$ pslPretty
pslPretty - Convert PSL to human-readable output
usage:
pslPretty in.psl target.lst query.lst pretty.out
options:
-axt Save in format like Scott Schwartz's axt format.
Note gaps in both sequences are still allowed in the
output, which not all axt readers will expect.
-dot=N Output a dot every N records.
-long Don't abbreviate long inserts.
-check=fileName Output alignment checks to filename.
It's recommended that the psl file be sorted by target if it contains
multiple targets; otherwise, this will be extremely slow. The target and query
lists can be fasta, 2bit or nib files, or a list of these files, one per line.
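If your psl file does contain multiple targets, the sorting recommended in the usage message can be done with the standard sort command before running pslPretty. Here is a minimal sketch, assuming a header-less PSL file (like the one produced by the cut command earlier) in which column 14 is the target name and column 16 is the target start. The file names, the make_row helper and the three records are hypothetical and only there to make the example self-contained:

```shell
# Hypothetical helper: print one header-less, 21-column PSL-style record.
# $1 = query name, $2 = target name, $3 = target start
make_row() {
  printf '100\t0\t0\t0\t0\t0\t0\t0\t+\t%s\t100\t0\t100\t%s\t1000000\t%s\t%s\t1\t100,\t0,\t%s,\n' \
    "$1" "$2" "$3" "$(( $3 + 100 ))" "$3"
}

# Write three made-up records in unsorted order.
{
  make_row estB chr2 500
  make_row estA chr1 9000
  make_row estC chr1 200
} > demo.psl

# Sort by target name (column 14), then numerically by target start (column 16).
sort -t "$(printf '\t')" -k14,14 -k16,16n demo.psl > demo.sorted.psl

# Show query name, target name and target start of the sorted records.
cut -f10,14,16 demo.sorted.psl
```

On a real file, the same sort line would be applied to the psl before passing the sorted file to pslPretty. If the file still has the five-line PSL header, that would need to be removed or set aside first.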
If you have further questions about UCSC data or tools, feel free to send them to one of the mailing lists below:
- General questions: genome@soe.ucsc.edu
- Questions involving private data: genome-www@soe.ucsc.edu
- Questions involving mirror sites: genome-mirror@soe.ucsc.edu
ChrisL from the UCSC Genome Browser