Question

How to find the best hit from output.psl

3

11.7 years ago

alok.helix ▴ 120

After running standalone BLAT in my Linux terminal, I want to find the best hit in the output.psl file it generated.

I have looked at pslReps and at the filtering criteria described in the following link, http://dnaresearch.oxfordjournals.org/content/early/2013/11/25/dnares.dst049.full, but it is not clear to me how the parameters are set.

Please advise on how to do this.

blat alignment • 7.0k views

updated 2.5 years ago by Ram 45k • written 11.7 years ago by alok.helix ▴ 120

score 4 · Answer 1 · 2013-12-05

Is the output tabular, with the following fields?

MATCHES - Number of non-repeat matches.
MISMATCHES - Number of mismatches.
REPMATCHES - Number of repeat matches.
NCOUNT - Number of Ns.
QNUMINSERT - Number of inserts in query.
QBASEINSERT - Number of bases inserted in query.
SNUMINSERT - Number of inserts in subject.
SBASEINSERT - Number of bases inserted in subject.
STRAND - Strand.
Q_ID - Query ID.
Q_LEN - Query length.
Q_BEG - Query begin.
Q_END - Query end.
S_ID - Subject ID.
S_LEN - Subject length.
S_BEG - Subject begin.
S_END - Subject end.
BLOCKCOUNT - Block count.
BLOCKSIZES - Block sizes.
Q_BEGS - Query sequence blocks begins.
S_BEGS - Subject sequence blocks begins.

If yes, you would probably begin by sorting on the query ID (column 10):
sort -k10,10 output.psl
To sort by query and then, within each query, by mismatches from least to most:
sort -k10,10 -k2,2g output.psl
To keep only the first line per query, i.e. the "best hit" under this criterion:
sort -k10,10 -k2,2g output.psl | sort -u -k10,10 --merge > bestHitsWithThisCriteria.psl
Really, though, you have to decide what makes a best hit. Note that the PSL header lines get sorted in with the data unless you strip them first or run blat with -noHead.
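
For anyone who prefers a script to the sort pipeline, here is a rough Python sketch of the same "fewest mismatches per query" criterion. The function name, the helper, and the example rows are made up for illustration; it assumes tab-separated PSL with the 21 columns listed above and skips anything that does not look like a data row (such as the header).

```python
def best_hits_by_mismatches(lines):
    """Return {query_id: fields}, keeping the hit with the fewest mismatches."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 columns and start with an integer match count;
        # header and blank lines fail this check and are skipped.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        query_id = fields[9]          # column 10: Q_ID
        mismatches = int(fields[1])   # column 2: MISMATCHES
        # On ties, the first hit seen for a query is kept.
        if query_id not in best or mismatches < int(best[query_id][1]):
            best[query_id] = fields
    return best


def example_row(matches, mismatches, query, target):
    # Hypothetical PSL row; only the columns used above carry meaningful values.
    row = [matches, mismatches, 0, 0, 0, 0, 0, 0, "+", query, 100, 0, 100,
           target, 5000, 0, 100, 1, "100,", "0,", "0,"]
    return "\t".join(str(v) for v in row)


psl_lines = [
    "psLayout version 3\n",
    "\n",
    example_row(90, 10, "q1", "chr1") + "\n",
    example_row(98, 2, "q1", "chr2") + "\n",
    example_row(95, 5, "q2", "chr3") + "\n",
]

for query_id, fields in best_hits_by_mismatches(psl_lines).items():
    print(query_id, fields[13], fields[1])
```

With a real file you would pass an open file handle instead of the list, e.g. best_hits_by_mismatches(open("output.psl")). Fewest mismatches alone tends to favour short alignments, so swap in whatever criterion you settle on.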

score 2 · Answer 2 · 2013-12-05

11.7 years ago

Prakki Rama ★ 2.7k

You can write the BLAT output in NCBI BLAST tabular format with -out=blast8 and take the hit with the highest bit score as the best hit.
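
A minimal Python sketch of that idea, assuming the usual 12-column BLAST tabular layout (query ID in column 1, subject ID in column 2, bit score in column 12). The function name and the example rows are invented for illustration.

```python
def best_hits_by_bitscore(lines):
    """Return {query_id: (subject_id, bit_score)} with the highest bit score."""
    best = {}
    for line in lines:
        # Skip blank lines and any comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query_id = fields[0]
        subject_id = fields[1]
        bit_score = float(fields[11])  # column 12: bit score
        # On ties, the first hit seen for a query is kept.
        if query_id not in best or bit_score > best[query_id][1]:
            best[query_id] = (subject_id, bit_score)
    return best


# Made-up rows in blast8 column order:
# query, subject, %identity, length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, e-value, bit score
blast8_lines = [
    "q1\tchr1\t95.00\t100\t5\t0\t1\t100\t501\t600\t1e-40\t170.0\n",
    "q1\tchr2\t99.00\t100\t1\t0\t1\t100\t901\t1000\t1e-50\t192.0\n",
    "q2\tchr3\t97.00\t100\t3\t0\t1\t100\t11\t110\t1e-45\t180.0\n",
]

for query_id, (subject_id, bit_score) in best_hits_by_bitscore(blast8_lines).items():
    print(query_id, subject_id, bit_score)
```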


GenoMax · Answer 3 · 2017-02-17

BLAT score, from biopython/Bio/SearchIO/BlatIO.py:

def _calc_score(psl, is_protein):
    # calculates score
    # adapted from http://genome.ucsc.edu/FAQ/FAQblat.html#blat4
    size_mul = 3 if is_protein else 1
    return size_mul * (psl['matches'] + (psl['repmatches'] >> 1)) - \
            size_mul * psl['mismatches'] - psl['qnuminsert'] - psl['tnuminsert']

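In short, when size_mul = 1 (nucleotide alignments), with x holding the tab-split fields of one PSL line:

match_score = int(x[0]) + (int(x[2]) >> 1) - int(x[1]) - int(x[4]) - int(x[6])  # repmatches count half, as in _calc_score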

I'd say that score displayed in IGV is 1000-match_score
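
To make the arithmetic concrete, here is a small sketch that applies the same formula as _calc_score directly to the fields of a PSL line and picks the highest-scoring hit. The function name psl_score and the two example hits are made up; only the first eight columns are filled in because those are the only ones the formula touches.

```python
def psl_score(fields, is_protein=False):
    """Score one PSL row (list of column strings) like _calc_score above."""
    size_mul = 3 if is_protein else 1
    matches = int(fields[0])
    mismatches = int(fields[1])
    repmatches = int(fields[2])
    q_num_insert = int(fields[4])
    t_num_insert = int(fields[6])
    # Repeat matches count half (integer shift), as in the quoted function.
    return (size_mul * (matches + (repmatches >> 1))
            - size_mul * mismatches
            - q_num_insert
            - t_num_insert)


# Two made-up hits for the same query (first eight PSL columns only).
hits = [
    ["95", "3", "4", "0", "1", "2", "1", "5"],  # 95 + 2 - 3 - 1 - 1 = 92
    ["90", "0", "0", "0", "0", "0", "0", "0"],  # 90
]

best = max(hits, key=psl_score)
print(psl_score(best))  # 92
```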