Merging Blastx Hits From Overlapping Bacterial Genome Segments
14.7 years ago
Darked89 4.7k

I ran blastx on a 1 Mbp bacterial genome fragment against the NCBI nr database. Beforehand I split it into 2000 bp fragments with 500 bp overlap, written to a single multi-FASTA file, using splitter from EMBOSS:

splitter -sequence my_contig.fa  -size 2000 -overlap 500
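To make the fragment layout concrete, here is a minimal Python sketch of the windows that a 2000 bp size with 500 bp overlap implies (a step of 1500 bp). The function name `fragment_windows` is made up for illustration, and the handling of the final, shorter fragment is an assumption that may differ from what splitter actually emits.

```python
def fragment_windows(seq_len, size=2000, overlap=500):
    """Return 1-based inclusive (start, end) windows covering seq_len bases.

    Consecutive windows start (size - overlap) bases apart, so each pair of
    neighbours shares `overlap` bases. The last window is clipped to seq_len.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if seq_len < 1:
        return []
    step = size - overlap
    windows = []
    start = 1
    while True:
        end = min(start + size - 1, seq_len)
        windows.append((start, end))
        if end == seq_len:
            break
        start += step
    return windows


if __name__ == "__main__":
    for window in fragment_windows(5000):
        print(window)
```

Under this model a 1 Mbp contig yields 667 fragments, each submitted to blastx as a separate query.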

For output I chose the tabular format with comment lines (-m 9).

The next step was to convert the blastx output into GFF3. That part is done, with absolute positions (i.e. positions in the intact contig rather than in the fragment).
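For readers who want to reproduce this conversion, a rough sketch follows. It is not the script used here. It assumes the 12 standard tabular columns (query, subject, % identity, alignment length, mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score) and assumes that each fragment ID ends in `_start-end` giving its 1-based position in the contig. The function name `hit_to_gff3`, the `blastx` source label and the choice of `protein_match` as feature type are illustrative choices.

```python
import re

# Assumed fragment naming: <contig name>_<start>-<end>, 1-based coordinates.
FRAGMENT_ID = re.compile(r"^(?P<name>.+)_(?P<start>\d+)-(?P<end>\d+)$")


def hit_to_gff3(line):
    """Convert one tabular blastx line to a GFF3 line in contig coordinates.

    Returns None for comment lines (as written by -m 9) and blank lines.
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    query, subject = fields[0], fields[1]
    q_start, q_end = int(fields[6]), int(fields[7])
    s_start, s_end = fields[8], fields[9]
    evalue, bitscore = fields[10], fields[11]

    match = FRAGMENT_ID.match(query)
    if match:
        seqid = match.group("name")
        offset = int(match.group("start")) - 1  # shift fragment -> contig
    else:
        seqid, offset = query, 0  # query was not a split fragment

    # For blastx, q.start > q.end means the hit is on the reverse strand.
    strand = "+" if q_start <= q_end else "-"
    start = offset + min(q_start, q_end)
    end = offset + max(q_start, q_end)

    # Subject IDs containing ';', '=' or ',' would need escaping (not done here).
    attributes = f"Target={subject} {s_start} {s_end};bitscore={bitscore}"
    return "\t".join(
        [seqid, "blastx", "protein_match", str(start), str(end),
         evalue, strand, ".", attributes]
    )


if __name__ == "__main__":
    example = "my_contig_1501-3500\tprot_A\t85.00\t100\t15\t0\t10\t309\t1\t100\t1e-50\t200"
    print(hit_to_gff3(example))
```

The e-value goes in the score column and the bit score in the attributes. Both still describe the hit against a single fragment, not against the whole contig.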

It seems that a single ORF / predicted gene is often covered by 2-3 blast hits to the same protein. These hits may or may not overlap. Hence my questions:

  1. What fragment sizes / overlaps are typically used for blastx in this situation?
  2. Is there any advantage in improving the blast hits, say by merging overlapping segments (the E-values will no longer be valid), or by using blast2 (blastx mode) to compare the DNA sequence from the region of overlapping / almost-touching hits against the protein already detected?
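The merging idea from question 2 could be sketched roughly as below. This is only an illustration under stated assumptions: hits are plain `(subject, strand, start, end)` tuples in contig coordinates, hits are only merged when they share subject and strand, and `max_gap` (a made-up parameter) sets how far apart two "almost-touching" hits may be. It does not check that the protein coordinates are collinear, and it deliberately drops e-values, since those would not be valid for a merged span.

```python
from collections import defaultdict


def merge_hits(hits, max_gap=0):
    """Merge overlapping or nearly touching hits to the same protein.

    hits    : iterable of (subject, strand, start, end), 1-based inclusive
    max_gap : largest number of uncovered bases allowed between two hits
              that should still be joined (0 joins directly adjacent hits)
    Returns a sorted list of merged (subject, strand, start, end) tuples.
    """
    grouped = defaultdict(list)
    for subject, strand, start, end in hits:
        grouped[(subject, strand)].append((start, end))

    merged = []
    for (subject, strand), spans in sorted(grouped.items()):
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + max_gap + 1:
                # Overlapping or within max_gap: extend the current span.
                cur_end = max(cur_end, end)
            else:
                merged.append((subject, strand, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((subject, strand, cur_start, cur_end))
    return merged


if __name__ == "__main__":
    hits = [
        ("prot_A", "+", 100, 600),
        ("prot_A", "+", 550, 1200),   # overlaps the first hit
        ("prot_A", "+", 1230, 1500),  # 29 bp gap after the second hit
        ("prot_A", "+", 5000, 5300),  # far away, stays separate
        ("prot_B", "+", 580, 700),    # different protein, never merged
    ]
    for span in merge_hits(hits, max_gap=50):
        print(span)
```

A merged span like this is best treated as a candidate region to re-align against the protein, not as a scored hit in its own right.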
blast gff annotation genome bacteria • 4.1k views
14.7 years ago

Isn't it the size of the protein that causes the multiple hits? No matter what fragment size or overlap you choose, if two or more fragments cover different sections of the same protein, you'll get multiple hits.

If your fragments are too large you'll miss regions; if they are too small you'll get multiple hits. The latter problem does not seem to preclude any downstream analysis, so it may not be worth trying to optimize it away.


It seems that I am missing hits to some fragments, so I will have to reduce the fragment size and increase the proportion of overlap. The average predicted gene size is 274 aa, so I will try 1 kb fragments with 500 bp overlaps next.
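As a back-of-the-envelope check on how these numbers interact, the sketch below estimates how often an average-sized gene (274 aa, i.e. 822 bp) lies entirely inside at least one fragment. It assumes gene starts are spread uniformly along the contig and ignores the contig ends. The function name is made up for illustration. It says nothing about sensitivity or missed hits, only about how often a gene will be split across fragments and so reported as several hits.

```python
def fraction_fully_contained(gene_len, size, overlap):
    """Fraction of gene start positions for which a gene of gene_len bp
    fits completely inside at least one fragment.

    Only the latest fragment starting at or before the gene start needs
    checking, because it reaches furthest to the right.
    """
    step = size - overlap
    contained = sum(1 for r in range(step) if r + gene_len <= size)
    return contained / step


if __name__ == "__main__":
    gene_len = 274 * 3  # average predicted gene, in bp
    for size, overlap in [(2000, 500), (1000, 500)]:
        frac = fraction_fully_contained(gene_len, size, overlap)
        print(f"size={size} overlap={overlap}: {frac:.3f}")
```

Under these assumptions only genes no longer than roughly the overlap are guaranteed to fit in a single fragment, so with a 500 bp overlap an average gene will often be split in either scheme, and more often with 1 kb fragments.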
