Determining Reading Frame from Nucleotide Blast tabular results
10.0 years ago
zgayk ▴ 90

Hello,

I have a database of BLAST matches to assembly scaffolds in the form of a 24-column tabular file. The results include columns for query and subject frames. However, every reading frame came back as +1, even though this does not seem to be the case when I selectively BLAST particular results from the file against NCBI BLAST. Does anyone know why the reading frame results are essentially useless in this regard?

In addition, I would like to know whether each match was + or -, as it seems that a number of query sequences are template strands and I may need to take the reverse complement to be able to use the BLAST information in analyses.

The data file is very large, but I would be happy to send it to anyone who wants to see it.

Thanks,
Zach

blast • 3.7k views

You should include a few lines of your tabular file (or the formatting string) - the way you format the BLAST output affects what gets reported.


Here are the first 36 BLAST results from the table, not including the actual aligned sequences, which would be too long. The query and subject frames are the last two columns of the output.

COLO Scaffold QUERY SEQ ID Percent Identity Alignment Length Mismatch No. Gaps Alignment Start Alignment End Alignment Start in Subject Alignment End Subject E value Bit Score All Seq ID Raw Score No. Identical Matches Positive Scoring Matches Gaps % Positive Scoring Matches Query Frame Subject Frame
scaffold_1007362 gi|50254217|gb| 99.93 1513 1 0 447 1959 1536 24 0 2789 gi|50254217|gb|AY567890.1| 1510 1512 1512 0 99.93 1 1
scaffold_3098415 gi|50254217|gb|AY567890.1| 99.93 1513 1 0 1 1513 24 1536 0 2789 gi|50254217|gb|AY567890.1| 1510 1512 1512 0 99.93 1 1
scaffold_2009723 gi|559028|gb|L33375.1|GVSMTDGI 99.67 1200 2 2 1583 2781 1 1199 0 2193 gi|559028|gb|L33375.1|GVSMTDGI 1187 1196 1196 2 99.67 1 1
scaffold_2480986 gi|559028|gb|L33375.1|GVSMTDGI 99.67 1200 2 2 22 1220 1199 1 0 2193 gi|559028|gb|L33375.1|GVSMTDGI 1187 1196 1196 2 99.67 1 1
scaffold_2009723 gi|7339777|gb|AF173577.1|AF173577 99.55 2689 6 6 96 2781 1 2686 0 4894 gi|7339777|gb|AF173577.1|AF173577 2650 2677 2677 6 99.55 1 1
scaffold_2480986 gi|7339777|gb|AF173577.1|AF173577 99.55 2689 6 6 22 2707 2686 1 0 4894 gi|7339777|gb|AF173577.1|AF173577 2650 2677 2677 6 99.55 1 1
scaffold_2592860 gi|449507738|ref|XM_002194478.2| 99.24 1177 9 0 1 1177 1862 3038 0 2124 gi|449507738|ref|XM_002194478.2| 1150 1168 1168 0 99.24 1 1
scaffold_2592860 gi|326923113|ref|XM_003207738.1| 98.56 1177 17 0 1 1177 1694 2870 0 2080 gi|326923113|ref|XM_003207738.1| 1126 1160 1160 0 98.56 1 1
scaffold_2592860 gi|363736155|ref|XM_422151.3| 98.47 1177 18 0 1 1177 1682 2858 0 2074 gi|363736155|ref|XM_422151.3| 1123 1159 1159 0 98.47 1 1
scaffold_2592860 gi|363736157|ref|XM_003641630.1| 98.47 1177 18 0 1 1177 1610 2786 0 2074 gi|363736157|ref|XM_003641630.1| 1123 1159 1159 0 98.47 1 1
scaffold_2196794 gi|363742214|ref|XM_417693.3| 98.02 1214 12 4 1 1206 5238 6447 0 2098 gi|363742214|ref|XM_417693.3| 1136 1190 1190 12 98.02 1 1
scaffold_2009723 gi|108755469|dbj|AP009190.1| 97.79 2761 53 8 45 2802 1 2756 0 4754 gi|108755469|dbj|AP009190.1| 2574 2700 2700 8 97.79 1 1
scaffold_2480986 gi|108755469|dbj|AP009190.1| 97.79 2761 53 8 1 2758 2756 1 0 4754 gi|108755469|dbj|AP009190.1| 2574 2700 2700 8 97.79 1 1
scaffold_336049 gi|193880700|gb|EU738627.1| 97.76 1029 5 1 1 1011 885 1913 0 1760 gi|193880700|gb|EU738627.1| 953 1006 1006 18 97.76 1 1
scaffold_1431697 gi|326917633|ref|XM_003205053.1| 97.34 1014 27 0 1 1014 738 1751 0 1724 gi|326917633|ref|XM_003205053.1| 933 987 987 0 97.34 1 1
scaffold_2196870 gi|449480503|ref|XM_004174759.1| 97.26 1094 23 6 1 1088 140 1232 0 1847 gi|449480503|ref|XM_004174759.1| 1000 1064 1064 7 97.26 1 1
scaffold_1566738 gi|449472058|ref|XM_004176467.1| 97.2 1537 41 2 1 1536 218 1753 0 2599 gi|449472058|ref|XM_004176467.1| 1407 1494 1494 2 97.2 1 1
scaffold_1566738 gi|449472062|ref|XM_002192168.2| 97.2 1537 41 2 1 1536 511 2046 0 2599 gi|449472062|ref|XM_002192168.2| 1407 1494 1494 2 97.2 1 1
scaffold_2006085 gi|449472401|ref|XM_002195258.2| 96.97 1222 37 0 2 1223 2044 823 0 2052 gi|449472401|ref|XM_002195258.2| 1111 1185 1185 0 96.97 1 1
scaffold_2006086 gi|449472401|ref|XM_002195258.2| 96.97 1222 37 0 2 1223 2044 823 0 2052 gi|449472401|ref|XM_002195258.2| 1111 1185 1185 0 96.97 1 1
scaffold_3822665 gi|449472401|ref|XM_002195258.2| 96.97 1222 37 0 1 1222 823 2044 0 2052 gi|449472401|ref|XM_002195258.2| 1111 1185 1185 0 96.97 1 1
scaffold_2847742 gi|108755469|dbj|AP009190.1| 96.93 1304 38 2 1 1303 14272 15574 0 2187 gi|108755469|dbj|AP009190.1| 1184 1264 1264 2 96.93 1 1
scaffold_1007362 gi|108755469|dbj|AP009190.1| 96.79 1960 61 2 1 1959 7387 5429 0 3269 gi|108755469|dbj|AP009190.1| 1770 1897 1897 2 96.79 1 1
scaffold_3098415 gi|108755469|dbj|AP009190.1| 96.79 1960 61 2 1 1959 5429 7387 0 3269 gi|108755469|dbj|AP009190.1| 1770 1897 1897 2 96.79 1 1
scaffold_2196794 gi|449488870|ref|XM_004174384.1| 96.73 1224 18 13 1 1210 5721 6936 0 2019 gi|449488870|ref|XM_004174384.1| 1093 1184 1184 22 96.73 1 1
scaffold_738409 gi|363738418|ref|XM_414230.3| 96.73 1161 36 2 1 1160 9407 8248 0 1932 gi|363738418|ref|XM_414230.3| 1046 1123 1123 2 96.73 1 1
scaffold_3758034 gi|6273352|gb|AF192507.1| 96.71 1093 36 0 396 1488 1 1093 0 1820 gi|6273352|gb|AF192507.1| 985 1057 1057 0 96.71 1 1
scaffold_1833153 gi|449471013|ref|XM_004176891.1| 96.54 1069 35 2 4 1071 3445 4512 0 1768 gi|449471013|ref|XM_004176891.1| 957 1032 1032 2 96.54 1 1
scaffold_3758034 gi|46017945|emb|BX933572.2| 96.5 1313 46 0 176 1488 99 1411 0 2170 gi|46017945|emb|BX933572.2| 1175 1267 1267 0 96.5 1 1
scaffold_1007362 gi|50254215|gb|AY567889.1| 96.43 1514 52 2 447 1959 1536 24 0 2495 gi|50254215|gb|AY567889.1| 1351 1460 1460 2 96.43 1 1
scaffold_3098415 gi|50254215|gb|AY567889.1| 96.43 1514 52 2 1 1513 24 1536 0 2495 gi|50254215|gb|AY567889.1| 1351 1460 1460 2 96.43 1 1
scaffold_3758034 gi|45382444|ref|NM_205370.1| 96.42 1313 47 0 176 1488 100 1412 0 2165 gi|45382444|ref|NM_205370.1|;gi|710333|gb|U20216.1|GGU20216 1172 1266 1266 0 96.42 1 1
scaffold_738409 gi|326927595|ref|XM_003209929.1| 96.3 1161 41 2 1 1160 9077 7918 0 1905 gi|326927595|ref|XM_003209929.1| 1031 1118 1118 2 96.3 1 1
scaffold_1566738 gi|33305418|gb|AF373779.1| 96.29 1536 57 0 1 1536 177 1712 0 2521 gi|33305418|gb|AF373779.1| 1365 1479 1479 0 96.29 1 1
scaffold_1566738 gi|66793442|ref|NM_001024577.1| 96.29 1536 57 0 1 1536 530 2065 0 2521 gi|66793442|ref|NM_001024577.1|;gi|63002670|dbj|AB195262.1| 1365 1479 1479 0 96.29 1 1
scaffold_3334351 gi|449492459|ref|XM_002195206.2| 96.22 1084 34 6 1 1084 2270 3346 0 1768 gi|449492459|ref|XM_002195206.2| 957 1043 1043 7 96.22 1 1

10.0 years ago

Well, it is not clear how you got that file, as it does not seem to be a standard BLAST output. Hence there is not much advice we can give you on why the information you seek is not there. It looks like the product of a custom script.

In general, to get the strand information as a column you would need to add the sstrand field to the tabular output format. From your file you can still get that information by comparing the start and end coordinates: when start > end, it (probably) means that the alignment is on the reverse strand.
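The coordinate check can be sketched in a few lines of Python. This is only an illustration, not a tested pipeline: the column positions are an assumption based on the 20-column rows posted in this thread (query start/end in columns 7 and 8, subject start/end in columns 9 and 10), and the function names are made up for the example. The second helper reverse-complements a sequence for hits that turn out to be on the minus strand.

```python
# Assumed 0-based column positions, taken from the rows posted in this thread.
QSTART_COL, QEND_COL = 6, 7
SSTART_COL, SEND_COL = 8, 9

# Translation table for complementing DNA (N stays N, case is preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def hit_strand(line):
    """Return 'plus' or 'minus' for one data row of the tabular file.

    A hit is treated as minus-strand when exactly one of the two
    coordinate pairs (query or subject) runs from high to low.
    """
    fields = line.split()  # works for tab- or space-delimited rows
    qstart, qend = int(fields[QSTART_COL]), int(fields[QEND_COL])
    sstart, send = int(fields[SSTART_COL]), int(fields[SEND_COL])
    query_reversed = qstart > qend
    subject_reversed = sstart > send
    return "minus" if query_reversed != subject_reversed else "plus"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]
```

The header row has to be skipped before calling `hit_strand`, because its column names contain spaces and are not numeric.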

As for frames, these only matter for translated searches such as blastx and tblastx, where the alignment is made on translated sequence. Your example seems to be a nucleotide-level alignment, where the frame will always be reported as +1.
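To make the idea of a frame concrete, here is a minimal sketch of how a frame number can be derived from alignment coordinates in a translated search. It assumes the usual convention that frame +1 starts at the first base of the sequence and frame -1 starts at the last base read on the reverse strand; the function name and its arguments are hypothetical. For a nucleotide-level hit this number only describes where the alignment happens to start, so by itself it does not identify the coding frame.

```python
def translation_frame(start, end, seq_len):
    """Frame (+1..+3 or -1..-3) of an alignment on a nucleotide sequence.

    Coordinates are 1-based. start > end is taken to mean the alignment
    runs along the minus strand; seq_len is the full sequence length.
    """
    if start <= end:
        # Plus strand: offset of the first aligned base from base 1.
        return (start - 1) % 3 + 1
    # Minus strand: offset of the first aligned base from the last base.
    return -((seq_len - start) % 3 + 1)
```

For example, under these assumptions an alignment that starts at base 4 on the plus strand is in frame +1, and one that starts at the last base on the minus strand is in frame -1.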


The output was created using Galaxy, as extended 24-column Blast tabular data. Thank you very much for the information about the start and end coordinates. From that I was able to identify all the reverse strands.

I understand that the nucleotide alignment will always be in the +1 frame, but I was hoping to get information directly on the frame of the amino acids, assuming all nucleotide results in my file are protein-coding.

Thanks very much,
Zach Gayk
