8.1 years ago
Bioinfonext
I want to extract only the Trinity gene IDs from the text file below, for example:
TRINITY_DN33489_c0_g1_i1
TRINITY_DN33489_c0_g2_i1
TRINITY_DN33447_c0_g1_i1
# Query: TRINITY_DN33489_c0_g1_i1 len=657 path=[818:0-102 898:103-168 964:169-182 978:183-192 988:193-217 1013:218-224 1968:225-248 1044:249-249 1045:250-265 1061:266-273 1069:274-289 1085:290-329 1965:330-353 1149:354-439 1969:440-463 1259:464-656] [-1, 818, 898, 964, 978, 988, 1013, 1968, 1044, 1045, 1061, 1069, 1085, 1965, 1149, 1969, 1259, -2]
# Query: TRINITY_DN33489_c0_g2_i1 len=816 path=[261:0-148 387:149-278 1963:279-302 541:303-433 672:434-434 1964:435-458 697:459-567 806:568-579 25:580-591 37:592-598 44:599-601 47:602-612 58:613-619 65:620-622 68:623-636 82:637-643 89:644-644 1966:645-668 114:669-749 1967:750-773 219:774-815] [-1, 261, 387, 1963, 541, 672, 1964, 697, 806, 25, 37, 44, 47, 58, 65, 68, 82, 89, 1966, 114, 1967, 219, -2]
# Query: TRINITY_DN33447_c0_g1_i1 len=566 path=[1:0-68 47:69-90 69:91-92 71:93-114 93:115-174 807:175-198 177:199-207 186:208-225 204:226-231 210:232-249 228:250-266 808:267-290 269:291-339 806:340-363 342:364-479 458:480-480 459:481-483 462:484-502 481:503-504 483:505-507 486:508-526 505:527-565] [-1, 1, 47, 69, 71, 93, 807, 177, 186, 204, 210, 228, 808, 269, 806, 342, 458, 459, 462, 481, 483, 486, 505, -2]
And what have you tried to accomplish that?
I tried the Linux cut command and also tried Excel, but neither worked.
cut -f2 trinity_216_70574__NR__database > trinity_blasted.id
It looks like your data is space-delimited, whereas cut uses a tab as its default delimiter. Try the following:
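A minimal sketch of one such command, assuming the fields on the "# Query:" lines are separated by single spaces so that the ID is the third field. The file name sample_blast.txt and its shortened contents are hypothetical stand-ins for the real BLAST output:

```shell
# Hypothetical sample file mimicking the "# Query:" lines shown above
printf '%s\n' \
  '# Query: TRINITY_DN33489_c0_g1_i1 len=657 path=[818:0-102]' \
  '# Query: TRINITY_DN33489_c0_g2_i1 len=816 path=[261:0-148]' > sample_blast.txt

# Keep only the "# Query:" lines, then split on single spaces (-d ' ')
# and print the third field (-f3), which holds the Trinity ID
ids=$(grep '^# Query: ' sample_blast.txt | cut -d ' ' -f3)
printf '%s\n' "$ids"
```

Filtering on the "# Query:" prefix first means every query is listed, whether or not it has hits further down in the file.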
It's always helpful to check the man page or the help output of the tool you are trying to use.
I need to extract all Trinity gene IDs, whether or not a hit was found. After running the above command, the output contains only the Trinity gene IDs that had a hit. I want to extract every gene ID based on the presence of the word TRINITY: wherever TRINITY occurs, it should be matched and that gene ID written to the output file.
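Matching on the word TRINITY, as described, could be sketched with grep -o, which prints only the matched part of each line (supported by GNU and BSD grep). The pattern assumes IDs follow the TRINITY_DN<n>_c<n>_g<n>_i<n> shape seen in the examples. The sample file and the hit line in it, including the name HYPOTHETICAL_HIT, are made up for illustration:

```shell
# Hypothetical sample: one query without hits, one query with a hit line
{
  printf '# Query: TRINITY_DN33489_c0_g1_i1 len=657\n'
  printf '# 0 hits found\n'
  printf '# Query: TRINITY_DN33447_c0_g1_i1 len=566\n'
  printf 'TRINITY_DN33447_c0_g1_i1\tHYPOTHETICAL_HIT\t98.5\n'
} > sample_blast.txt

# -o prints only the matching text; sort -u removes IDs that appear
# on both the "# Query:" line and the hit lines
all_ids=$(grep -o 'TRINITY_DN[0-9]*_c[0-9]*_g[0-9]*_i[0-9]*' sample_blast.txt | LC_ALL=C sort -u)
printf '%s\n' "$all_ids" > trinity_all.id
```

Because the match does not depend on field positions or delimiters, it picks up IDs from comment lines and hit lines alike.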