How to extract Trinity gene IDs
8.1 years ago
Bioinfonext ▴ 470

I want to extract only the Trinity gene IDs from the text file below, like this:

TRINITY_DN33489_c0_g1_i1 
TRINITY_DN33489_c0_g2_i1
TRINITY_DN33447_c0_g1_i1

# Query: TRINITY_DN33489_c0_g1_i1 len=657 path=[818:0-102 898:103-168 964:169-182 978:183-192 988:193-217 1013:218-224 1968:225-248 1044:249-249 1045:250-265 1061:266-273 1069:274-289 1085:290-329 1965:330-353 1149:354-439 1969:440-463 1259:464-656] [-1, 818, 898, 964, 978, 988, 1013, 1968, 1044, 1045, 1061, 1069, 1085, 1965, 1149, 1969, 1259, -2]

# Query: TRINITY_DN33489_c0_g2_i1 len=816 path=[261:0-148 387:149-278 1963:279-302 541:303-433 672:434-434 1964:435-458 697:459-567 806:568-579 25:580-591 37:592-598 44:599-601 47:602-612 58:613-619 65:620-622 68:623-636 82:637-643 89:644-644 1966:645-668 114:669-749 1967:750-773 219:774-815] [-1, 261, 387, 1963, 541, 672, 1964, 697, 806, 25, 37, 44, 47, 58, 65, 68, 82, 89, 1966, 114, 1967, 219, -2]

# Query: TRINITY_DN33447_c0_g1_i1 len=566 path=[1:0-68 47:69-90 69:91-92 71:93-114 93:115-174 807:175-198 177:199-207 186:208-225 204:226-231 210:232-249 228:250-266 808:267-290 269:291-339 806:340-363 342:364-479 458:480-480 459:481-483 462:484-502 481:503-504 483:505-507 486:508-526 505:527-565] [-1, 1, 47, 69, 71, 93, 807, 177, 186, 204, 210, 228, 808, 269, 806, 342, 458, 459, 462, 481, 483, 486, 505, -2]
RNA-Seq • 1.7k views

And what have you tried to accomplish that?


I tried the Linux cut command and also Excel, but neither worked.

cut -f2 trinity_216_70574__NR__database > trinity_blasted.id


It looks like your data is space-delimited, while cut expects a tab as default delimiter. Try the following:

cut -f2 -d' ' yourfile.txt > output.txt

It's always helpful to check the man page or the built-in help of the tool you are trying to use:

man cut

cut --help
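
As a small illustration of how the delimiter changes what cut returns, here is a sketch using a shortened, made-up version of one "# Query:" line from the question. The file names sample.txt and ids.txt are placeholders, not files from the thread. On lines of this shape the ID is the third space-separated field, because "#" and "Query:" count as fields one and two.

```shell
# Hypothetical sample file with one shortened '# Query:' line
printf '# Query: TRINITY_DN33489_c0_g1_i1 len=657\n' > sample.txt

# Default delimiter is a tab; the line contains none, so cut prints it unchanged
cut -f2 sample.txt

# With a space delimiter, field 3 holds the Trinity ID on this line
cut -d' ' -f3 sample.txt > ids.txt
cat ids.txt
```

Which field number is right depends on the layout of each line, so it is worth checking a few lines of the real file with head before running cut on all of it.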

I need to extract all Trinity gene IDs, whether or not a hit was found. The command above returns only the IDs that have a hit. I want to extract every gene ID based on the presence of the word TRINITY: wherever TRINITY occurs, that gene ID should be written to the output file.

TRINITY_DN33434_c0_g1_i1    gi|353557968|gb|EHC27334.1| 99.310  145 1   0   93  527 1   145 4.72e-96    295 83  527 491

BLASTX
Query:
Database:
0

BLASTX
Query:
Database:
Fields:
1
TRINITY_DN33454_c0_g1_i1    gi|574129240|dbj|GAE73589.1|    98.800  250 3   0   752 3   1   250 1.90e-179   513 73  1034    448
BLASTX

Query:
Database:
Fields:
1
TRINITY_DN33410_c0_g1_i1    gi|313764534|gb|EFS35898.1| 99.379  161 1   0   102 584 1   161 7.92e-108   315 83  585 171

BLASTX
Query:
Database:
0

BLASTX
Query:
Database:
0

BLASTX
Query:
Database:
0
8.1 years ago

That was not clear from your initial question. You also didn't show what the data looks like for IDs without a hit.

What about:

cat yourfile.txt | tr ' ' '\n' | grep '^TRINITY' > output.txt
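
An alternative sketch, not the approach above: if your grep supports -o (GNU and BSD grep do, although POSIX does not require it), the IDs can be matched directly, whether the columns are separated by spaces or tabs. The pattern assumes every ID has the shape seen in the question (TRINITY_DN, then numbered _c, _g and _i parts). The file names sample.txt and ids.txt and the shortened sample lines are made up for illustration.

```shell
# Hypothetical sample mixing '# Query:' lines and a tab-separated hit line
printf '# Query: TRINITY_DN33489_c0_g1_i1 len=657\n' > sample.txt
printf 'TRINITY_DN33489_c0_g1_i1\tgi|353557968|gb|EHC27334.1|\t99.310\n' >> sample.txt
printf '# Query: TRINITY_DN33447_c0_g1_i1 len=566\n' >> sample.txt

# -o prints only the matched text; sort -u drops IDs seen more than once
grep -o 'TRINITY_DN[0-9]*_c[0-9]*_g[0-9]*_i[0-9]*' sample.txt | sort -u > ids.txt
cat ids.txt
```

Because sort -u removes duplicates, an ID that appears both on its "# Query:" line and on a hit line is reported once. Drop sort -u if the original order or the repeat counts matter.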

Thanks, it works for me.


Good to hear. I moved my comment to an answer, so you can accept it to mark this question as solved.
