I have a matrix of FPKM values for a de novo transcriptome that I was able to build with Trinity. The matrix looks like this:
S1A_rep1 S1A_rep2 S1A_rep3 S1B_rep1 S1B_rep2 S1B_rep3 S1C_rep1 S1C_rep2 S1C_rep3 S1D_rep1 S1D_rep2 S1D_rep3 S1E_rep1 S1E_rep2 S1E_rep3 R1A_rep1 R1A_rep2 R1A_rep3 R1B_rep1 R1B_rep2 R1B_rep3 R1C_rep1 R1C_rep3 R1D_rep1 R1D_rep2 R1D_rep3 R1E_rep1 R1E_rep2 R1E_rep3
TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN 1.52 1.20 1.25 0.96 1.91 1.24 1.77 0.00 1.80 1.06 0.00 0.00 0.79 0.00 1.61 2.03 1.51 0.93 1.25 0.00 1.64 2.60 0.00 0.54 0.66 1.90 0.00 2.15 0.00
TRINITY_DN109651_c0_g1_i1 12.38 32.55 62.98 37.92 9.05 40.19 25.49 62.93 10.70 14.69 62.94 24.29 55.76 32.18 9.75 20.53 12.92 26.41 14.40 19.28 0.00 29.97 0.00 17.93 13.23 0.00 13.19 45.60 0.00
TRINITY_DN26469_c0_g1_i1 1.91 0.00 2.62 2.92 2.22 3.88 1.79 0.00 1.46 1.00 0.00 0.00 0.00 0.00 2.54 1.98 1.93 1.27 0.00 0.00 0.00 2.45 0.00 0.38 0.00 1.76 1.66 1.15 0.00
TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1 2.67 2.26 4.12 4.03 3.62 4.21 4.13 4.44 3.59 4.08 4.44 3.00 4.02 3.54 4.44 3.62 3.83 2.87 2.79 2.68 4.23 4.44 3.63 3.77 3.33 3.12 2.74 5.15 3.78
TRINITY_DN3818_c2_g1_i2^HSDD2_ARATH^3Beta_HSD^Tm3 6.63 10.54 6.82 9.27 11.22 6.52 7.87 4.95 5.68 15.04 4.96 3.10 5.36 4.52 6.25 12.76 8.80 12.13 8.57 7.99 9.58 7.74 8.98 9.03 7.85 9.34 9.96 7.25 6.07
TRINITY_DN357_c4_g1_i1 0.00 0.00 0.00 0.00 0.00 6.39 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3.88 21.78 37.00 47.27 3.82 23.00 18.91 7.15 47.84 11.78 12.27 71.75 15.74 17.13 26.68
TRINITY_DN106434_c0_g1_i1^TBL17_ARATH 3.15 9.51 3.77 5.34 9.07 6.28 6.76 2.99 4.36 11.53 2.99 1.24 6.01 4.01 4.07 10.70 11.38 13.91 10.27 7.53 9.84 7.03 9.34 4.86 8.09 8.94 9.19 4.63 4.56
TRINITY_DN17767_c0_g1_i1 1.17 0.46 1.70 1.79 0.96 1.14 0.84 0.59 1.26 0.63 0.59 0.57 1.54 1.27 0.81 3.07 3.05 0.94 3.17 1.82 0.56 4.67 2.64 2.10 2.60 2.31 2.18 4.41 1.98
TRINITY_DN18362_c0_g1_i1 3.14 5.98 8.17 5.84 13.19 8.79 5.65 5.18 6.28 3.09 5.19 2.31 4.28 4.42 3.86 3.04 5.32 5.02 4.11 6.11 8.79 2.85 7.35 4.07 7.41 5.95 2.51 5.34 9.56
Essentially, the first column is a list of transcripts that are expressed to some degree. The second column onward holds the FPKM values for the replicates named in the header. What I would like to do is extract the Trinity ID (e.g. TRINITY_DN12001_c0_g1_i3) from each entry in the first column and insert it as a new leading column containing only the Trinity ID, leaving the rest of the matrix unchanged. It should look like this:
S1A_rep1 S1A_rep2 S1A_rep3 S1B_rep1 S1B_rep2 S1B_rep3 S1C_rep1 S1C_rep2 S1C_rep3 S1D_rep1 S1D_rep2 S1D_rep3 S1E_rep1 S1E_rep2 S1E_rep3 R1A_rep1 R1A_rep2 R1A_rep3 R1B_rep1 R1B_rep2 R1B_rep3 R1C_rep1 R1C_rep3 R1D_rep1 R1D_rep2 R1D_rep3 R1E_rep1 R1E_rep2 R1E_rep3
TRINITY_DN12001_c0_g1_i3 TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN 1.52 1.20 1.25 0.96 1.91 1.24 1.77 0.00 1.80 1.06 0.00 0.00 0.79 0.00 1.61 2.03 1.51 0.93 1.25 0.00 1.64 2.60 0.00 0.54 0.66 1.90 0.00 2.15 0.00
TRINITY_DN109651_c0_g1_i1 TRINITY_DN109651_c0_g1_i1 12.38 32.55 62.98 37.92 9.05 40.19 25.49 62.93 10.70 14.69 62.94 24.29 55.76 32.18 9.75 20.53 12.92 26.41 14.40 19.28 0.00 29.97 0.00 17.93 13.23 0.00 13.19 45.60 0.00
TRINITY_DN26469_c0_g1_i1 TRINITY_DN26469_c0_g1_i1 1.91 0.00 2.62 2.92 2.22 3.88 1.79 0.00 1.46 1.00 0.00 0.00 0.00 0.00 2.54 1.98 1.93 1.27 0.00 0.00 0.00 2.45 0.00 0.38 0.00 1.76 1.66 1.15 0.00
TRINITY_DN16987_c0_g1_i2 TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1 2.67 2.26 4.12 4.03 3.62 4.21 4.13 4.44 3.59 4.08 4.44 3.00 4.02 3.54 4.44 3.62 3.83 2.87 2.79 2.68 4.23 4.44 3.63 3.77 3.33 3.12 2.74 5.15 3.78
I'm not sure whether I should use sed or awk on Linux. Any help would be appreciated.
a) Please format your text using the "code" option. b) Provide an example of what you want; your description is not clear.
Made the appropriate edits.
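For what it's worth, the transformation described in the question can probably be done with awk alone. The following is only a minimal sketch, under a few assumptions that are not confirmed in the thread: the file is whitespace-separated, the first line is the header and should pass through untouched, and the Trinity ID is everything before the first "^" in the first field (or the whole field when there is no "^"). The function name add_id_column and the file names are made up for illustration.

```shell
# Hypothetical helper: reads the matrix on stdin, writes the new matrix on stdout.
add_id_column() {
    awk '
        NR == 1 { print; next }           # header line: print unchanged
        {
            caret = index($1, "^")        # position of the first "^", 0 if absent
            id = caret ? substr($1, 1, caret - 1) : $1
            print id, $0                  # bare ID, then the original line as-is
        }
    '
}

# Example usage (placeholder file names):
# add_id_column < matrix.txt > matrix_with_ids.txt
```

Because $0 is never modified, the original columns keep whatever separator they had; only the new ID column is joined with awk's default output separator, a single space. If the real matrix is tab-delimited, adding -v OFS='\t' to the awk call should keep the new column consistent with the rest. sed could do the same job, but awk makes skipping the header line a little more explicit.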