Question

Extract Trinity IDs

4.6 years ago

pthom010 ▴ 40

I have a matrix, generated with Trinity, that holds the FPKM values of a de novo transcriptome. The matrix looks like this:

S1A_rep1    S1A_rep2    S1A_rep3    S1B_rep1    S1B_rep2    S1B_rep3    S1C_rep1    S1C_rep2    S1C_rep3    S1D_rep1    S1D_rep2    S1D_rep3    S1E_rep1    S1E_rep2    S1E_rep3    R1A_rep1    R1A_rep2    R1A_rep3    R1B_rep1    R1B_rep2    R1B_rep3    R1C_rep1    R1C_rep3    R1D_rep1    R1D_rep2    R1D_rep3    R1E_rep1    R1E_rep2    R1E_rep3
TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN    1.52    1.20    1.25    0.96    1.91    1.24    1.77    0.00    1.80    1.06    0.00    0.00    0.79    0.00    1.61    2.03    1.51    0.93    1.25    0.00    1.64    2.60    0.00    0.54    0.66    1.90    0.00    2.15    0.00
TRINITY_DN109651_c0_g1_i1   12.38   32.55   62.98   37.92   9.05    40.19   25.49   62.93   10.70   14.69   62.94   24.29   55.76   32.18   9.75    20.53   12.92   26.41   14.40   19.28   0.00    29.97   0.00    17.93   13.23   0.00    13.19   45.60   0.00
TRINITY_DN26469_c0_g1_i1    1.91    0.00    2.62    2.92    2.22    3.88    1.79    0.00    1.46    1.00    0.00    0.00    0.00    0.00    2.54    1.98    1.93    1.27    0.00    0.00    0.00    2.45    0.00    0.38    0.00    1.76    1.66    1.15    0.00
TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1    2.67    2.26    4.12    4.03    3.62    4.21    4.13    4.44    3.59    4.08    4.44    3.00    4.02    3.54    4.44    3.62    3.83    2.87    2.79    2.68    4.23    4.44    3.63    3.77    3.33    3.12    2.74    5.15    3.78
TRINITY_DN3818_c2_g1_i2^HSDD2_ARATH^3Beta_HSD^Tm3   6.63    10.54   6.82    9.27    11.22   6.52    7.87    4.95    5.68    15.04   4.96    3.10    5.36    4.52    6.25    12.76   8.80    12.13   8.57    7.99    9.58    7.74    8.98    9.03    7.85    9.34    9.96    7.25    6.07
TRINITY_DN357_c4_g1_i1  0.00    0.00    0.00    0.00    0.00    6.39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    3.88    21.78   37.00   47.27   3.82    23.00   18.91   7.15    47.84   11.78   12.27   71.75   15.74   17.13   26.68
TRINITY_DN106434_c0_g1_i1^TBL17_ARATH   3.15    9.51    3.77    5.34    9.07    6.28    6.76    2.99    4.36    11.53   2.99    1.24    6.01    4.01    4.07    10.70   11.38   13.91   10.27   7.53    9.84    7.03    9.34    4.86    8.09    8.94    9.19    4.63    4.56
TRINITY_DN17767_c0_g1_i1    1.17    0.46    1.70    1.79    0.96    1.14    0.84    0.59    1.26    0.63    0.59    0.57    1.54    1.27    0.81    3.07    3.05    0.94    3.17    1.82    0.56    4.67    2.64    2.10    2.60    2.31    2.18    4.41    1.98
TRINITY_DN18362_c0_g1_i1    3.14    5.98    8.17    5.84    13.19   8.79    5.65    5.18    6.28    3.09    5.19    2.31    4.28    4.42    3.86    3.04    5.32    5.02    4.11    6.11    8.79    2.85    7.35    4.07    7.41    5.95    2.51    5.34    9.56

Essentially, the first column is a series of transcripts that are expressed in some capacity. The second column onward shows the FPKM values for the replicates indicated in the header. What I would like to do is take the Trinity ID (e.g. TRINITY_DN12001_c0_g1_i3) of each transcript in the first column and insert it as a new first column containing the Trinity ID only, leaving the rest of the matrix unchanged. It should look like this:

S1A_rep1    S1A_rep2    S1A_rep3    S1B_rep1    S1B_rep2    S1B_rep3    S1C_rep1    S1C_rep2    S1C_rep3    S1D_rep1    S1D_rep2    S1D_rep3    S1E_rep1    S1E_rep2    S1E_rep3    R1A_rep1    R1A_rep2    R1A_rep3    R1B_rep1    R1B_rep2    R1B_rep3    R1C_rep1    R1C_rep3    R1D_rep1    R1D_rep2    R1D_rep3    R1E_rep1    R1E_rep2    R1E_rep3
TRINITY_DN12001_c0_g1_i3   TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN    1.52    1.20    1.25    0.96    1.91    1.24    1.77    0.00    1.80    1.06    0.00    0.00    0.79    0.00    1.61    2.03    1.51    0.93    1.25    0.00    1.64    2.60    0.00    0.54    0.66    1.90    0.00    2.15    0.00
TRINITY_DN109651_c0_g1_i1   TRINITY_DN109651_c0_g1_i1  12.38   32.55   62.98   37.92   9.05    40.19   25.49   62.93   10.70   14.69   62.94   24.29   55.76   32.18   9.75    20.53   12.92   26.41   14.40   19.28   0.00    29.97   0.00    17.93   13.23   0.00    13.19   45.60   0.00
TRINITY_DN26469_c0_g1_i1   TRINITY_DN26469_c0_g1_i1    1.91    0.00    2.62    2.92    2.22    3.88    1.79    0.00    1.46    1.00    0.00    0.00    0.00    0.00    2.54    1.98    1.93    1.27    0.00    0.00    0.00    2.45    0.00    0.38    0.00    1.76    1.66    1.15    0.00
TRINITY_DN16987_c0_g1_i2    TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1    2.67    2.26    4.12    4.03    3.62    4.21    4.13    4.44    3.59    4.08    4.44    3.00    4.02    3.54    4.44    3.62    3.83    2.87    2.79    2.68    4.23    4.44    3.63    3.77    3.33    3.12    2.74    5.15    3.78

I'm not sure whether I should use sed or awk on Linux. Any help would be appreciated.

RNA-Seq • 1.2k views

ADD COMMENT • link updated 4.6 years ago by JC 13k • written 4.6 years ago by pthom010 ▴ 40

a) Please format your text using the "code" option. b) Provide an example of what you want; your description is not clear.

ADD REPLY • link 4.6 years ago by JC 13k

Made the appropriate edits.

ADD REPLY • link 4.6 years ago by pthom010 ▴ 40

score 2 · Accepted Answer · 2020-05-02

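This should work. The regex captures the Trinity ID on each line and writes it twice, separated by a tab; the header line contains no Trinity ID, so it passes through unchanged: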

$ perl -pe 's|(TRINITY_.+_c\d+_g\d+_i\d+)|$1\t$1|' < table
S1A_rep1    S1A_rep2    S1A_rep3    S1B_rep1    S1B_rep2    S1B_rep3    S1C_rep1    S1C_rep2    S1C_rep3    S1D_rep1    S1D_rep2    S1D_rep3    S1E_rep1    S1E_rep2    S1E_rep3    R1A_rep1    R1A_rep2    R1A_rep3    R1B_rep1    R1B_rep2    R1B_rep3    R1C_rep1    R1C_rep3    R1D_rep1    R1D_rep2    R1D_rep3    R1E_rep1    R1E_rep2    R1E_rep3
TRINITY_DN12001_c0_g1_i3        TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN    1.52    1.20    1.25    0.96    1.91    1.24    1.77    0.00    1.80    1.06    0.00    0.00    0.79    0.00    1.61    2.03    1.51    0.93    1.25    0.00    1.64    2.60    0.00    0.54    0.66    1.90    0.00    2.15    0.00
TRINITY_DN109651_c0_g1_i1       TRINITY_DN109651_c0_g1_i1   12.38   32.55   62.98   37.92   9.05    40.19   25.49   62.93   10.70   14.69   62.94   24.29   55.76   32.18   9.75    20.53   12.92   26.41   14.40   19.28   0.00    29.97   0.00    17.93   13.23   0.00    13.19   45.60   0.00
TRINITY_DN26469_c0_g1_i1        TRINITY_DN26469_c0_g1_i1    1.91    0.00    2.62    2.92    2.22    3.88    1.79    0.00    1.46    1.00    0.00    0.00    0.00    0.00    2.54    1.98    1.93    1.27    0.00    0.00    0.00    2.45    0.00    0.38    0.00    1.76    1.66    1.15    0.00
TRINITY_DN16987_c0_g1_i2        TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1    2.67    2.26    4.12    4.03    3.62    4.21    4.13    4.44    3.59    4.08    4.44    3.00    4.02    3.54    4.44    3.62    3.83    2.87    2.79    2.68    4.23    4.44    3.63    3.77    3.33    3.12    2.74    5.15    3.78
TRINITY_DN3818_c2_g1_i2 TRINITY_DN3818_c2_g1_i2^HSDD2_ARATH^3Beta_HSD^Tm3   6.63    10.54   6.82    9.27    11.22   6.52    7.87    4.95    5.68    15.04   4.96    3.10    5.36    4.52    6.25    12.76   8.80    12.13   8.57    7.99    9.58    7.74    8.98    9.03    7.85    9.34    9.96    7.25    6.07
TRINITY_DN357_c4_g1_i1  TRINITY_DN357_c4_g1_i1  0.00    0.00    0.00    0.00    0.00    6.39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    3.88    21.78   37.00   47.27   3.82    23.00   18.91   7.15    47.84   11.78   12.27   71.75   15.74   17.13   26.68
TRINITY_DN106434_c0_g1_i1       TRINITY_DN106434_c0_g1_i1^TBL17_ARATH   3.15    9.51    3.77    5.34    9.07    6.28    6.76    2.99    4.36    11.53   2.99    1.24    6.01    4.01    4.07    10.70   11.38   13.91   10.27   7.53    9.84    7.03    9.34    4.86    8.09    8.94    9.19    4.63    4.56
TRINITY_DN17767_c0_g1_i1        TRINITY_DN17767_c0_g1_i1    1.17    0.46    1.70    1.79    0.96    1.14    0.84    0.59    1.26    0.63    0.59    0.57    1.54    1.27    0.81    3.07    3.05    0.94    3.17    1.82    0.56    4.67    2.64    2.10    2.60    2.31    2.18    4.41    1.98
TRINITY_DN18362_c0_g1_i1        TRINITY_DN18362_c0_g1_i1    3.14    5.98    8.17    5.84    13.19   8.79    5.65    5.18    6.28    3.09    5.19    2.31    4.28    4.42    3.86    3.04    5.32    5.02    4.11    6.11    8.79    2.85    7.35    4.07    7.41    5.95    2.51    5.34    9.56
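
Since the question asked about sed or awk, here is a rough awk equivalent, offered as a sketch rather than a tested replacement for the Perl one-liner. It assumes that every data row starts with `TRINITY_` and that any annotation is attached to the ID with `^`, as in the sample rows. The function name `add_trinity_id` is made up for illustration.

```shell
# Hypothetical helper: prepend the bare Trinity ID as a new first column.
# Assumption: annotations, if present, follow the ID after a "^".
add_trinity_id() {
    awk '
        /^TRINITY_/ {
            id = $1                 # first whitespace-delimited field
            sub(/\^.*/, "", id)     # drop everything from the first "^" on
            print id "\t" $0        # new column, then the original line untouched
            next
        }
        { print }                   # header (or any other line) passes through
    ' "$1"
}
```

It could be called as `add_trinity_id table > table_with_ids`, where `table_with_ids` is just a placeholder output name. Because the original line is printed as-is after the new column, the existing delimiters in the matrix are not rewritten.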