Extract Trinity IDs
4.6 years ago
pthom010 ▴ 40

I have a matrix of FPKM values for a de novo transcriptome, which I generated with Trinity. The matrix looks like this:

S1A_rep1    S1A_rep2    S1A_rep3    S1B_rep1    S1B_rep2    S1B_rep3    S1C_rep1    S1C_rep2    S1C_rep3    S1D_rep1    S1D_rep2    S1D_rep3    S1E_rep1    S1E_rep2    S1E_rep3    R1A_rep1    R1A_rep2    R1A_rep3    R1B_rep1    R1B_rep2    R1B_rep3    R1C_rep1    R1C_rep3    R1D_rep1    R1D_rep2    R1D_rep3    R1E_rep1    R1E_rep2    R1E_rep3
TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN    1.52    1.20    1.25    0.96    1.91    1.24    1.77    0.00    1.80    1.06    0.00    0.00    0.79    0.00    1.61    2.03    1.51    0.93    1.25    0.00    1.64    2.60    0.00    0.54    0.66    1.90    0.00    2.15    0.00
TRINITY_DN109651_c0_g1_i1   12.38   32.55   62.98   37.92   9.05    40.19   25.49   62.93   10.70   14.69   62.94   24.29   55.76   32.18   9.75    20.53   12.92   26.41   14.40   19.28   0.00    29.97   0.00    17.93   13.23   0.00    13.19   45.60   0.00
TRINITY_DN26469_c0_g1_i1    1.91    0.00    2.62    2.92    2.22    3.88    1.79    0.00    1.46    1.00    0.00    0.00    0.00    0.00    2.54    1.98    1.93    1.27    0.00    0.00    0.00    2.45    0.00    0.38    0.00    1.76    1.66    1.15    0.00
TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1    2.67    2.26    4.12    4.03    3.62    4.21    4.13    4.44    3.59    4.08    4.44    3.00    4.02    3.54    4.44    3.62    3.83    2.87    2.79    2.68    4.23    4.44    3.63    3.77    3.33    3.12    2.74    5.15    3.78
TRINITY_DN3818_c2_g1_i2^HSDD2_ARATH^3Beta_HSD^Tm3   6.63    10.54   6.82    9.27    11.22   6.52    7.87    4.95    5.68    15.04   4.96    3.10    5.36    4.52    6.25    12.76   8.80    12.13   8.57    7.99    9.58    7.74    8.98    9.03    7.85    9.34    9.96    7.25    6.07
TRINITY_DN357_c4_g1_i1  0.00    0.00    0.00    0.00    0.00    6.39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    3.88    21.78   37.00   47.27   3.82    23.00   18.91   7.15    47.84   11.78   12.27   71.75   15.74   17.13   26.68
TRINITY_DN106434_c0_g1_i1^TBL17_ARATH   3.15    9.51    3.77    5.34    9.07    6.28    6.76    2.99    4.36    11.53   2.99    1.24    6.01    4.01    4.07    10.70   11.38   13.91   10.27   7.53    9.84    7.03    9.34    4.86    8.09    8.94    9.19    4.63    4.56
TRINITY_DN17767_c0_g1_i1    1.17    0.46    1.70    1.79    0.96    1.14    0.84    0.59    1.26    0.63    0.59    0.57    1.54    1.27    0.81    3.07    3.05    0.94    3.17    1.82    0.56    4.67    2.64    2.10    2.60    2.31    2.18    4.41    1.98
TRINITY_DN18362_c0_g1_i1    3.14    5.98    8.17    5.84    13.19   8.79    5.65    5.18    6.28    3.09    5.19    2.31    4.28    4.42    3.86    3.04    5.32    5.02    4.11    6.11    8.79    2.85    7.35    4.07    7.41    5.95    2.51    5.34    9.56

Essentially, the first column is a series of transcripts that are expressed at some level. The second and subsequent columns show the FPKM values for the replicates indicated in the header. What I would like to do is take the Trinity ID (e.g. TRINITY_DN12001_c0_g1_i3) of each transcript in the first column and insert it as a new first column containing the Trinity ID only, leaving the rest of the matrix unchanged. It should look like this:

S1A_rep1    S1A_rep2    S1A_rep3    S1B_rep1    S1B_rep2    S1B_rep3    S1C_rep1    S1C_rep2    S1C_rep3    S1D_rep1    S1D_rep2    S1D_rep3    S1E_rep1    S1E_rep2    S1E_rep3    R1A_rep1    R1A_rep2    R1A_rep3    R1B_rep1    R1B_rep2    R1B_rep3    R1C_rep1    R1C_rep3    R1D_rep1    R1D_rep2    R1D_rep3    R1E_rep1    R1E_rep2    R1E_rep3
TRINITY_DN12001_c0_g1_i3   TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN    1.52    1.20    1.25    0.96    1.91    1.24    1.77    0.00    1.80    1.06    0.00    0.00    0.79    0.00    1.61    2.03    1.51    0.93    1.25    0.00    1.64    2.60    0.00    0.54    0.66    1.90    0.00    2.15    0.00
TRINITY_DN109651_c0_g1_i1   TRINITY_DN109651_c0_g1_i1  12.38   32.55   62.98   37.92   9.05    40.19   25.49   62.93   10.70   14.69   62.94   24.29   55.76   32.18   9.75    20.53   12.92   26.41   14.40   19.28   0.00    29.97   0.00    17.93   13.23   0.00    13.19   45.60   0.00
TRINITY_DN26469_c0_g1_i1   TRINITY_DN26469_c0_g1_i1    1.91    0.00    2.62    2.92    2.22    3.88    1.79    0.00    1.46    1.00    0.00    0.00    0.00    0.00    2.54    1.98    1.93    1.27    0.00    0.00    0.00    2.45    0.00    0.38    0.00    1.76    1.66    1.15    0.00
TRINITY_DN16987_c0_g1_i2    TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1    2.67    2.26    4.12    4.03    3.62    4.21    4.13    4.44    3.59    4.08    4.44    3.00    4.02    3.54    4.44    3.62    3.83    2.87    2.79    2.68    4.23    4.44    3.63    3.77    3.33    3.12    2.74    5.15    3.78

I'm not sure whether I should use sed or awk on Linux. Any help would be appreciated.

RNA-Seq • 1.2k views

a) Please format your text using the "code" option. b) Provide an example of what you want; your description is not clear.


Made the appropriate edits.

4.6 years ago
JC 13k

This can work:

$ perl -pe 's|(TRINITY_.+_c\d+_g\d+_i\d+)|$1\t$1|' < table
S1A_rep1    S1A_rep2    S1A_rep3    S1B_rep1    S1B_rep2    S1B_rep3    S1C_rep1    S1C_rep2    S1C_rep3    S1D_rep1    S1D_rep2    S1D_rep3    S1E_rep1    S1E_rep2    S1E_rep3    R1A_rep1    R1A_rep2    R1A_rep3    R1B_rep1    R1B_rep2    R1B_rep3    R1C_rep1    R1C_rep3    R1D_rep1    R1D_rep2    R1D_rep3    R1E_rep1    R1E_rep2    R1E_rep3
TRINITY_DN12001_c0_g1_i3        TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN    1.52    1.20    1.25    0.96    1.91    1.24    1.77    0.00    1.80    1.06    0.00    0.00    0.79    0.00    1.61    2.03    1.51    0.93    1.25    0.00    1.64    2.60    0.00    0.54    0.66    1.90    0.00    2.15    0.00
TRINITY_DN109651_c0_g1_i1       TRINITY_DN109651_c0_g1_i1   12.38   32.55   62.98   37.92   9.05    40.19   25.49   62.93   10.70   14.69   62.94   24.29   55.76   32.18   9.75    20.53   12.92   26.41   14.40   19.28   0.00    29.97   0.00    17.93   13.23   0.00    13.19   45.60   0.00
TRINITY_DN26469_c0_g1_i1        TRINITY_DN26469_c0_g1_i1    1.91    0.00    2.62    2.92    2.22    3.88    1.79    0.00    1.46    1.00    0.00    0.00    0.00    0.00    2.54    1.98    1.93    1.27    0.00    0.00    0.00    2.45    0.00    0.38    0.00    1.76    1.66    1.15    0.00
TRINITY_DN16987_c0_g1_i2        TRINITY_DN16987_c0_g1_i2^Y005_SYNY3^ABC1    2.67    2.26    4.12    4.03    3.62    4.21    4.13    4.44    3.59    4.08    4.44    3.00    4.02    3.54    4.44    3.62    3.83    2.87    2.79    2.68    4.23    4.44    3.63    3.77    3.33    3.12    2.74    5.15    3.78
TRINITY_DN3818_c2_g1_i2 TRINITY_DN3818_c2_g1_i2^HSDD2_ARATH^3Beta_HSD^Tm3   6.63    10.54   6.82    9.27    11.22   6.52    7.87    4.95    5.68    15.04   4.96    3.10    5.36    4.52    6.25    12.76   8.80    12.13   8.57    7.99    9.58    7.74    8.98    9.03    7.85    9.34    9.96    7.25    6.07
TRINITY_DN357_c4_g1_i1  TRINITY_DN357_c4_g1_i1  0.00    0.00    0.00    0.00    0.00    6.39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    3.88    21.78   37.00   47.27   3.82    23.00   18.91   7.15    47.84   11.78   12.27   71.75   15.74   17.13   26.68
TRINITY_DN106434_c0_g1_i1       TRINITY_DN106434_c0_g1_i1^TBL17_ARATH   3.15    9.51    3.77    5.34    9.07    6.28    6.76    2.99    4.36    11.53   2.99    1.24    6.01    4.01    4.07    10.70   11.38   13.91   10.27   7.53    9.84    7.03    9.34    4.86    8.09    8.94    9.19    4.63    4.56
TRINITY_DN17767_c0_g1_i1        TRINITY_DN17767_c0_g1_i1    1.17    0.46    1.70    1.79    0.96    1.14    0.84    0.59    1.26    0.63    0.59    0.57    1.54    1.27    0.81    3.07    3.05    0.94    3.17    1.82    0.56    4.67    2.64    2.10    2.60    2.31    2.18    4.41    1.98
TRINITY_DN18362_c0_g1_i1        TRINITY_DN18362_c0_g1_i1    3.14    5.98    8.17    5.84    13.19   8.79    5.65    5.18    6.28    3.09    5.19    2.31    4.28    4.42    3.86    3.04    5.32    5.02    4.11    6.11    8.79    2.85    7.35    4.07    7.41    5.95    2.51    5.34    9.56
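
Since the question asked about sed or awk, here is a rough awk sketch of the same idea. It assumes the annotation suffix always starts at the first `^` and that the first field contains no whitespace. The file names `sample_table.tsv` and `table_with_ids.tsv` are made up for illustration, and the three sample rows are a cut-down copy of the matrix above.

```shell
# Hypothetical sample input: a header plus two rows in the same layout as the matrix.
printf 'S1A_rep1\tS1A_rep2\n' > sample_table.tsv
printf 'TRINITY_DN12001_c0_g1_i3^ARC3_ARATH^MORN\t1.52\t1.20\n' >> sample_table.tsv
printf 'TRINITY_DN109651_c0_g1_i1\t12.38\t32.55\n' >> sample_table.tsv

# For rows whose first field starts with TRINITY_, copy that field, strip
# everything from the first "^" onward, and print the bare ID plus a tab in
# front of the unchanged row. All other rows (the header) pass through as-is.
awk '$1 ~ /^TRINITY_/ { id = $1; sub(/\^.*/, "", id); print id "\t" $0; next }
     { print }' sample_table.tsv > table_with_ids.tsv

cat table_with_ids.tsv
```

As with the Perl one-liner, the header line is left untouched, so it ends up one column short of the data rows. If downstream tools need aligned columns, a label for the new column would have to be added to the header separately.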

I tried this on my mock data and I think the new file works. I will try it on the real thing and let you know. Thanks a bunch!


Works perfectly! Thanks!
