Convert with respect to first column (repeated values) into rows (transposon)
2
0
Entering edit mode
16 months ago
waqasnayab ▴ 250

Hello,

I have a file like (just a snippet of 2 sample IDs):

Sample Name Target Name CT
17BE0195802 LOC100996437-Hs04403548_m1  Undetermined
17BE0195802 LINC01372-Hs01911576_s1 Undetermined
17BE0195802 CD27-AS1-Hs05048951_s1  33.012
17BE0195802 COMMD10-Hs01024000_m1   33.817
17BE0195802 ABAT-Hs00609436_m1  34.328
17BE0195802 PPARGC1A-Hs00173304_m1  Undetermined
17BE0195802 VN1R2-Hs00545195_s1 Undetermined
17BE0195802 NR_051985.1-APXGXPZ 29.600
17BE0195802 TNFSF10-Hs00921974_m1   33.189
17BE0195802 MEST-Hs00853380_g1  31.964
17BE0195802 SNORA81-APWC343 31.774
17BE0195802 SNORA64-Mm04336067_s1   Undetermined
17BE0195802 RNA45S5-Hs05057753_g1   31.831
17BE0195802 LINC00905-Hs01126887_g1 Undetermined
17BE0195802 FAM72C-Hs00822131_m1    Undetermined
17BE0195802 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0195802 ADAM32-Hs00991935_m1    35.367
17BE0195802 LOC100996437-Hs04403548_m1  Undetermined
17BE0195802 LINC01372-Hs01911576_s1 Undetermined
17BE0195802 CD27-AS1-Hs05048951_s1  32.527
17BE0195802 COMMD10-Hs01024000_m1   Undetermined
17BE0195802 ABAT-Hs00609436_m1  33.963
17BE0195802 PPARGC1A-Hs00173304_m1  Undetermined
17BE0195802 VN1R2-Hs00545195_s1 Undetermined
17BE0195802 NR_051985.1-APXGXPZ 28.917
17BE0195802 TNFSF10-Hs00921974_m1   32.797
17BE0195802 MEST-Hs00853380_g1  31.926
17BE0195802 SNORA81-APWC343 32.128
17BE0195802 SNORA64-Mm04336067_s1   Undetermined
17BE0195802 RNA45S5-Hs05057753_g1   30.637
17BE0195802 LINC00905-Hs01126887_g1 Undetermined
17BE0195802 FAM72C-Hs00822131_m1    34.719
17BE0195802 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0195802 ADAM32-Hs00991935_m1    Undetermined
17BE0196102 LOC100996437-Hs04403548_m1  32.596
17BE0196102 LINC01372-Hs01911576_s1 Undetermined
17BE0196102 CD27-AS1-Hs05048951_s1  35.263
17BE0196102 COMMD10-Hs01024000_m1   Undetermined
17BE0196102 ABAT-Hs00609436_m1  Undetermined
17BE0196102 PPARGC1A-Hs00173304_m1  Undetermined
17BE0196102 VN1R2-Hs00545195_s1 Undetermined
17BE0196102 NR_051985.1-APXGXPZ 30.639
17BE0196102 TNFSF10-Hs00921974_m1   33.113
17BE0196102 MEST-Hs00853380_g1  33.556
17BE0196102 SNORA81-APWC343 33.803
17BE0196102 SNORA64-Mm04336067_s1   Undetermined
17BE0196102 RNA45S5-Hs05057753_g1   33.795
17BE0196102 LINC00905-Hs01126887_g1 Undetermined
17BE0196102 FAM72C-Hs00822131_m1    Undetermined
17BE0196102 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0196102 ADAM32-Hs00991935_m1    Undetermined
17BE0196102 LOC100996437-Hs04403548_m1  Undetermined
17BE0196102 LINC01372-Hs01911576_s1 Undetermined
17BE0196102 CD27-AS1-Hs05048951_s1  34.599
17BE0196102 COMMD10-Hs01024000_m1   Undetermined
17BE0196102 ABAT-Hs00609436_m1  Undetermined
17BE0196102 PPARGC1A-Hs00173304_m1  Undetermined
17BE0196102 VN1R2-Hs00545195_s1 Undetermined
17BE0196102 NR_051985.1-APXGXPZ 30.521
17BE0196102 TNFSF10-Hs00921974_m1   32.791
17BE0196102 MEST-Hs00853380_g1  32.778
17BE0196102 SNORA81-APWC343 35.372
17BE0196102 SNORA64-Mm04336067_s1   Undetermined
17BE0196102 RNA45S5-Hs05057753_g1   33.783
17BE0196102 LINC00905-Hs01126887_g1 Undetermined
17BE0196102 FAM72C-Hs00822131_m1    Undetermined
17BE0196102 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0196102 ADAM32-Hs00991935_m1    Undetermined

As you can 17BE0195802 is repeated 34 times, since the experiment is repeated on 17 biomarkers (starting from LOC100996437-Hs04403548_m1 to ADAM32-Hs00991935_m1, then the same order is repeated again another time for the same sample).

My desired output would be:

               LOC100996437-Hs04403548_m1   LINC01372-Hs01911576_s1 CD27-AS1-Hs05048951_s1  COMMD10-Hs01024000_m1   ABAT-Hs00609436_m1  PPARGC1A-Hs00173304_m1  VN1R2-Hs00545195_s1 NR_051985.1-APXGXPZ TNFSF10-Hs00921974_m1   MEST-Hs00853380_g1  SNORA81-APWC343 SNORA64-Mm04336067_s1   RNA45S5-Hs05057753_g1   LINC00905-Hs01126887_g1 FAM72C-Hs00822131_m1    DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    ADAM32-Hs00991935_m1
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .                                                       '

using the thread: https://www.unix.com/shell-programming-and-scripting/158305-convert-columns-into-rows-respect-first-column.html I came up with the following output:

nawk '{a[$1]=($1 in a)?a[$1] OFS sprintf("%-*s",w,$3):sprintf("%-*s",w,$3);h[$2]}END{for(i in h) printf("\t%-*s", w,i);print "";for(i in a) print i,a[i]}' OFS='\t' waqas_trial.txt

Picture is attached.

As I have repeated experimental conditions, I am not able to write two lines for each sample. Your kind help is highly appreciated.

Thanks,

Waqas.

qPCR transposons • 837 views
ADD COMMENT
0
Entering edit mode
16 months ago
biofalconch ★ 1.3k

You could use R and tidyr, something like this:

library(tidyr)
File <- read.table("File.txt",header=T,sep="\t")
CoolMatrix <- pivot_wider(File,  names_from = "Sample Name", values_from = "Target Name")

Just a note: If you are using single spaces as separators, you might want to change your column names as they also include them :)

ADD COMMENT
0
Entering edit mode

Thanks biofalconch

But this is the best which I got:

> CoolMatrix <- pivot_wider(File,  names_from = "Target.Name", values_from = "CT")

Attaching the output

Looking for your cooperation.

Waqas.

ADD REPLY
0
Entering edit mode

This won't get OP's desired output since they want to have two rows per sample name, I'm assuming for each replicate. I would probably add _1 or _2 to the sample name using some wizardry with duplicated() which would then make your above code work.

ADD REPLY
0
Entering edit mode
16 months ago
waqasnayab ▴ 250

Thanks Travis.

Even when I use _1 or _2, the output is:

enter image description here

Is there any way to ask whenever there is 18th line of the same sample ID occurrence, consider it as a second single line of the same sample and write it below the same sample?:

            LOC100996437-Hs04403548_m1   LINC01372-Hs01911576_s1 CD27-AS1-Hs05048951_s1  COMMD10-Hs01024000_m1   ABAT-Hs00609436_m1  PPARGC1A-Hs00173304_m1  VN1R2-Hs00545195_s1 NR_051985.1-APXGXPZ TNFSF10-Hs00921974_m1   MEST-Hs00853380_g1  SNORA81-APWC343 SNORA64-Mm04336067_s1   RNA45S5-Hs05057753_g1   LINC00905-Hs01126887_g1 FAM72C-Hs00822131_m1    DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    ADAM32-Hs00991935_m1
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .                                                       '

Waqas.

ADD COMMENT
0
Entering edit mode

This isn't what I recommended, you added _n but not _1 or _2. A very quick workaround would be to use seq and rep to add a new column denoting the replicate number for every 18 rows, then use paste (and if necessary, case_when) to add the replicate number to your sample name.

ADD REPLY

Login before adding your answer.

Traffic: 2010 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6