I have the following dataset, in which I have two replicates per group (e.g. group 1: E13_5_midline; replicates of group 1: E13_5_midline_1 and E13_5_midline_2).
ensembl_gene_id E13_5_meninges_1 E13_5_meninges_2 E13_5_midline_1 E13_5_midline_2 E14_5_meninges_1 E14_5_midline_1 E14_5_midline_2 E15_5_meninges_1 E15_5_meninges_2 E15_5_midline_1
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ENSMUSG0000000… 6342. 6238. 7440. 6905. 6076. 7237. 7085. 5846. 5789. 6509.
2 ENSMUSG0000000… 771. 768. 406. 450. 665. 450. 418. 607. 602. 443.
3 ENSMUSG0000000… 40981. 43835. 96853. 89887. 39372. 150312. 157692. 64253. 53234. 259484.
4 ENSMUSG0000000… 311. 265. 389. 367. 279. 585. 536. 277. 278. 408.
5 ENSMUSG0000000… 1364. 1378. 2128. 1648. 1199. 1652. 1793. 1332. 1140. 1688.
6 ENSMUSG0000000… 1035. 1106. 321. 617. 1125. 428. 426. 1310. 1635. 553.
7 ENSMUSG0000000… 4285. 3985. 5693. 5084. 3205. 4024. 3700. 3500. 3556. 3806.
8 ENSMUSG0000000… 870. 866. 798. 864. 779. 815. 767. 911. 846. 876.
9 ENSMUSG0000000… 918. 994. 660. 693. 921. 444. 614. 784. 745. 693.
10 ENSMUSG0000000… 1266. 1304. 176. 618. 1279. 159. 162. 1311. 1402. 269.
I am interested in plotting the mean of the replicates in each group for some genes of interest. To do that, I rearranged the dataset using the function 'gather', creating one new column that contains all the replicate names and another with the corresponding normalised counts.
ensembl_gene_id external_gene_name description chromosome_name start_position end_position condition counts
<chr> <chr> <chr> <chr> <int> <int> <chr> <dbl>
1 ENSMUSG00000000001 Gnai3 guanine nucleotide binding protein (G protein), alpha inhibiting 3 [Source:MGI Sym… 3 108107280 108146146 E13_5_meninge… 6342.
2 ENSMUSG00000000028 Cdc45 cell division cycle 45 [Source:MGI Symbol;Acc:MGI:1338073] 16 18780447 18811987 E13_5_meninge… 771.
3 ENSMUSG00000000031 H19 H19, imprinted maternally expressed transcript [Source:MGI Symbol;Acc:MGI:95891] 7 142575529 142578143 E13_5_meninge… 40981.
4 ENSMUSG00000000037 Scml2 sex comb on midleg-like 2 (Drosophila) [Source:MGI Symbol;Acc:MGI:1340042] X 161117193 161258213 E13_5_meninge… 311.
5 ENSMUSG00000000056 Narf nuclear prelamin A recognition factor [Source:MGI Symbol;Acc:MGI:1914858] 11 121237253 121255856 E13_5_meninge… 1364.
6 ENSMUSG00000000058 Cav2 caveolin 2 [Source:MGI Symbol;Acc:MGI:107571] 6 17281185 17289115 E13_5_meninge… 1035.
7 ENSMUSG00000000078 Klf6 Kruppel-like factor 6 [Source:MGI Symbol;Acc:MGI:1346318] 13 5861482 5870394 E13_5_meninge… 4285.
8 ENSMUSG00000000085 Scmh1 sex comb on midleg homolog 1 [Source:MGI Symbol;Acc:MGI:1352762] 4 120405281 120530186 E13_5_meninge… 870.
9 ENSMUSG00000000088 Cox5a cytochrome c oxidase subunit Va [Source:MGI Symbol;Acc:MGI:88474] 9 57521274 57532426 E13_5_meninge… 918.
10 ENSMUSG00000000093 Tbx2 T-box 2 [Source:MGI Symbol;Acc:MGI:98494] 11 85832551 85841948 E13_5_meninge… 1266.
The problem is that when I plot the replicates, they appear as individual samples and not as replicates. Can anyone help with how I should group the data to achieve what I want?
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
Thank you. I have used the formatting bar but for some reason it didn't work. I'm sorry for the trouble.
It's OK - to format a block of text, select all lines before using the formatting bar. If you hit it without selecting text, it does inline code formatting, which doesn't translate well to how you wish the data to be.
Thanks a lot for clarifying! :)
I'm not 100% clear on what you're looking for, what you've tried, and what packages you're using. If I assume that you want to create a bar plot using ggplot2, here's what I would do:
Separate your "condition" column into two columns (using separate from the tidyr package), named something like c("condition", "replicate"). Something like this should then work:
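A minimal sketch of that approach, assuming the long table has the columns external_gene_name, condition and counts shown in the question. The toy data frame holds a few values copied from the question's output. The intermediate column names (stage, half_day, tissue), the name "group" for the first new column, and the choice of a faceted bar plot of group means are assumptions made for illustration. Because the condition names contain three underscores, this sketch splits on every underscore and then pastes the first three pieces back together.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Toy long-format table: a handful of values taken from the question's output
long <- data.frame(
  external_gene_name = rep(c("Gnai3", "Cdc45"), each = 4),
  condition = rep(c("E13_5_meninges_1", "E13_5_meninges_2",
                    "E13_5_midline_1", "E13_5_midline_2"), times = 2),
  counts = c(6342, 6238, 7440, 6905, 771, 768, 406, 450),
  stringsAsFactors = FALSE
)

# Split "E13_5_meninges_1" into four pieces, then re-join the first three
# so that "group" is "E13_5_meninges" and "replicate" is "1"
tidy <- long %>%
  separate(condition,
           into = c("stage", "half_day", "tissue", "replicate"),
           sep = "_") %>%
  unite("group", stage, half_day, tissue, sep = "_")

# One mean per gene and group, averaging over the replicates
group_means <- tidy %>%
  group_by(external_gene_name, group) %>%
  summarise(mean_counts = mean(counts)) %>%
  ungroup()

# One bar per group, one panel per gene
p <- ggplot(group_means, aes(x = group, y = mean_counts)) +
  geom_col() +
  facet_wrap(~ external_gene_name, scales = "free_y")
print(p)
```

Keeping the replicate column in tidy means the individual replicate values are still available if you later want to overlay points or error bars on the means.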
I probably expressed myself badly. What I want is to create a new column that contains each group in my analysis and a second column that contains the replicate within each group.
Oh - sorry, I misunderstood. Check out the separate command in tidyr; it's specifically designed to do what you're looking for (split one column into two). You want to separate the data based on the third underscore in your condition name, correct? My regex isn't good enough to identify the third occurrence of a character in a string; my hack-y workaround is to simply replace the first two underscores with a different separator using the sub function, e.g.:
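A minimal sketch of that workaround, assuming the long table has a condition column with names like "E13_5_meninges_1". The toy data frame, the "." replacement character and the new column names "group" and "replicate" are assumptions for illustration. Since sub only replaces the first match, it is called twice.

```r
library(tidyr)

# Toy long-format table with condition names as in the question
long <- data.frame(
  condition = c("E13_5_meninges_1", "E13_5_meninges_2",
                "E13_5_midline_1", "E13_5_midline_2"),
  counts = c(6342, 6238, 7440, 6905),
  stringsAsFactors = FALSE
)

# sub() replaces only the first match, so two calls convert the first two
# underscores: "E13_5_meninges_1" becomes "E13.5.meninges_1"
long$condition <- sub("_", ".", long$condition, fixed = TRUE)
long$condition <- sub("_", ".", long$condition, fixed = TRUE)

# Only one underscore is left, so separate() splits group from replicate
separated <- separate(long, condition,
                      into = c("group", "replicate"), sep = "_")
print(separated)
```

The group names then contain dots instead of underscores (e.g. "E13.5.meninges"); if that matters, gsub(".", "_", separated$group, fixed = TRUE) would restore them.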
Can you share the code (including the gather command) and the plot that you generated and want to optimize?
The graph that I want to optimize is below: https://ibb.co/hTTkvf
And the type of graph I would like to have is something like this one https://ibb.co/fCrVvf