Hello! I'm having a lot of difficulty normalizing my data. I searched for transposable elements in my genome and then counted reads in some transcripts. I produced something like this:
head(table_tissues_filtered_TE)
                  Lengths ova testes lobe retina suckers brain1 brain2 skin stage axial gland viscera
Simple_repeat_80      134  58     77   48     69     115    137    131  195    75    99    35     557
tRNA_1                 59   0     14   12      1      19     12     14   21   104     9     0       3
Simple_repeat_87       26   1     33   12      3      15     24     21   19   180     9     0       4
Simple_repeat_114      22   0      0    0      1       0      0      0    2     7     0   204       0
Simple_repeat_115      30   0      0    0      0       0      0      0    0     1     0    42       0
Simple_repeat_123      22   2      3  317     45      13    652    651   15    21   333     5       4
Here, Lengths is the length of each element (simple repeats, etc.), and the other columns are the read counts from featureCounts. I have another table with the number of reads for each tissue:
head(reads_table)
    ova  testes    lobe  retina suckers  brain1  brain2    skin   stage   axial   gland viscera
 522444  310243  226146  102307  126055  489389  668243  372728  262536  233754   24817   25689
I would like to normalize the data to RPKM in R, but I don't know exactly how to do it. Can anyone help me? Thank you!!!
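For reference, RPKM is just arithmetic on the two tables above. Each count is divided by the element length in kilobases and by the library size in millions of reads. This minimal sketch is written in Python only to show that arithmetic; the same element-wise division works directly on R data frames. It assumes that Lengths is in base pairs and that reads_table holds the total (mapped) reads of each library. It also uses only a few rows and two tissues from the post.

```python
# Minimal RPKM sketch (assumptions: Lengths are in bp, and the per-tissue
# totals from reads_table are the library sizes to normalize by).
# Only three elements and two tissues from the post are used here.
lengths = {"Simple_repeat_80": 134, "tRNA_1": 59, "Simple_repeat_87": 26}
counts = {
    "Simple_repeat_80": {"ova": 58, "testes": 77},
    "tRNA_1": {"ova": 0, "testes": 14},
    "Simple_repeat_87": {"ova": 1, "testes": 33},
}
total_reads = {"ova": 522444, "testes": 310243}


def rpkm(counts, lengths, total_reads):
    """Reads Per Kilobase of element per Million library reads."""
    result = {}
    for feature, per_tissue in counts.items():
        length_kb = lengths[feature] / 1e3  # base pairs -> kilobases
        result[feature] = {
            # divide by length (kb) and by library size (millions of reads)
            tissue: n / (length_kb * (total_reads[tissue] / 1e6))
            for tissue, n in per_tissue.items()
        }
    return result


result = rpkm(counts, lengths, total_reads)
for feature, values in result.items():
    print(feature, {t: round(v, 2) for t, v in values.items()})
```

For example, Simple_repeat_80 in ova gives 58 / (0.134 × 0.522444), which is roughly 828.5.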
RPKM/FPKM is a unit, not a method or analysis. Nowadays people usually use the TPM unit instead of R(F)PKM. To calculate TPM, you can run your BAM files through StringTie, or you can run Salmon or kallisto directly on the fastq files.
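If you just want to see how TPM differs from RPKM using the counts you already have, here is a minimal sketch (Python again, only to show the arithmetic). First divide each count by the element length in kilobases. Then rescale each tissue so that its values sum to one million. The library size from reads_table is not used at all. This is an assumption-laden approximation: it uses the raw Lengths rather than the effective lengths that Salmon/kallisto estimate. The per-tissue sum also runs only over the elements in your table, so the values are relative to that set of elements, not to the whole transcriptome.

```python
# Minimal TPM sketch (assumptions: raw Lengths in bp are used as-is, and the
# per-tissue sum covers only the elements listed here, not all transcripts).
lengths = {"Simple_repeat_80": 134, "tRNA_1": 59, "Simple_repeat_87": 26}
counts = {
    "Simple_repeat_80": {"ova": 58, "testes": 77},
    "tRNA_1": {"ova": 0, "testes": 14},
    "Simple_repeat_87": {"ova": 1, "testes": 33},
}


def tpm(counts, lengths):
    """Transcripts Per Million: length-normalize, then scale each tissue to 1e6."""
    tissues = sorted({t for per_tissue in counts.values() for t in per_tissue})
    result = {feature: {} for feature in counts}
    for tissue in tissues:
        # Step 1: reads per kilobase for every element in this tissue.
        rates = {f: counts[f][tissue] / (lengths[f] / 1e3) for f in counts}
        total = sum(rates.values())
        # Step 2: rescale so the values in this tissue sum to one million.
        for f, rate in rates.items():
            result[f][tissue] = rate / total * 1e6 if total > 0 else 0.0
    return result


result = tpm(counts, lengths)
for feature, values in result.items():
    print(feature, {t: round(v, 1) for t, v in values.items()})
```

Because every tissue sums to the same total, TPM values are easier to compare between tissues than RPKM values.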
Stringtie is not implemented in R, right?
No, it's a standalone program. Here is the link: https://ccb.jhu.edu/software/stringtie/