Hi,
I have bulk RNA-seq data (12 samples, paired-end sequenced) and my pipeline was:
- I performed quality control with FastQC,
- Trimmed reads with Trimmomatic
- Aligned reads to the reference genome with STAR
- Used Samtools to sort and index the BAM files
- Calculated counts with featureCounts
Now, I want to convert the counts into TPM as follows:
counts_to_tpm <- function(counts, featureLength, meanFragmentLength)
where
- counts is my merged file with the hit counts from all samples
- featureLength is a numeric vector with feature lengths, which is present in my BAM file
- meanFragmentLength is the mean fragment lengths
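For reference, here is a minimal sketch of what a function with this signature might do. It is not necessarily the exact function referred to above. It assumes counts is a genes x samples matrix, featureLength has one value per gene, and meanFragmentLength has one value per sample (that last point is an assumption of this sketch). The toy numbers are made up for illustration.

```r
# Sketch: convert a counts matrix (genes x samples) to TPM using effective lengths.
# Assumption: meanFragmentLength holds one value per sample (column of counts).
counts_to_tpm <- function(counts, featureLength, meanFragmentLength) {
  counts <- as.matrix(counts)
  stopifnot(length(featureLength) == nrow(counts))
  stopifnot(length(meanFragmentLength) == ncol(counts))

  # Effective length per gene and sample: feature length - mean fragment length + 1
  effLen <- outer(featureLength, meanFragmentLength,
                  function(len, frag) len - frag + 1)

  # Keep only genes whose effective length is > 1 in every sample
  keep <- apply(effLen, 1, min) > 1
  counts <- counts[keep, , drop = FALSE]
  effLen <- effLen[keep, , drop = FALSE]

  # Reads per effective base, rescaled so each sample sums to one million
  rate <- counts / effLen
  t(t(rate) / colSums(rate)) * 1e6
}

# Toy data (made up for illustration)
counts <- matrix(c(100, 200, 300, 50, 400, 150), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
featureLength <- c(1000, 2000, 1500)
meanFragmentLength <- c(200, 250)

tpm <- counts_to_tpm(counts, featureLength, meanFragmentLength)
print(colSums(tpm))
```

Under these assumptions each column of the result sums to one million, which is a quick sanity check on any TPM matrix.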
Is it correct that this parameter can be calculated with CollectRnaSeqMetrics (Picard) or with picardmetrics? And do I have to run it for every sample in my dataset, or, given that they were sequenced together, is one sample enough? I guess the mean length for a given gene should be the same regardless of the sample. Or am I wrong, and I didn't get the role of Picard?
I did try to run it for one sample, but I am confused about which parameter I have to use in the code above for meanFragmentLength to get the TPM. I got a txt file which looks like this:
Apologies if it is again a stupid question!
Thank you for the help!
Camilla
Why not use something like RSEM to get counts and TPM?
But I will have to align the samples again, right? I struggled (a lot) to find a computer with enough RAM to run STAR, and I don't want to go back to the original FASTQ files if possible.
Gotcha. Next time though, use RSEM, as it internally uses STAR anyway (as one option; it can use other aligners as well). Also, you're going the alignment route with STAR, but if RAM is your primary concern, you should go the pseudoalignment route.
I'm assuming you already looked at this answer: Calculating TPM from featureCounts output - that states that you can use the picardmetrics wrapper. I'm trying to figure out where you can get the mean fragment length.
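One possible route, sketched here under assumptions: if the metrics come from Picard's CollectInsertSizeMetrics (run once per sample BAM), the output should contain a MEAN_INSERT_SIZE column that could serve as the mean fragment length. The helper name read_mean_insert_size and the example file contents below are hypothetical, trimmed to a few columns for illustration.

```r
# Hypothetical helper: pull MEAN_INSERT_SIZE out of a Picard-style metrics file.
read_mean_insert_size <- function(path) {
  lines <- readLines(path)
  # The metrics table (header row, then value row) follows the "## METRICS CLASS" line
  start <- grep("^## METRICS CLASS", lines)[1]
  stopifnot(!is.na(start))
  header <- strsplit(lines[start + 1], "\t", fixed = TRUE)[[1]]
  values <- strsplit(lines[start + 2], "\t", fixed = TRUE)[[1]]
  as.numeric(values[match("MEAN_INSERT_SIZE", header)])
}

# Made-up example file; real output has more columns and a histogram section
example <- c(
  "## htsjdk.samtools.metrics.StringHeader",
  "# CollectInsertSizeMetrics (example header line)",
  "",
  "## METRICS CLASS\tpicard.analysis.InsertSizeMetrics",
  "MEDIAN_INSERT_SIZE\tMEAN_INSERT_SIZE\tSTANDARD_DEVIATION",
  "210\t215.4\t48.2"
)
path <- tempfile(fileext = ".txt")
writeLines(example, path)

mean_frag <- read_mean_insert_size(path)
print(mean_frag)
```

Applied to one metrics file per sample, this would give a vector with one mean fragment length per sample.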
I have calculated it following this tutorial. Basically, it retrieves the normalized counts matrix using DESeq2. Do you think it is correct?
What is your end goal? TPM is only valid for comparing expression within the same sample, or in a very narrow use case where inter-sample comparison can be done without caveats, so unless you're clear about your end goal, calculating TPM might not be a detour worth taking.