How to calculate TPM from featureCounts output
17 months ago
survive • 0

I would like to get TPM values for the GSE102073 study. The raw data I downloaded from GEO are featureCounts output files.

First part of the file:

# Program:featureCounts v1.4.3-p1; Command:"/data/NYGC/Software/Subread/subread-1.4.3-p1-Linux-x86_64/bin/featureCounts" "-s" "2" "-a" "/data/NYGC/Resources/ENCODE/Gencode/gencode.v18.annotation.gtf" "-o" "/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/featureCounts/Sample_JB4853_counts.txt" "/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/STAR_alignment/Sample_JB4853_Aligned.out.WithReadGroup.sorted.bam"
Geneid  Chr     Start   End     Strand  Length  /data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/STAR_alignment/Sample_JB4853_Aligned.out.WithReadGroup.sorted.bam
ENSG00000223972.4       chr1;chr1;chr1;chr1     11869;12595;12975;13221 12227;12721;13052;14412 +;+;+;+ 1756    0
ENSG00000227232.4       chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1        14363;14970;15796;16607;16854;17233;17498;17602;17915;18268;24734;29321;29534   14829;15038;15947;16765;1705

How can I convert this into TPM counts?

I tried the method from this post, but it requires a counts file, which I don't have access to. I also looked at this post, but I am confused about how to use tximport to get the TPM counts and about the input variables featureLength and meanFragmentLength.

Thank you.

rna-seq TPM featurecounts • 3.3k views

This file is your counts file, isn't it?


featureCounts file


Yes, I thought the featureCounts file is your counts file.

17 months ago

For accurate TPMs, consider processing the raw sequencing data with Salmon or kallisto. These programs estimate abundances at the transcript level, which results in better TPM estimates. See the Bioconductor RNA-seq guide for more info.


Hi, so I will need to download the raw .fastq files from SRA and run Salmon/kallisto instead of using the featureCounts output?


If you want to go that route, yes.

17 months ago
bioinfo_ga ▴ 70

Hi, you can use the Python package rnanorm [https://pypi.org/project/rnanorm/]. The required inputs are your read counts from featureCounts along with the lengths of your genes/transcripts, which can be obtained from the reference GTF/GFF file.
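
Since the featureCounts file in the question already carries both a count column and a Length column, gene-level TPM can in principle be computed from it directly. Below is a minimal sketch in plain Python, not the rnanorm API. It assumes a tab-delimited, single-sample featureCounts file with the counts in the last column. The function name featurecounts_to_tpm and the toy table are hypothetical and only for illustration, and the Length column is used as-is (no effective-length or fragment-length correction).

```python
import csv
import io


def featurecounts_to_tpm(handle):
    """Return {gene_id: TPM} for a single-sample featureCounts table.

    Assumption: tab-delimited input, a 'Length' column, and the counts
    in the last column (one BAM file per table).
    """
    # Skip the leading "# Program:..." comment line written by featureCounts.
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    length_col = header.index("Length")
    count_col = len(header) - 1

    # Step 1: length-normalised rate per gene (reads per base).
    rates = {}
    for row in rows:
        if not row:
            continue
        rates[row[0]] = float(row[count_col]) / float(row[length_col])

    # Step 2: rescale the rates so that they sum to one million.
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Toy table in featureCounts layout (made-up genes and counts).
example = io.StringIO(
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "geneA\tchr1\t1\t1000\t+\t1000\t100\n"
    "geneB\tchr1\t2000\t3999\t+\t2000\t200\n"
    "geneC\tchr1\t5000\t5499\t+\t500\t0\n"
)
tpm = featurecounts_to_tpm(example)
for gene, value in tpm.items():
    print(gene, round(value, 1))
```

For a real file, pass an open file handle instead of the toy table, e.g. `with open("Sample_JB4853_counts.txt") as fh: tpm = featurecounts_to_tpm(fh)`. TPM is computed per sample, so each featureCounts file would be processed separately.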
