How to get TPM values from the STAR output?
2.4 years ago
pavelasquezv ▴ 50

Hi friends,

I need TPM values for a gene correlation analysis across multiple experiments. How can I get TPM values from the STAR output?

Please help me with the following questions.

  1. According to the manual here, I should take the second column for downstream analysis, right?

  2. Is this the TPM value or just the raw counts?

  3. If these are not TPM values, what method do you recommend for getting them?

This is the head of SRR5570693_ReadsPerGene.out.tab:

==> SRR5570693_ReadsPerGene.out.tab <==
N_unmapped      2047114 2047114 2047114
N_multimapping  1022855 1022855 1022855
N_noFeature     545783  5073330 4998597
N_ambiguous     72548   10439   14263
LOC119628875    0       0       0
LOC101741181    471     233     238
LOC101739479    855     421     434
LOC105842509    14      5       9
LOC101741407    354     190     164
LOC101739615    67      35      32

Many thanks!

2.4 years ago

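Yes, the second column (the leftmost column of numbers) holds your raw read counts, not TPM. That column is the right one for an unstranded library; the third and fourth columns are the counts for the two stranded protocols.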

I don't know that there's an easy way to get TPM from this data; it's far simpler to use RSEM on the transcriptome bam output of STAR, or start over and use kallisto or salmon.
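That said, if you have a length for every gene, a rough gene-level TPM can be approximated from these counts: divide each count by the gene length, then rescale so the values sum to one million. The sketch below only illustrates that arithmetic. The function names and the gene lengths are hypothetical (real lengths would have to come from your annotation), and it ignores effective length and isoform usage, which is why RSEM, salmon or kallisto remain the better route.

```python
def read_star_counts(lines, column=1):
    """Parse ReadsPerGene.out.tab lines into {gene_id: count}.

    column=1 is the unstranded count (the second column of the file);
    2 and 3 are the two stranded alternatives.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("N_"):
            continue  # skip blank lines and the N_* summary rows
        counts[fields[0]] = int(fields[column])
    return counts


def counts_to_tpm(counts, lengths_bp):
    """Length-normalise the counts, then scale them to sum to one million."""
    rate = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rate.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: r / total * 1e6 for gene, r in rate.items()}


# A few rows copied from the question, just to exercise the parser.
star_lines = """\
N_unmapped      2047114 2047114 2047114
N_ambiguous     72548   10439   14263
LOC119628875    0       0       0
LOC101741181    471     233     238
LOC101739479    855     421     434
""".splitlines()

# HYPOTHETICAL gene lengths in bp, for illustration only.
gene_lengths = {"LOC119628875": 1200, "LOC101741181": 2000, "LOC101739479": 3000}

counts = read_star_counts(star_lines)
tpm = counts_to_tpm(counts, gene_lengths)
for gene, value in tpm.items():
    print(gene, round(value, 1))
```

The scaling only makes sense when it is run over every gene in the file, not a three-gene excerpt like this one.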


You can also directly use salmon on the transcriptome BAM output of STAR (though I believe OP currently has a genome-centric BAM).


You can tell STAR to output a transcriptome-centered BAM (--quantMode TranscriptomeSAM), even if the alignment was to the genome.


Yes, but you have to provide that argument during alignment (i.e. if OP already has just a genome-centric BAM they will have to realign).


Hi friends, many thanks for your reply. Now I am using kallisto:

kallisto quant -i /Storage/data1/angelica.vargas/bombyx_mori/kallisto_index/my_bombyx_mori.idx -o results --single \
-l 250 -s 25 -t $NSLOTS SRR5570692_SR.fastq \
-gtf  /Storage/data1/angelica.vargas/bombyx_mori/genomes/GCF_014905235.1_Bmori_2016v1.0_genomic.gtf

But the results are not by gene (LOCxxxxxx) as I need. Do you have any idea what is wrong?

These are the results:

target_id       length  eff_length      est_counts      tpm
NC_051358.1     20666287        2.0666e+07      243662  70.2797
NC_051359.1     8396445 8.3962e+06      127806  90.7332
NC_051360.1     15212953        1.52127e+07     430820  168.806
NC_051361.1     18737234        1.8737e+07      278470  88.5885
NC_051362.1     19061979        1.90617e+07     662676  207.223
NC_051363.1     16650604        1.66504e+07     204847  73.334
NC_051364.1     13944894        1.39446e+07     249372  106.595
NC_051365.1     16262221        1.6262e+07      351907  128.989
NC_051366.1     16796068        1.67958e+07     371446  131.824

Those are chromosomes, not genes or transcripts.


Hi swbarnes2, many thanks for your reply. Yes, I think the problem arose when I built the index. Does the command below look correct? I don't know why the results come out by chromosome and not by gene. Many thanks again.

kallisto index -i my_bombyx_mori.idx  /Storage/data1/angelica.vargas/bombyx_mori/genomes/GCF_014905235.1_Bmori_2016v1.0_genomic.fna

Have you looked up what types of input files Kallisto wants for making the index?

Do you understand how Kallisto is different from STAR?


Many thanks for your reply swbarnes2! You are right. Now I understand the difference. I needed to use rna.fasta as input to build the kallisto transcriptome index; I was working with the genomic.fasta file. Many thanks again!
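One follow-up for anyone landing here: with a transcriptome index, kallisto reports one row per transcript, so getting values per gene (LOCxxxxxx) still needs a transcript-to-gene step. A minimal sketch of that step is below, assuming the GTF carries gene_id and transcript_id attributes and that the transcript IDs match the target_id column. The function names, transcript IDs and abundance numbers are all made up for illustration; dedicated tools such as tximport handle this more carefully.

```python
import re
from collections import defaultdict


def transcript_to_gene(gtf_lines):
    """Build {transcript_id: gene_id} from the attribute column of a GTF."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        gene = re.search(r'gene_id "([^"]+)"', line)
        tx = re.search(r'transcript_id "([^"]+)"', line)
        if gene and tx:  # rows with an empty transcript_id do not match
            mapping[tx.group(1)] = gene.group(1)
    return mapping


def gene_level_tpm(abundance_lines, tx2gene):
    """Sum the tpm column of a kallisto abundance table per gene."""
    header = abundance_lines[0].split("\t")
    id_col, tpm_col = header.index("target_id"), header.index("tpm")
    totals = defaultdict(float)
    for line in abundance_lines[1:]:
        fields = line.split("\t")
        gene = tx2gene.get(fields[id_col])
        if gene is not None:  # ignore transcripts missing from the GTF
            totals[gene] += float(fields[tpm_col])
    return dict(totals)


# MADE-UP transcript IDs and values, for illustration only.
gtf = [
    'NC_051358.1\tGnomon\texon\t100\t500\t.\t+\t.\tgene_id "LOC101741181"; transcript_id "XM_000000001.1";',
    'NC_051358.1\tGnomon\texon\t100\t900\t.\t+\t.\tgene_id "LOC101741181"; transcript_id "XM_000000002.1";',
    'NC_051358.1\tGnomon\texon\t2000\t5000\t.\t-\t.\tgene_id "LOC101739479"; transcript_id "XM_000000003.1";',
]
abundance = [
    "\t".join(["target_id", "length", "eff_length", "est_counts", "tpm"]),
    "\t".join(["XM_000000001.1", "2000", "1751", "300", "10.5"]),
    "\t".join(["XM_000000002.1", "1800", "1551", "100", "4.5"]),
    "\t".join(["XM_000000003.1", "3000", "2751", "900", "20.0"]),
]

tx2gene = transcript_to_gene(gtf)
gene_tpm = gene_level_tpm(abundance, tx2gene)
for gene, value in gene_tpm.items():
    print(gene, value)
```

Summing works for TPM because it is already length-normalised per transcript; est_counts can be summed per gene the same way.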
