Question

Forum:How to calculate FPKM values of interested gene

3

Entering edit mode

10.3 years ago

Ginsea Chen ▴ 140

FPKM, Fragments Kilobase of exon model per millon mapped reads, which can be used to indicate the expression (abundance) characteristics of genes. Now I want to describe the operation about obtaining interested gene FPKM value.

Software Download

fastq-dump: convert sra file to fastq file.

website: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
bowtie:an ultrafast and memory efficient tool for aligning sequencing reads to long reference sequences.

website: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
cufflinks:assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

website: http://cufflinks.cbcb.umd.edu/
gffread: convert gff3 file to gtf file.

website: http://cufflinks.cbcb.umd.edu/ (This program is included with cufflinks package)

Operation

Download genome.fa and genes.gff3 file from genome website; Download sra file from NCBI

Format conversion

$ fastq-dump -I --split-files SRR123456789.sra # convert sra file to fastq file
$ gffread -E genes.gff3 -o genes.gtf # convert gff3 file to gtf file

Index files
```
$bowtie2-build genome.fa genome
```

Alignment

$ bowtie2 -x genome -1 SRR123456789_1.fastq -2 SRR123456789_2.fastq -S SRR123456789.sam
$ samtools view -bS SRR123456789.sam > SRR123456789.bam
$ samtools sort SRR123456789.bam SRR123456789

FPKM values

$ cufflinks SRR123456789.bam -G genes.gtf -o result

After these operations, we can extract FPKM values from genes.fpkm_tracking file based on gene ID.

Note: If you find some bugs, please contact me.

cufflinks FPKM bowtie2 RNA-Seq • 12k views

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Ginsea Chen ▴ 140

score 3 · Answer 1 · 2014-07-03

3

Entering edit mode

10.3 years ago

dariober 15k

I just wanted to point out that transcript abundance expressed as FPKM/RPKM has some biases and it is less easy to interpret than "transcript per million" or TPM. See, among other sources, Wagner et al. 2012 (Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples) for a good explanation.

ADD COMMENT • link 10.3 years ago by dariober 15k

0

Entering edit mode

Hi dariober

Thanks for your suggestion, I am reading this paper now.

ADD REPLY • link 10.3 years ago by Ginsea Chen ▴ 140

Ram · Answer 2 · 2014-07-03

2

Entering edit mode

10.3 years ago

Charles Warden 8.3k

Why use Bowtie2 instead of TopHat2?

ADD COMMENT • link 10.3 years ago by Charles Warden 8.3k

0

Entering edit mode

Hi Warden

Thanks for your return! In fact, Tophat was usually used in many papers. Before this, I tried to use tophat to map reads to the reference genome, but the system always told me no enough RAM space and can't complete tophat analysis in my computer. As the introduction of Tophat, I knew it is a software based on bowtie. So bowtie2 was used in this operation. Do you mean I shouldn't use Bowtie2 instead of Tophat2?

Thank you again, sir!

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Ginsea Chen ▴ 140

0

Entering edit mode

Bowtie2 might be better than Bowtie1, but TopHat is specifically designed to handle alignments across exon junctions for a chromsome-based reference. If you were using transcripts as a reference, I think Bowtie would be OK (but then you would typically be using something like eXpress rather than cufflinks as a reference). However, the general consensus is that TopHat is preferred for genomic alignments (such as against hg19, mm9, etc).

I would recommending seeing if your institution has a large, shared Linux server to perform the alignments, which shouldn't have the RAM problem. I rarely run alignments on my desktop. Not sure if that is your problem.

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Charles Warden 8.3k

0

Entering edit mode

Thanks for your suggestion, Linux server should be a good choice, but there were no equipment for me.

In fact, I am not sure the error of tophat is for no enough RAM space, the run log of tophat is pasted as follows:

[2014-07-03 23:13:54] Beginning TopHat run (v2.0.11)
-----------------------------------------------
[2014-07-03 23:13:54] Checking for Bowtie
          Bowtie version:     2.2.3.0
[2014-07-03 23:13:54] Checking for Samtools
        Samtools version:     0.1.19.0
[2014-07-03 23:13:54] Checking for Bowtie index files (genome)..
[2014-07-03 23:13:54] Checking for reference FASTA file
[2014-07-03 23:13:54] Generating SAM header for genome
[2014-07-03 23:13:54] Reading known junctions from GTF file
[2014-07-03 23:13:55] Preparing reads
     left reads: min. length=120, max. length=120, 19269414 kept reads (53310 discarded)
    right reads: min. length=120, max. length=120, 19263005 kept reads (59719 discarded)
[2014-07-03 23:32:02] Building transcriptome data files result/tmp/gene
[2014-07-03 23:32:07] Building Bowtie index from gene.fa
[2014-07-03 23:32:41] Mapping left_kept_reads to transcriptome gene with Bowtie2 
[sam_read1] missing header? Abort!
    [FAILED]
Error running bowtie:
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
terminate called recursively
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

I tried to find many ways to solve it but no successful, many people of forum (such as SEQanswers) thought that it is for no enough RAM space.

I am a new of RNA-seq, Thanks for your help again.

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Ginsea Chen ▴ 140

0

Entering edit mode

I typically build the bowtie2 reference from source for TopHat2 analysis - if I remember, I think there are some issues with the reference that is provided. For example, is there actually any sequence in the .fasta file?

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Charles Warden 8.3k

0

Entering edit mode

This problem may be a bug of tophat, I have solved it through addition of line bowtie_header_cmd += ["-x"] in the tophat code as the description of Google forum

Thanks for your help!

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Ginsea Chen ▴ 140

0

Entering edit mode

Hi, is it OK to use bowtie instead of TopHat if I am working with prokaryotic genomes which don't have introns like eukaryotic ones?

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.8 years ago by liuyifan2014 ▴ 110