Forum:How to calculate FPKM values of interested gene
2
3
Entering edit mode
10.3 years ago
Ginsea Chen ▴ 140

FPKM, Fragments Kilobase of exon model per millon mapped reads, which can be used to indicate the expression (abundance) characteristics of genes. Now I want to describe the operation about obtaining interested gene FPKM value.

Software Download

  1. fastq-dump: convert sra file to fastq file.

    website: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

  2. bowtie:an ultrafast and memory efficient tool for aligning sequencing reads to long reference sequences.

    website: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

  3. cufflinks:assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

    website: http://cufflinks.cbcb.umd.edu/

  4. gffread: convert gff3 file to gtf file.

    website: http://cufflinks.cbcb.umd.edu/ (This program is included with cufflinks package)

Operation

  1. Download genome.fa and genes.gff3 file from genome website; Download sra file from NCBI
  2. Format conversion

    $ fastq-dump -I --split-files SRR123456789.sra # convert sra file to fastq file
    $ gffread -E genes.gff3 -o genes.gtf # convert gff3 file to gtf file
    
  3. Index files

    $bowtie2-build genome.fa genome
    
  4. Alignment

    $ bowtie2 -x genome -1 SRR123456789_1.fastq -2 SRR123456789_2.fastq -S SRR123456789.sam
    $ samtools view -bS SRR123456789.sam > SRR123456789.bam
    $ samtools sort SRR123456789.bam SRR123456789
    
  5. FPKM values

    $ cufflinks SRR123456789.bam -G genes.gtf -o result
    

After these operations, we can extract FPKM values from genes.fpkm_tracking file based on gene ID.

Note: If you find some bugs, please contact me.

cufflinks FPKM bowtie2 RNA-Seq • 12k views
ADD COMMENT
3
Entering edit mode
10.3 years ago

I just wanted to point out that transcript abundance expressed as FPKM/RPKM has some biases and it is less easy to interpret than "transcript per million" or TPM. See, among other sources, Wagner et al. 2012 (Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples) for a good explanation.

ADD COMMENT
0
Entering edit mode

Hi dariober

Thanks for your suggestion, I am reading this paper now.

ADD REPLY
2
Entering edit mode
10.3 years ago

Why use Bowtie2 instead of TopHat2?

ADD COMMENT
0
Entering edit mode

Hi Warden

Thanks for your return! In fact, Tophat was usually used in many papers. Before this, I tried to use tophat to map reads to the reference genome, but the system always told me no enough RAM space and can't complete tophat analysis in my computer. As the introduction of Tophat, I knew it is a software based on bowtie. So bowtie2 was used in this operation. Do you mean I shouldn't use Bowtie2 instead of Tophat2?

Thank you again, sir!

ADD REPLY
0
Entering edit mode

Bowtie2 might be better than Bowtie1, but TopHat is specifically designed to handle alignments across exon junctions for a chromsome-based reference. If you were using transcripts as a reference, I think Bowtie would be OK (but then you would typically be using something like eXpress rather than cufflinks as a reference). However, the general consensus is that TopHat is preferred for genomic alignments (such as against hg19, mm9, etc).

I would recommending seeing if your institution has a large, shared Linux server to perform the alignments, which shouldn't have the RAM problem. I rarely run alignments on my desktop. Not sure if that is your problem.

ADD REPLY
0
Entering edit mode

Thanks for your suggestion, Linux server should be a good choice, but there were no equipment for me.

In fact, I am not sure the error of tophat is for no enough RAM space, the run log of tophat is pasted as follows:

[2014-07-03 23:13:54] Beginning TopHat run (v2.0.11)
-----------------------------------------------
[2014-07-03 23:13:54] Checking for Bowtie
          Bowtie version:     2.2.3.0
[2014-07-03 23:13:54] Checking for Samtools
        Samtools version:     0.1.19.0
[2014-07-03 23:13:54] Checking for Bowtie index files (genome)..
[2014-07-03 23:13:54] Checking for reference FASTA file
[2014-07-03 23:13:54] Generating SAM header for genome
[2014-07-03 23:13:54] Reading known junctions from GTF file
[2014-07-03 23:13:55] Preparing reads
     left reads: min. length=120, max. length=120, 19269414 kept reads (53310 discarded)
    right reads: min. length=120, max. length=120, 19263005 kept reads (59719 discarded)
[2014-07-03 23:32:02] Building transcriptome data files result/tmp/gene
[2014-07-03 23:32:07] Building Bowtie index from gene.fa
[2014-07-03 23:32:41] Mapping left_kept_reads to transcriptome gene with Bowtie2 
[sam_read1] missing header? Abort!
    [FAILED]
Error running bowtie:
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
terminate called recursively
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

I tried to find many ways to solve it but no successful, many people of forum (such as SEQanswers) thought that it is for no enough RAM space.

I am a new of RNA-seq, Thanks for your help again.

ADD REPLY
0
Entering edit mode

I typically build the bowtie2 reference from source for TopHat2 analysis - if I remember, I think there are some issues with the reference that is provided. For example, is there actually any sequence in the .fasta file?

ADD REPLY
0
Entering edit mode

This problem may be a bug of tophat, I have solved it through addition of line bowtie_header_cmd += ["-x"] in the tophat code as the description of Google forum

Thanks for your help!

ADD REPLY
0
Entering edit mode

Hi, is it OK to use bowtie instead of TopHat if I am working with prokaryotic genomes which don't have introns like eukaryotic ones?

ADD REPLY

Login before adding your answer.

Traffic: 1422 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6