Question

Quantification of RNA-Seq bam files (STAR + HTSeq count)

1

Entering edit mode

9.6 years ago

ChIP ▴ 600

Hi All,

I am working with pig genome and I had received some FASTQ transcriptome files from my lab mates.

I have processed the FASTQ files using STAR Aligner

~/Software/STAR/STAR-STAR_2.5.0a/bin/Linux_x86_64/STAR --genomeDir ~/data/genomes/pig/susScr3/star/ --readFilesIn ~/Bureaublad/Mod.GC025680.151030.HiSeq2500.FCB.lane2.gcap_dev.R1.fastqgz.gz --readFilesCommand zcat --sjdbGTFfile ~/data/genomes/pig/susScr3/star/susScr3.04012016.gtf --quantMode TranscriptomeSAM GeneCounts --runThreadN 12 --outFilterMismatchNmax 8 --outFileNamePrefix starGC025680 --outFilterMultimapNmax 10

The GTF file was taken from UCSC

It looks like this

chr1    susScr3_refGene    start_codon    133903981    133903983    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133903981    133904133    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133903981    133904133    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133914113    133914267    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133914113    133914267    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133917281    133917449    0.000000    +    2    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133917281    133917449    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133919376    133919382    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133919376    133919382    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133920137    133920252    0.000000    +    0    gene_id "NM_214429"; transcript_id "NM_214429";

Then I am using HTSeq_count function like this:

python ~/Software/HTSeq/HTSeq-0.6.1/scripts/htseq-count -m intersection-nonempty -t exon -i gene_id -f bam starGC025680Aligned.toTranscriptome.out.bam ~/data/genomes/pig/susScr3/star/susScr3.04012016.gtf -o Test

It processes the file, but dishes out

NM_214311    0
NM_214312    0
NM_214313    0
NM_214314    0
NM_214315    0
NM_214316    0
NM_214317    0
NM_214318    0
NM_214319    0
NM_214320    0
NM_214321    0
NM_214322    0
NM_214323    0
NM_214324    0
NM_214325    0

NR_130444    0
NR_131172    0
NR_132429    0
__no_feature    2844434
__ambiguous    0
__too_low_aQual    0
__not_aligned    0
__alignment_not_unique    534450

I don't know, what I am doing wrong and how can I fix this.

Kindly help

Thank you

RNA-Seq HTSeq_count STAR • 8.2k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.6 years ago by ChIP ▴ 600

Ram · Answer 1 · 2016-01-05

1

Entering edit mode

9.6 years ago

igor 13k

The problem may be that you are using --quantMode TranscriptomeSAM, which outputs alignments translated into transcript coordinates (not genomic coordinates that htseq-count expects).

Also, you are using --quantMode GeneCounts, so you don't need to run htseq-count as STAR should be generating the counts for you.

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 9.6 years ago by igor 13k

Ram · Answer 2 · 2016-01-05

0

Entering edit mode

9.6 years ago

Czh3 ▴ 190

Can you paste your genome reference sequence (fasta) to us. I guess the chromosome name in your .fasta file are "1 2 3 ...", but in the GTF file are "chr1 chr2 chr3 ..."

ADD COMMENT • link 9.6 years ago by Czh3 ▴ 190

0

Entering edit mode

Hi,

I did grep on my genome file and here are the results:

>chr1
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chrX
>chrY
>chrM

It is susScr3.fa

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 9.6 years ago by ChIP ▴ 600