Basecalls performed using CASAVA version v1.8.2
Trimmed reads with fastx_quality_trimmer 0.0.13 with a quality threshold of 18 and a length of 20
Aligned with Bowtie 2.1.0 and Tophat 2.0.10 using Gencode v19 junctions
Samtools 0.1.19-44428cd to make a bam, sort, index
Raw counts were generated using htseq_count 0.6.1 using the UCSC HG19 known gene transcripts from Illumina iGenomes
edgeR 3.2.4 in R 3.1.0 was used to normalize counts between samples and to make comparisons between groups.
Genome_build: hg19
Supplementary_files_format_and_content: Normalized reads in CPM for each detected gene along with differential gene expression log2 (fold change) and p-values for each time point
This is the data processing section from a paper. I downloaded the fastq files from this paper and used STAR for alignment and featureCounts to build my read count matrix, with the Homo_sapiens.GRCh38 reference from ENSEMBL. However, with featureCounts, only 3% of the reads were aligned. (These are single-end fastq files, so I tried the -s 0, -s 1, and -s 2 options for featureCounts, but all results were bad.) What went wrong with my pipeline? Is it related to the reference genome that I used?
Whichever version of the genome you use, you should see comparable mapping rates and assignment rates. But, and here is the point, you must use the same genome build for the reference genome and the GFF/GTF annotation file. My guess is that a mismatch here is the most likely explanation of what went wrong. If you are using ENSEMBL, download both the assembly and the GFF file from the same source. Also check that both files use the same identifiers for chromosomes ("1" vs. "chr1" is the archetypal source of confusion).
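To check the chromosome identifiers quickly, something like the following Python sketch could help. It is only an illustration: the function names are made up for this example, and it assumes a standard FASTA (sequence name is the first word after ">") and a tab-separated GTF/GFF (sequence name in column 1, comment lines starting with "#").

```python
import gzip
from typing import Dict, Iterable, List, Set


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from column 1, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(fasta_names: Set[str], gtf_names: Set[str]) -> Dict[str, List[str]]:
    """Report which sequence names are shared and which occur in one file only."""
    return {
        "shared": sorted(fasta_names & gtf_names),
        "gtf_only": sorted(gtf_names - fasta_names),
        "fasta_only": sorted(fasta_names - gtf_names),
    }


# Example usage (the file names below are placeholders, not real paths):
# with open_text("genome.fa.gz") as fa, open_text("annotation.gtf.gz") as gtf:
#     report = compare_seqnames(fasta_seqnames(fa), gtf_seqnames(gtf))
#     print(len(report["shared"]), "shared;", report["gtf_only"][:10], "only in GTF")
```

If "gtf_only" contains the main chromosomes (for example "chr1" while the FASTA has "1"), reads aligned to those chromosomes cannot be assigned to any feature. Names that appear only in the FASTA are usually harmless, since not every scaffold needs to carry annotation.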
Thanks for your comment. But I downloaded the same genome build, and it worked fine with another dataset.
Possibly. If you could provide all the details (the exact SRA ID of the dataset, the complete download paths of the assembly and GFF (ftp://...), and all parameters used with STAR and featureCounts), the problem might become reproducible. I could offer to give it a short try tomorrow.
I really appreciate your help. Here are some specific details.
ENSEMBL reference file created with STAR
FASTA: http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
GTF: http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz
STAR aligned (version: 2.7.9a)
Extract count matrix using featureCounts
(I also tried -s 1 and -s 2, but did not get a better result.)
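One way to narrow this down is to look at the ".summary" file that featureCounts writes next to its count table, since it breaks the reads down by category (Assigned, Unassigned_NoFeatures, Unassigned_MultiMapping, and so on). The Python sketch below is only an illustration: the function names are invented here, and it assumes the usual tab-separated layout with a "Status" header row and reads only the first sample column.

```python
from typing import Dict, Iterable


def parse_fc_summary(lines: Iterable[str]) -> Dict[str, int]:
    """Read category counts for the first sample column of a summary file."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header row ("Status<TAB>sample.bam") and malformed lines.
        if len(fields) < 2 or fields[0] == "Status":
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def category_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Return non-zero categories as percentages of all reads, largest first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    nonzero = {name: 100.0 * n / total for name, n in counts.items() if n > 0}
    return dict(sorted(nonzero.items(), key=lambda item: item[1], reverse=True))


# Example usage (the file name is a placeholder):
# with open("counts.txt.summary") as handle:
#     for name, pct in category_percentages(parse_fc_summary(handle)).items():
#         print(f"{name}\t{pct:.1f}%")
```

As a rough guide, a dominant Unassigned_NoFeatures would point towards the annotation (wrong build, mismatched chromosome names, or a feature type that is absent from the GTF), whereas a dominant Unassigned_Unmapped or Unassigned_MultiMapping would point towards the alignment step instead.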