Generating Counts Data From Fastq Sequence Files
10.7 years ago
josph.sh ▴ 20

I'm new to sequencing and currently have several FASTQ files from sequencing experiments (sequenced on an Illumina MiSeq).

I was hoping to carry out some expression analysis (probably with edgeR) on this data, but first I need to generate a counts matrix from it. Could somebody explain how to generate counts data from FASTQ files?

sequencing fastq rna-seq counts differential-expression • 11k views
10.7 years ago
  1. You will first have to align those FASTQ files against the reference genome to produce SAM/BAM files. TopHat, STAR and many other splice-aware RNA-seq aligners are available for this task. It is always good to preprocess your reads before aligning: run QC, trim off low-quality bases, etc.

  2. Then you need a tool that will generate the count data for you. Basically, you provide the aligned BAM file and the gene annotation file for your reference genome (GFF3, GTF or BED format). HTSeq and Cufflinks are some of the tools available for this task; note that edgeR expects raw read counts such as those from HTSeq's htseq-count, whereas Cufflinks reports normalized abundance estimates (FPKM). Search Biostar and you will find the names of other tools.
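
Once step 2 has produced one count table per sample, you still need to combine them into a single genes-by-samples matrix for edgeR. Below is a minimal Python sketch of that merge, not a definitive implementation. It assumes each per-sample table is two tab-separated columns (gene ID, count) with summary rows whose names start with "__", which is the layout htseq-count writes. The function names, sample names and gene IDs are made up for illustration, and the demo uses in-memory text instead of real files.

```python
import csv
import io


def read_counts(handle):
    """Parse one two-column (gene_id, count) table into a dict.

    Rows whose first field starts with '__' (summary rows such as
    __no_feature) are skipped, as are blank or malformed rows.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def build_matrix(per_sample):
    """Combine {sample: {gene: count}} into (genes, samples, rows).

    Genes missing from a sample are filled with 0.
    """
    samples = list(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


def write_matrix(genes, samples, rows, out):
    """Write the matrix as a tab-separated table with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id"] + samples)
    for gene, row in zip(genes, rows):
        writer.writerow([gene] + row)


# Hypothetical per-sample tables standing in for real count files.
sample_a = "geneA\t10\ngeneB\t0\n__no_feature\t5\n"
sample_b = "geneA\t7\ngeneB\t3\n__no_feature\t2\n"

per_sample = {
    "sample_a": read_counts(io.StringIO(sample_a)),
    "sample_b": read_counts(io.StringIO(sample_b)),
}
genes, samples, rows = build_matrix(per_sample)

buf = io.StringIO()
write_matrix(genes, samples, rows, buf)
matrix_text = buf.getvalue()
print(matrix_text)
```

With real data you would open each sample's count file and pass the file handle to read_counts in place of the StringIO objects. The resulting tab-separated table can then be loaded in R and handed to edgeR as raw counts.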


Thank you for your reply.


Thanks for the link.
