Hi,
Can anyone suggest a simple way to get a BAM file from a FASTQ file for RNA-seq data?
Many thanks
FASTQ files contain the reads produced by the sequencer, together with a quality score for each base. You should first get some background in sequence alignment (tools: BWA, bowtie, bowtie2, BBMap) and then learn how to filter the alignments for the information you need (read about the SAM file format, SAMtools, MAPQ scores, insert sizes, and single-end vs. paired-end sequencing).
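To make the FASTQ format concrete, here is a minimal Python sketch that reads records and decodes the quality string. It assumes the common four-lines-per-record layout and Phred+33 quality encoding (what current Illumina output uses); the names `FastqRecord` and `read_fastq` are just for illustration and are not from any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream (assumes 4 lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line
        quality = handle.readline().strip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33
        scores = [ord(char) - 33 for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records[0].name, records[0].sequence, records[0].qualities)
```

For real data you would open the file (or a gzip stream) instead of the `io.StringIO` example.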
Probably the easiest option for bacterial RNA-seq is Rockhopper as it offers a graphical interface: http://cs.wellesley.edu/~btjaden/Rockhopper/
First, download the reference FASTA sequence from Ensembl, NCBI, or another suitable resource. A quick search with your species name in these resources should take you to the related files, including the FASTA.
In Ensembl, look for the FASTA download link. In NCBI, look under the Genome category.
Then make an initial choice of aligner. BWA or similar is a good one to start with.
Index the FASTA as required by the tool you are using.
Align the reads using the appropriate command. Give bwa mem a try.
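The index and align steps above can be sketched as a small Python wrapper that builds the shell commands. This is only a sketch for single-end reads: it assumes `bwa` and `samtools` (version 1.3 or later) are on your PATH, and the file names and the function names `bwa_to_bam_commands` and `run_pipeline` are placeholders of mine.

```python
import shlex
import subprocess
from typing import List


def bwa_to_bam_commands(reference: str, fastq: str, prefix: str,
                        threads: int = 4) -> List[str]:
    """Build shell commands: index reference, align, sort to BAM, index BAM."""
    ref = shlex.quote(reference)
    reads = shlex.quote(fastq)
    bam = shlex.quote(f"{prefix}.sorted.bam")
    return [
        # Build the BWA index files next to the reference FASTA
        f"bwa index {ref}",
        # Align and pipe the SAM output straight into a coordinate-sorted BAM
        f"bwa mem -t {threads} {ref} {reads} | samtools sort -@ {threads} -o {bam} -",
        # Index the BAM so viewers and downstream tools can use it
        f"samtools index {bam}",
    ]


def run_pipeline(commands: List[str]) -> None:
    """Run each command in order; stop if one exits with an error."""
    for command in commands:
        subprocess.run(command, shell=True, check=True)


commands = bwa_to_bam_commands("ref.fa", "reads.fq", "sample", threads=2)
for command in commands:
    print(command)
# run_pipeline(commands)  # uncomment once bwa and samtools are installed
```

For paired-end data, bwa mem takes the second FASTQ file as an extra argument after the first.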
The output SAM file will contain both aligned and unaligned reads. Reading further on the topics that Alexander mentioned above, including SAM flags, will help you separate the aligned reads from the unaligned ones, and so on.
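As an illustration of how SAM flags separate aligned from unaligned reads, here is a small Python sketch. The bit values come from the SAM specification; the helper names and the two example SAM lines are made up for this post.

```python
from typing import Iterable, List, Tuple

# Flag bits as defined in the SAM specification
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def split_sam(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split SAM alignment lines into (mapped, unmapped) by the 0x4 bit."""
    mapped, unmapped = [], []
    for line in lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        flag = int(line.split("\t")[1])  # FLAG is the second column
        (unmapped if flag & 0x4 else mapped).append(line)
    return mapped, unmapped


# Made-up example: one header line, one mapped read, one unmapped read
sam_lines = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
mapped, unmapped = split_sam(sam_lines)
print(decode_flag(99), len(mapped), len(unmapped))
```

On real files you would normally let samtools do this: `samtools view -b -F 4 in.bam` keeps only mapped reads and `samtools view -b -f 4 in.bam` keeps only unmapped ones.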
It is indeed better to preprocess the FASTQ files before the alignment step, mainly adapter trimming and quality trimming. Read up on cutadapt, sickle, etc. in that regard.
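A trimming step with cutadapt might look like the sketch below for single-end reads. The adapter sequence, quality cutoff, minimum length, and file names here are placeholders I picked for illustration; use the adapter that matches your library prep kit. The function name `cutadapt_command` is also just mine.

```python
import subprocess
from typing import List


def cutadapt_command(fastq_in: str, fastq_out: str, adapter: str,
                     min_quality: int = 20, min_length: int = 20) -> List[str]:
    """Build a cutadapt call for 3' adapter and quality trimming."""
    return [
        "cutadapt",
        "-a", adapter,            # 3' adapter sequence to remove
        "-q", str(min_quality),   # trim low-quality bases from the 3' end
        "-m", str(min_length),    # discard reads shorter than this after trimming
        "-o", fastq_out,          # trimmed output FASTQ
        fastq_in,
    ]


# Placeholder adapter: replace with the one used in your library prep
command = cutadapt_command("reads.fq", "reads.trimmed.fq", "AGATCGGAAGAGC")
print(" ".join(command))
# subprocess.run(command, check=True)  # uncomment once cutadapt is installed
```

The trimmed FASTQ is then what you pass to the aligner instead of the raw file.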
For most of the questions or problems you run into later, you should be able to find answers in other posts here.
May I also suggest STAR as a good tool for alignment. But since this is a bacterial genome, be sure to read the small-genomes section of the manual and set the arguments accordingly.
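On the small-genome point: as far as I recall from the STAR manual, `--genomeSAindexNbases` should be scaled down for small genomes to about min(14, log2(GenomeLength)/2 - 1). A quick Python sketch of that calculation is below. Rounding to the nearest integer is my assumption (it matches the examples I remember from the manual), and the function name is made up, so please check the manual for your STAR version.

```python
import math


def star_sa_index_nbases(genome_length: int) -> int:
    """Suggest a --genomeSAindexNbases value: min(14, log2(length)/2 - 1)."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    suggested = math.log2(genome_length) / 2 - 1
    # Assumption: round to the nearest integer and never go below 1
    return max(1, min(14, round(suggested)))


# Roughly the size of an E. coli genome (about 4.6 Mb)
print(star_sa_index_nbases(4_600_000))
```

You would pass the result to `STAR --runMode genomeGenerate` via `--genomeSAindexNbases` when building the index.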
Jf
I am using Rockhopper to analyse bacterial RNA-seq data sets. I gave two FASTQ files as input along with the reference. The two data sets are from two different experimental conditions. For differential gene expression it reports 0%, which is not helpful at all. Is there anything I am missing here?