Dear all, first of all, thanks for your time. I am relatively new to the RNAseq world and I am currently struggling with transcriptome data that I want to align to a reference genome in order to obtain data on differentially expressed genes. I am working with prokaryotic organisms that do NOT have well-annotated genomes available for download. Instead, I sequenced the genome myself and have a number of larger contigs, for which I have a preliminary annotation file from RAST.
Here are the basic steps as far as I understood them:
1) Get sequences
2) Align sequences against the reference genome using BWA/Bowtie (I used BWA), take the SAM files and convert them to sorted and indexed BAM files.
3) Use the GenomicFeatures package in R to summarize the reads by genes in each location.
Here is where I got stuck: I tried to make my own transcript database using the "makeTranscriptDB" command. Unfortunately, I do NOT have information on splice sites because I work with prokaryotes, and I am not sure how to handle this (it is a required input for the command). Any good suggestions?
4) I have not gotten this far, but in theory I would need to perform differential expression testing using a package in R. Any good suggestions for prokaryotes?
Is this workflow, at least in theory, correct? Any help will be greatly appreciated! Thanks in advance. Lars
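For step 3, it may help to see that prokaryotic genes without splicing can be treated as single intervals on a contig, so counting reads per gene reduces to an interval-overlap test. The following is only a minimal sketch in plain Python, not the GenomicFeatures API: the `Gene` record, the `count_reads_per_gene` function, and the example coordinates are all hypothetical, and it assumes 1-based inclusive coordinates with reads already reduced to (contig, start, end) tuples.

```python
from collections import namedtuple

# Hypothetical gene record: one unspliced interval per gene (1-based, inclusive).
Gene = namedtuple("Gene", ["contig", "start", "end", "name"])


def count_reads_per_gene(genes, reads):
    """Count reads overlapping each gene.

    genes: list of Gene records
    reads: iterable of (contig, start, end) tuples, 1-based inclusive
    A read that overlaps several genes is counted once for each of them.
    """
    counts = {g.name: 0 for g in genes}

    # Group genes by contig so each read is only compared to genes on its contig.
    by_contig = {}
    for g in genes:
        by_contig.setdefault(g.contig, []).append(g)

    for contig, r_start, r_end in reads:
        for g in by_contig.get(contig, []):
            # Two closed intervals overlap if each starts before the other ends.
            if r_start <= g.end and r_end >= g.start:
                counts[g.name] += 1
    return counts


# Made-up example data, for illustration only.
genes = [
    Gene("contig1", 100, 500, "geneA"),
    Gene("contig1", 600, 900, "geneB"),
    Gene("contig2", 50, 300, "geneC"),
]
reads = [
    ("contig1", 120, 170),  # inside geneA
    ("contig1", 480, 530),  # overlaps the end of geneA
    ("contig1", 540, 590),  # intergenic, not counted
    ("contig1", 880, 930),  # overlaps the end of geneB
    ("contig2", 10, 60),    # overlaps the start of geneC
    ("contig3", 1, 50),     # contig without annotated genes
]

counts = count_reads_per_gene(genes, reads)
print(counts)
```

The resulting table of counts per gene (one such table per sample) is the kind of input that differential expression testing in step 4 would start from. The linear scan over genes is fine for a sketch; a real data set would probably call for an indexed lookup or an existing counting tool.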
Tablet (http://bioinf.scri.ac.uk/tablet/) is also a good option for visualizing aligned data (you just need an indexed BAM file).
Thanks Leonor, I will try the girafe package for starters. I am not really experienced in R; is it necessary to use a custom script? I might come back and ask additional questions later. Thanks!
No, I only use custom scripts because I know exactly how I want my figures to look (I'm just very picky). Using the girafe package should get you where you want :-)