Question

Whole genome sequencing and RNA sequencing data analysis for a bacterial strain

0


3.3 years ago

salmon ▴ 10

Hello, I am new to RNA sequencing work. I have raw FASTQ files (from both extracted RNA and DNA) for the bacterium "Paucibacter toxinivorans strain IM4". I cannot find a whole reference genome for this strain, but a partial 16S sequence is available at NCBI: https://www.ncbi.nlm.nih.gov/nuccore/1031488746

Do I first need to process the DNA FASTQ files for whole genome sequencing and then move on to the RNA sequencing analysis? If not, can I use the available partial 16S sequence for alignment? Could anybody please guide me?

Original title: Whole genome sequencing for a bacterial strain for RNA sequencing

RNA-Seq Gene Whole genome sequencing • 1.9k views

ADD COMMENT • link 3.2 years ago by salmon ▴ 10

1


There is one genome available for this bacterium. It may not be your exact strain but it may work for a start.

ADD REPLY • link 3.3 years ago by GenoMax 148k

1


Contact the project PI (https://genome.jgi.doe.gov/portal/PautoxDSM16998_FD/PautoxDSM16998_FD.info.html) for the scripts/pipelines used in the DSM 16998 genome assembly and its annotation, or team up with a local bioinformatician.

ADD REPLY • link 3.3 years ago by cpad0112 21k

score 1 · Answer 1 · 2021-09-16

1


3.3 years ago

Friederike 9.0k

If you have the full genome sequenced, it would be preferable to first assemble the genome, annotate it (i.e. determine where the gene regions are), and then align the RNA-seq reads to that assembly. This seems like a pretty good run-down of the different ways to sequence and assemble bacterial genomes; I'm sure there are many more on PubMed.
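To make the assemble, annotate, align order concrete, here is a minimal sketch that only builds the command lines for each step. The tool choices (SPAdes for assembly, Prokka for annotation, Bowtie2 for alignment) are my assumptions and are not named anywhere in this thread, and it assumes paired-end Illumina reads. The file names, the `IM4` prefix and the `build_pipeline_commands` helper are hypothetical.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(dna_r1, dna_r2, rna_r1, rna_r2, workdir, threads=4):
    """Return argument lists for: assemble DNA reads -> annotate -> align RNA reads.

    Assumptions (not from the thread): paired-end reads, SPAdes/Prokka/Bowtie2
    installed and on PATH, and `workdir` already exists.
    """
    work = Path(workdir)
    asm_dir = work / "assembly"        # SPAdes output directory
    ann_dir = work / "annotation"      # Prokka output directory (must not exist yet)
    prefix = "IM4"                     # hypothetical prefix for output files
    contigs = asm_dir / "contigs.fasta"   # SPAdes writes assembled contigs here
    genome = ann_dir / f"{prefix}.fna"    # Prokka's FASTA copy of the contigs
    index = work / f"{prefix}_index"      # Bowtie2 index basename
    sam = work / f"rna_vs_{prefix}.sam"   # alignment output
    n = str(threads)

    return [
        # 1. De novo assembly of the DNA reads.
        ["spades.py", "-1", str(dna_r1), "-2", str(dna_r2), "-t", n, "-o", str(asm_dir)],
        # 2. Annotation: predicts genes and writes <prefix>.gff next to <prefix>.fna.
        ["prokka", "--outdir", str(ann_dir), "--prefix", prefix, "--cpus", n, str(contigs)],
        # 3. Index the annotated genome so contig names match the GFF.
        ["bowtie2-build", str(genome), str(index)],
        # 4. Align the RNA-seq reads to the new reference.
        ["bowtie2", "-p", n, "-x", str(index),
         "-1", str(rna_r1), "-2", str(rna_r2), "-S", str(sam)],
    ]


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "dna_R1.fastq.gz", "dna_R2.fastq.gz",
        "rna_R1.fastq.gz", "rna_R2.fastq.gz",
        "run",
    )
    for cmd in commands:
        print(shlex.join(cmd))  # inspect first; run with subprocess.run(cmd, check=True)
```

The sketch prints the commands instead of running them, so each step can be checked and adjusted (read layout, threads, quality trimming beforehand) before anything is executed. The alignment step would be repeated for every control and treatment sample.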

If I interpret correctly, you're saying that you have bulk RNA-seq data (= potentially all transcripts of that one bacterial strain), which is very different from ribosomal DNA (!) sequencing, which is usually applied to a MIX of different bacteria and is typically used simply to identify the different species present in the mix. I don't see how your data set would benefit from focusing on rRNA genes. That being said, why are you looking at that data set to begin with?

ADD COMMENT • link 3.3 years ago by Friederike 9.0k

0


Thank you very much for the reply. I have RNA-seq data (in FASTQ format) from control and treatment groups for the strain. I also have DNA sequencing data for the same pure culture. Do you mean (please correct me if I am wrong) that I have to first assemble and annotate the DNA sequences (obtained in FASTQ format) of the strain and then use the result as the reference genome for aligning the RNA sequences (obtained in FASTQ format)?

ADD REPLY • link 3.3 years ago by salmon ▴ 10

1


first assemble and annotate the DNA sequences (obtained in FASTQ format) of the strain and then use the result as the reference genome for aligning the RNA sequences (obtained in FASTQ format)

Correct. That would be the ideal workflow.

ADD REPLY • link 3.3 years ago by Friederike 9.0k

0


Thank you so much

ADD REPLY • link 3.2 years ago by salmon ▴ 10