Question

Bacterial (E.coli) RNA-SEQ differential gene expression (with pET system)

4.4 years ago

Sammy ▴ 30

Heyya,

I have some bacterial RNA-SEQ data to analyse. The goal is to get a list and read depth of all expressed genes. I have 5 different experimental conditions (in triplicate).

Although the bacterial strain is the same throughout, a pET system has been used, and each experimental condition has a different plasmid construct. I'm going to create a custom genome (with two "chromosomes") for each condition in order to perform the alignment. Would this be a good approach? In the end, I'm going to compare gene expression between samples (after I get a list and read depth of all expressed genes). Would that work, given that the reference genome is going to change slightly every time (due to the different plasmid construct used)?
What alignment tool should I use? HISAT2 or STAR would work only if I restrict the maximum splice length to something fairly short (bacteria don't have that many splices). However, I read that there are some prokaryote-specific tools I could use. I was wondering: what about just using Bowtie2, which is not a splice-aware aligner? Would that be alright?
This question is really specific, but I'm going to ask anyway: I work with ClearColi, which is an E. coli mutant. However, I can't find the ClearColi genome. Would it be good enough to use the E. coli (parent strain) genome? My guess is that I have to read the ClearColi paper and decide whether there are significant changes in the genome or just a couple of mutations. However, assuming I don't have that information, will it be alright to use the parent strain (E. coli) genome?

Thanks. As usual, any input is welcomed.

rna-seq • 1.1k views

updated 4.4 years ago by Joe 22k • written 4.4 years ago by Sammy ▴ 30

score 0 · Answer 1 · 2021-02-26

Although the bacterial strain is the same throughout, a pET system has been used, and each experimental condition has a different plasmid construct. I'm going to create a custom genome (with two "chromosomes") for each condition in order to perform the alignment. Would this be a good approach? In the end, I'm going to compare gene expression between samples (after I get a list and read depth of all expressed genes). Would that work, given that the reference genome is going to change slightly every time (due to the different plasmid construct used)?

I think this is all you can realistically do with this experimental setup. Whether it matters or not depends on which genes you're interested in (I presume you don't want to compare the plasmids against each other, but rather the effect their presence has on the host?). Do the plasmids have the same backbone, but different inserts?
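For what it's worth, building the per-condition reference can be as simple as concatenating FASTA files. Here is a minimal Python sketch, assuming the host chromosome and each plasmid construct are available as separate FASTA files; the function names and the file names in the usage note are made up for illustration, not taken from any existing tool.

```python
from pathlib import Path


def combine_fasta(chromosome_text: str, plasmid_text: str) -> str:
    """Return one FASTA string holding the host chromosome and plasmid records."""
    seen = set()
    lines = []
    for text in (chromosome_text, plasmid_text):
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # drop blank lines so records stay contiguous
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                # Aligners need unique sequence names, so refuse duplicates.
                if name in seen:
                    raise ValueError(f"duplicate sequence name: {name!r}")
                seen.add(name)
            lines.append(line)
    return "\n".join(lines) + "\n"


def write_condition_reference(chromosome_fasta: str, plasmid_fasta: str, out_fasta: str) -> None:
    """Write the combined two-"chromosome" reference for one condition."""
    combined = combine_fasta(
        Path(chromosome_fasta).read_text(),
        Path(plasmid_fasta).read_text(),
    )
    Path(out_fasta).write_text(combined)
```

You would call something like `write_condition_reference("host.fa", "pET_construct_A.fa", "ref_A.fa")` once per condition (hypothetical file names). If the chromosome record is kept identical in all five references, the host gene coordinates stay the same across conditions, so the same chromosomal annotation can be used for counting in every sample.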

What alignment tool should I use? HISAT2 or STAR would work only if I restrict the maximum splice length to something fairly short (bacteria don't have that many splices). However, I read that there are some prokaryote-specific tools I could use. I was wondering: what about just using Bowtie2, which is not a splice-aware aligner? Would that be alright?

You wouldn't use splice-aware aligners for bacteria, since bacterial transcripts are essentially never spliced (there are no spliceosomal introns). You can map your reads with any one of a number of tools (bwa and bowtie2 are among the most common, so either should be fine).
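As a rough illustration of the bowtie2 route, the sketch below only assembles the two command lines (index build, then alignment) so they can be inspected or handed to `subprocess.run`. The helper name, file names, and thread count are assumptions for the example, and whether the reads are paired-end or single-end is not stated in the question, so both cases are handled.

```python
import shlex


def bowtie2_commands(reference_fasta, index_prefix, out_sam,
                     reads_1, reads_2=None, threads=4):
    """Build argument lists for bowtie2-build and bowtie2 (nothing is executed)."""
    build = ["bowtie2-build", reference_fasta, index_prefix]
    align = ["bowtie2", "-p", str(threads), "-x", index_prefix]
    if reads_2 is None:
        align += ["-U", reads_1]                  # single-end reads
    else:
        align += ["-1", reads_1, "-2", reads_2]   # paired-end mates
    align += ["-S", out_sam]                      # SAM output file
    return build, align


if __name__ == "__main__":
    # Hypothetical file names for one replicate of one condition.
    build, align = bowtie2_commands(
        "ref_A.fa", "ref_A", "A_rep1.sam",
        "A_rep1_R1.fastq.gz", "A_rep1_R2.fastq.gz",
    )
    print(shlex.join(build))
    print(shlex.join(align))
```

Each condition would get its own index built from its own combined reference, and the replicates of that condition would all be aligned against it.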

This question is really specific, but I'm going to ask anyway: I work with ClearColi, which is an E. coli mutant. However, I can't find the ClearColi genome. Would it be good enough to use the E. coli (parent strain) genome? My guess is that I have to read the ClearColi paper and decide whether there are significant changes in the genome or just a couple of mutations. However, assuming I don't have that information, will it be alright to use the parent strain (E. coli) genome?

I'm familiar with this strain, and as far as I know, they have just manipulated the LPS genes. There may be some extra gene deletions, etc. You can probably find out the genotype in general form, but the sequence will not be publicly available, since it is a commercial strain subject to IP restrictions. You could sequence the reference yourself (which would be the best approach), but you may have agreed to terms when you obtained the strain that prevent you from publishing the sequence. Tread very carefully here. Using the parental strain may be an option, but every step you take away from the 'biological reality' of the system you have created is going to add noise to an already very noisy type of experiment.