Bacterial (E.coli) RNA-SEQ differential gene expression (with pET system)
3.8 years ago
Sammy ▴ 30

Heyya,

I have some bacterial RNA-SEQ data to analyse. The goal is to get a list and read depth of all expressed genes. I have 5 different experimental conditions (in triplicate).

  1. Although there is the same bacterial strain, a pET system has been used. In each experimental condition, there is a different plasmid sequence per construct --> I'm going to create a custom genome (with two "chromosomes") for each of them in order to perform the alignment. Would this be a good approach? In the end, I'm going to compare the gene expression between samples (after I get a list and read depth of all expressed genes). Would that work taking into account the reference genome is gonna change slightly every time (due to the different plasmid construct used)?

  2. What alignment tool should I use? HISAT2 or STAR would work only if I'm restricting the maximum splice length to something fairly short (bacteria don't have that many splices). However, I read about some prokaryotes-specific tools available I can use. I was wondering, what about just using Bowtie2, which is not a splice-aware alignment?! Would that be alright?

  3. This question is really specific but I'm going to ask: I work with ClearColi which is an E.coli mutant. However, I can't find the ClearColi genome. Would it be good enough to use E.coli (the parent strain) genome? My guess is that I have to read the ClearColi paper and decide if there are significant changes in the genome or just a couple of mutations. However, assuming I don't have that information, will it be alright to use the parent strain (E.coli) genome?

Thanks. As usual, any input is welcomed.

rna-seq • 818 views
3.8 years ago
Joe 21k

Although there is the same bacterial strain, a pET system has been used. In each experimental condition, there is a different plasmid sequence per construct --> I'm going to create a custom genome (with two "chromosomes") for each of them in order to perform the alignment. Would this be a good approach? In the end, I'm going to compare the gene expression between samples (after I get a list and read depth of all expressed genes). Would that work taking into account the reference genome is gonna change slightly every time (due to the different plasmid construct used)?

I think this is all you can realistically do with this experimental set up. Whether it matters or not depends on which genes you're interested in (I presume you don't care to compare the plasmids against each other, but rather the effect their presence has on the host?). Do the plasmids have the same backbone, but different inserts?

What alignment tool should I use? HISAT2 or STAR would work only if I'm restricting the maximum splice length to something fairly short (bacteria don't have that many splices). However, I read about some prokaryotes-specific tools available I can use. I was wondering, what about just using Bowtie2, which is not a splice-aware alignment?! Would that be alright?

You wouldn't need a splice-aware aligner for bacteria, since they don't do any splicing - they have no introns. You can map your reads with any one of a number of standard short-read aligners (bwa and bowtie2 are among the most common, so either should be fine).
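As a rough sketch of what the bowtie2 route could look like, the snippet below only assembles the two command lines (index the reference, then map one sample). It assumes paired-end reads; the function name, file names and thread count are placeholders I made up, not anything from your data.

```python
import shlex


def bowtie2_commands(reference_fasta, index_prefix, reads_1, reads_2,
                     out_sam, threads=4):
    """Return (build, align) argument lists for bowtie2-build and bowtie2.

    Assumes paired-end reads; all paths are hypothetical placeholders.
    """
    # Step 1: build the index from the (custom) reference FASTA.
    build = ["bowtie2-build", reference_fasta, index_prefix]
    # Step 2: map the read pairs against that index and write SAM output.
    align = [
        "bowtie2",
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # index prefix produced by bowtie2-build
        "-1", reads_1,        # mate 1 reads
        "-2", reads_2,        # mate 2 reads
        "-S", out_sam,        # SAM output file
    ]
    return build, align


if __name__ == "__main__":
    build, align = bowtie2_commands(
        "custom_reference.fa", "custom_reference",
        "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "sample1.sam",
    )
    # Print the commands for inspection instead of running them.
    print(shlex.join(build))
    print(shlex.join(align))
```

To actually run them you could pass each list to `subprocess.run(cmd, check=True)`. Keeping the same parameters for every sample is the important part.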

This question is really specific but I'm going to ask: I work with ClearColi which is an E.coli mutant. However, I can't find the ClearColi genome. Would it be good enough to use E.coli (the parent strain) genome? My guess is that I have to read the ClearColi paper and decide if there are significant changes in the genome or just a couple of mutations. However, assuming I don't have that information, will it be alright to use the parent strain (E.coli) genome?

I'm familiar with this strain, and as far as I know, they have just manipulated the LPS genes. There may be some extra gene deletions etc. You can probably find out the genotype in general form, but the sequence will not be publicly available since it is a commercial strain subject to IP restrictions. You could sequence the reference yourself (which would be the best approach), but you may have agreed to some terms when you obtained the strain that you would not publish the sequences etc. Tread very carefully here. Using the parental strain may be an option, but every step you take away from the 'biological reality' of the system you have created is going to add noise to an already very noisy type of experiment.


Hi Joe. Thank you for your response.

I am aware the comparison might be tricky based on those experimental conditions. However, personally, I am more interested in the list and read depth of all expressed genes for each bacterial construct. This can't be that noisy, can it?

Yes, the plasmid has the same backbone but different inserts. Each construct expresses a different class of the same protein. We are interested in which class is more abundantly transcribed.

Regarding differential expression, I will deal with that issue separately, once I get there.


Something else you might be able to do if your inserts are suitably diverse, would be to create an artificial reference of all the sequences in question for all conditions. So you could have something like:

>Chromosomal Sequence
ATGC...
>pET backbone/empty vector reference
ATGC...
>Insert of interest #1
ATGC
>Insert of interest #2
ATGC
>Insert of interest #3
ATGC

If you map everything against everything, you should see no signal for insert #3 in your #1 and #2 samples for example. I think this will work at least, but it will depend heavily on those inserts being diverse enough that reads aren't mapping between samples.
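A minimal sketch of how you might assemble that combined reference and get a first impression of whether the inserts are diverse enough. The function names, record names and k-mer size are my own placeholders. The overlap check only looks at exact shared k-mers on the forward strand, so treat it as a crude screen, not a guarantee that reads won't cross-map. Header names without spaces are safer, since aligners generally report only the first word of the header.

```python
from itertools import combinations


def write_combined_reference(records, path, width=60):
    """Write (name, sequence) pairs into one multi-FASTA file."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            # Wrap sequence lines at a fixed width.
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def kmers(seq, k):
    """Set of all k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(seq_a, seq_b, k=31):
    """Fraction of the smaller k-mer set that is also in the other sequence."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def pairwise_insert_overlap(inserts, k=31):
    """Map each pair of insert names to its shared k-mer fraction."""
    return {
        (name_a, name_b): shared_kmer_fraction(seq_a, seq_b, k)
        for (name_a, seq_a), (name_b, seq_b) in combinations(inserts.items(), 2)
    }
```

A pair of inserts with a high shared fraction is one where reads are likely to map ambiguously between samples, so that is where I would look first.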

This can't be that noisy, can it?

All of your conditions are going to have noise. There will be fluctuations in transcription regardless of what you do, plus subtle differences between library preps, RNA isolations etc.; that's just the nature of sensitive techniques like this. The problem is that you will be compounding this by using references which are not 100% identical, so some of your reads may not map if the sequence identity is off. Now, if you are using the same reference for them all, and mapping with the same parameters, your noise/error will at least be systematic, so it might not matter, but it's something to bear in mind.
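One way to keep an eye on this is to tally, per sample, how many reads land on each reference sequence and how many fail to map at all. The sketch below parses plain SAM text using only the standard FLAG and RNAME columns. The function names and the `*unmapped*` label are made up for illustration, and it counts individual reads (each mate separately), so it is a sanity check and not a substitute for proper gene-level counting.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count primary alignments per reference sequence from SAM lines.

    Unmapped reads are tallied under the made-up key '*unmapped*'.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip secondary (0x100) and supplementary (0x800) alignments
        # so each read is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        if flag & 0x4:  # read is unmapped
            counts["*unmapped*"] += 1
        else:
            counts[fields[2]] += 1  # RNAME column
    return counts


def unmapped_fraction(counts):
    """Fraction of counted reads that did not map anywhere."""
    total = sum(counts.values())
    return counts["*unmapped*"] / total if total else 0.0
```

If one sample shows a clearly higher unmapped fraction than the others, or reads on an insert that should not be present in it, that points at the reference mismatch or cross-mapping problem described above.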

You will need good controls for this, so I would ensure you have an empty vector control for all of your conditions.
