Aligning RNA-seq reads to a bacterial genome
8.6 years ago
HK

Hey all,

I have a few RNA-seq samples (healthy and diseased). I have already aligned them to the human hg19 reference using TopHat2. Now I am trying to align the reads to a bacterial genome to find out whether the samples also contain bacterial sequences. Which tool should I use for this? I tried TopHat2, but do you know a better one?

Tags: RNA-Seq, bacterial-genome

Do you already know which bacterium you want to align against, or are you still trying to work that out? For what it's worth, bacterial genes are generally not spliced, so you can often get away with using bowtie2, bwa, or a similar unspliced aligner directly.
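As a rough sketch of the unspliced route, the Python below only assembles `bowtie2-build` and `bowtie2` command lines; it does not run them. The file names and index prefix are hypothetical placeholders. `-x`, `-1`, `-2`, `-p`, `-S` and `--no-unal` are standard bowtie2 options, and each returned list could be passed to `subprocess.run(cmd, check=True)`.

```python
import shlex


def bowtie2_commands(reference_fasta: str, index_prefix: str, r1: str, r2: str,
                     sam_out: str, threads: int = 4) -> tuple[list[str], list[str]]:
    """Return argv lists for building a bowtie2 index and aligning paired reads to it."""
    build = ["bowtie2-build", reference_fasta, index_prefix]
    align = [
        "bowtie2",
        "-p", str(threads),   # worker threads
        "-x", index_prefix,   # index prefix written by bowtie2-build
        "-1", r1,             # mate 1 FASTQ
        "-2", r2,             # mate 2 FASTQ
        "--no-unal",          # leave reads that fail to align out of the SAM
        "-S", sam_out,        # SAM output path
    ]
    return build, align


if __name__ == "__main__":
    # Hypothetical file names; each command could be run with subprocess.run(cmd, check=True)
    for cmd in bowtie2_commands("FM211187.fasta", "spn_atcc700669",
                                "sample_R1.fastq.gz", "sample_R2.fastq.gz",
                                "sample_vs_spn.sam"):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems with unusual file names.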


Yes, I am using Streptococcus pneumoniae ATCC 700669 (FM211187). I downloaded the FASTA file, built the index with bowtie-build, and then mapped with TopHat2. This is the result I got for the diseased sample:

Left reads:
          Input     :    164672
           Mapped   :       540 ( 0.3% of input)
            of these:       512 (94.8%) have multiple alignments (0 have >20)
Right reads:
          Input     :    164672
           Mapped   :      1119 ( 0.7% of input)
            of these:      1057 (94.5%) have multiple alignments (0 have >20)
Unpaired reads:
          Input     :     48331
           Mapped   :       148 ( 0.3% of input)
            of these:       141 (95.3%) have multiple alignments (0 have >20)
 0.5% overall read mapping rate.

Aligned pairs:         6
     of these:         6 (100.0%) have multiple alignments
                       2 (33.3%) are discordant alignments
 0.0% concordant pair alignment rate.

And this is the result for the healthy sample (I was just experimenting to see what would come out):

Left reads:
          Input     :   1254299
           Mapped   :       843 ( 0.1% of input)
            of these:       156 (18.5%) have multiple alignments (0 have >20)
Right reads:
          Input     :   1254299
           Mapped   :      2527 ( 0.2% of input)
            of these:      1700 (67.3%) have multiple alignments (0 have >20)
 0.1% overall read mapping rate.

Aligned pairs:       719
     of these:        45 ( 6.3%) have multiple alignments
 0.1% concordant pair alignment rate.

Just by looking at these results, would you say that bacterial genome remnants are present in the sample?


I just saw this paper mentioned on Twitter (it literally just came out). It, and some of the references therein, may be of interest to you. The paper describes one of the iobio tools, which are always really slick.


Our internal threshold for calling a sample contaminated is 0.5% unique alignments, so I'd guess the diseased sample is borderline. I don't know where the samples were sourced from; depending on that, you might not expect many bacterial reads in the samples even if the patient carried the bacteria.


You could use SNAP or Bowtie2 to align the reads against the bacterial genomes from NCBI. There are pipelines built for this, but setting one up would be tedious if identifying the pathogens in the data is not your main goal.

http://chiulab.ucsf.edu/surpi/
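Once reads have been aligned against a set of bacterial genomes (with Bowtie2, SNAP, or similar), a first look at what was hit can come from counting primary alignments per reference sequence in the SAM output. The sketch below is plain Python with no pysam dependency. It skips unmapped (0x4), secondary (0x100) and supplementary (0x800) records and counts each mate separately. The optional MAPQ cut-off is only a rough, aligner-dependent proxy for unique placement.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM format specification)
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def reads_per_reference(sam_lines: Iterable[str], min_mapq: int = 0) -> Counter:
    """Count primary, mapped SAM records per reference name (RNAME)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blanks
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if mapq < min_mapq:
            continue
        counts[rname] += 1
    return counts
```

For example, `reads_per_reference(open("bacteria.sam"), min_mapq=10).most_common(10)` would list the ten most-hit references; the file name is hypothetical.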

6.8 years ago
predeus

Bacterial RNA would hardly be detectable in a human RNA-seq library. If you do poly-A selection, bacterial reads won't be there, since bacterial transcripts have almost no poly-A tails; and a total RNA library would still not capture much bacterial RNA, since it decays too fast (you need a special protocol for bacterial RNA-seq). Also, if the library is prepared properly, there should not be any DNA in RNA-seq data.

To answer your question, though: since there is no splicing, you can use any good short-read mapper (bwa, bowtie2) to align to a bacterial genome. If you have no idea which bacteria to expect, use Centrifuge or Kraken with the nt database.
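If you go the Centrifuge/Kraken route, both can produce a Kraken-style report (Kraken2 via `--report`, Centrifuge via `centrifuge-kreport`). Below is a minimal sketch for pulling species-level hits out of such a report. It assumes the standard six tab-separated columns (percent, clade reads, reads assigned directly, rank code, NCBI taxid, indented name); reports written with extra columns, such as Kraken2's `--report-minimizer-data` output, would need a different parser. The `min_reads` cut-off is an arbitrary illustration, not a recommended value.

```python
from dataclasses import dataclass


@dataclass
class ReportRow:
    percent: float     # % of reads in the clade rooted at this taxon
    clade_reads: int   # reads in the clade
    taxon_reads: int   # reads assigned directly to this taxon
    rank: str          # U, R, D, K, P, C, O, F, G, S, ...
    taxid: int         # NCBI taxonomy ID
    name: str          # scientific name (indentation stripped)


def parse_kraken_report(text: str) -> list[ReportRow]:
    """Parse a six-column Kraken-style report into rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.split("\t")[:6]
        rows.append(ReportRow(float(pct), int(clade), int(direct),
                              rank.strip(), int(taxid), name.strip()))
    return rows


def top_species(rows: list[ReportRow], min_reads: int = 10, n: int = 10) -> list[ReportRow]:
    """Species-level rows with at least `min_reads` clade reads, most abundant first."""
    hits = [r for r in rows if r.rank == "S" and r.clade_reads >= min_reads]
    return sorted(hits, key=lambda r: r.clade_reads, reverse=True)[:n]
```

Filtering on the exact rank code `S` keeps species rows and drops sub-species codes such as `S1`, which keeps the list short when screening an unfamiliar sample.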


Poly-A selection is quite common, but OP didn't mention their protocol. Also, poly-A selection biases your samples away from other human RNA species that don't have the tail. You don't need any special protocol for bacterial RNA-seq if you're working in an RNase-free environment. Therefore, bacterial sequences are (anecdotally) quite common in human samples that haven't been processed with care, or that are taken from tissues with microbiota. OP is asking for a recommendation for an aligner for bacterial samples that doesn't concern itself with splicing (you can imagine the algorithmic mistakes a splice-aware aligner can make in densely packed bacterial genomes). Aligning against bacterial genomes is a good quality-control routine, especially given the prevalence of laboratory contaminants such as Mycoplasma.

In my experience, even with several rounds of DNase treatment, if you sequence deeply you can observe residual noise across the genome, which can be explained either by spurious transcription or by DNA contamination that evades multiple DNase rounds. This has been described in the following papers:

https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-734

http://mbio.asm.org/content/3/4/e00156-12.short

http://jb.asm.org/content/197/1/18.short

http://advances.sciencemag.org/content/2/3/e1501363.abstract

I'm guessing this is why you were downvoted: you were quick to dismiss fairly common sample contaminants, including bacterial sequences and even DNA.

8.6 years ago

Take a look at BBSplit (from the BBTools suite).
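BBSplit maps reads against several references at once and writes each read to the output for the reference it matches best, which suits a human-versus-bacteria question. As a hedged sketch, the Python below only builds a `bbsplit.sh` command line. `in1=`, `in2=`, `ref=`, `basename=` (where `%` stands for each reference's name) and `refstats=` are BBSplit options as I understand them; the file names are hypothetical, and the handling of paired output should be checked against the `bbsplit.sh` usage text.

```python
import shlex


def bbsplit_command(r1: str, r2: str, references: list[str],
                    basename: str = "out_%.fq", refstats: str = "refstats.txt") -> list[str]:
    """Build a bbsplit.sh call that bins read pairs by the reference they match best."""
    return [
        "bbsplit.sh",
        f"in1={r1}",                      # mate 1 reads
        f"in2={r2}",                      # mate 2 reads
        "ref=" + ",".join(references),    # comma-separated reference FASTA files
        f"basename={basename}",           # '%' is replaced by each reference's name
        f"refstats={refstats}",           # summary of how many reads went to each reference
    ]


if __name__ == "__main__":
    # Hypothetical file names
    cmd = bbsplit_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                          ["hg19.fa", "FM211187.fasta"])
    print(shlex.join(cmd))
```

Giving both the human and the bacterial reference in one run lets reads compete between them, rather than forcing human reads onto the bacterial genome.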
