Question

Bacterial RNAseq - Help for beginner

0

Entering edit mode

4.6 years ago

liz.marjory ▴ 10

Hi everyone

I just begin with the analysis of my RNAseq data. I have my own reference genome and its annotated file. I've been reading and I decided to use the following pipeline:

FastQC ->Trimmomatic -> Bowtie2 -> HTSEQ

But, I've had some problems, for example

My reads have good quality but there are some duplicates sequences. When I use Trimmomatic it does not change, the duplicate sequences are still there and I don´t know why.

java -jar /path/trimmomatic.jar PE control_rep1_1.fq.gz control_rep1_2.fq.gz control_rep1_1_paired.fq.gz control_rep1_1_unpaired.fq.gz control_rep1_2_paired.fq.gz control_rep1_2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36

Despite that I went ahead with the analysis just for practice.

When I used HTSeq the results were like this

Warning: Mate records missing for 26418 records; first such record: <SAM_Alignment object: Paired-end read 'A00877:83:HNHLVDSXX:3:1519:15230:29340' aligned to tig00000001:[616,766)/+>.
8382045 BAM alignment pairs processed.

no feature 7727004
ambiguous 0
Too low aQual 278161
not aligned 376880
alignment_not_unique 0

Some forum say that it could be for the annotation file structure. Somebody knows what is happening?

Pd. I really need help and would like to talk with someone maybe for skype or something like that. If there is somebody with some free time, please let me know.

Thanks

next-gen RNA-Seq Bacteria • 2.1k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 4.6 years ago by liz.marjory ▴ 10

0

Entering edit mode

Observing duplication in RNA-seq is normal and expected, don't worry. Did you see any adapter contamination in your data based on fastqc? If not then there is no need to run trimmomatic. Please share the command lines for both bowtie and htseq.

Recommended literature:

https://www.annualreviews.org/doi/abs/10.1146/annurev-biodatasci-072018-021255

https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

ADD REPLY • link 4.6 years ago by ATpoint 85k

0

Entering edit mode

Hi ATpoint This is the command lines for bowtie and htseq

bowtie2 -x /home/liz/RNASEQBA135/genoma/aminobacter_index -1 control_rep1_1_paired.fq.gz -2 control_rep1_2_paired.fq.gz -S control_rep1_paired.sam

samtools view -Sb control_rep1_paired.sam > control_rep1_paired.bam samtools sort control_rep1_paired.bam -o control_rep1_paired_sorted.bam samtools index control_rep1_paired_sorted.bam

htseq-count --format bam --order pos --mode intersection-nonempty -s reverse -t gene -s no control_rep1_paired_sorted.bam /home/liz/RNASEQBA135/anotacion/BA135_aminobacter.gtf > control1_htseq.tsv

ADD REPLY • link 4.6 years ago by liz.marjory ▴ 10

0

Entering edit mode

I have 2 conditions (2 replicates each one) Control and Stress. I have to found wich genes are expressed differentially for my bacteria

ADD REPLY • link 4.6 years ago by liz.marjory ▴ 10

0

Entering edit mode

Then simply follow the linked DESeq2 workflow (the second link). It includes everything you need.

ADD REPLY • link 4.6 years ago by ATpoint 85k

0

Entering edit mode

Thanks, the DESeq2 workflow is very detailed But they start using Salmon. I can't do that because I don't have a previous reference transcriptome. Think that necessarily I have to pass by alignment step and start from my genome fasta file. My problem is about the count step, using HTSeq. I don't know how figure out the Warning message

ADD REPLY • link 4.6 years ago by liz.marjory ▴ 10

0

Entering edit mode

But you have a GTF, right? It should contain gene annotations. Can you show some lines from this GTF? There are other tools that can make a count matrix such as featureCounts. HTSeq is old, no need to stick with it. Lets first of all see if the GTF is fine.

ADD REPLY • link 4.6 years ago by ATpoint 85k

0

Entering edit mode

I got the GTF file from rast annotation server

##gff-version 3
tig00000001 FIG CDS       190   300        .    +   1   ID=fig|83263.11.peg.1;Name=hypothetical protein
tig00000001 FIG CDS        892  1479    .   +   1   ID=fig|83263.11.peg.2;Name=hypothetical protein
tig00000001 FIG CDS       1839 1967 .   -   0   ID=fig|83263.11.peg.3;Name=hypothetical protein
tig00000001 FIG CDS       1971 2294 .   +   0   ID=fig|83263.11.peg.4;Name=hypothetical protein
tig00000001 FIG CDS       2322 2594 .   +   0   ID=fig|83263.11.peg.5;Name=Dipeptide transport system permease protein DppB (TC 3.A.1.5.2)

ADD REPLY • link 4.6 years ago by liz.marjory ▴ 10

score 0 · Answer 1 · 2020-04-25

0

Entering edit mode

4.6 years ago

ahmad mousavi ▴ 800

Check following links:

https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#mapping

https://www.melbournebioinformatics.org.au/tutorials/tutorials/rna_seq_dge_basic/rna_seq_basic_tutorial/

also see this paper,

https://www.nature.com/articles/nprot.2016.095

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

ADD COMMENT • link 4.6 years ago by ahmad mousavi ▴ 800

1

Entering edit mode

Both Tophat and Cufflinks is deprecated for years. Do not use this please.

ADD REPLY • link 4.6 years ago by ATpoint 85k

0

Entering edit mode

I have changed to other one !!

ADD REPLY • link 4.6 years ago by ahmad mousavi ▴ 800

0

Entering edit mode

Thanks for the links!

I tried to use Galaxy but I couldnt upload the fq files (8 files 500MB each one). They seem to be to much heavy

ADD REPLY • link 4.6 years ago by liz.marjory ▴ 10