Question

Question about RNA-Seq data alignment

0

2.6 years ago

mohammedtoufiq91 ▴ 270

Hi,

I have a question about genome alignment. I am working with an RNA-Seq dataset to study the impact of liquid culture in response to different doses of a virus in human samples. I am trying to work out a good strategy, or the best-practice method, for genome mapping.

One option: identify the reads mapping to the virus and exclude them from the analysis. Then extract the unaligned reads, map them against the hg38 genome, and quantify gene-level counts, followed by downstream analysis in either edgeR or DESeq2.

Additionally, I was thinking about the below scenarios:

Why not just do the standard alignment against hg38 using HISAT or STAR, then quantify using RSEM or featureCounts?

(OR)

Map the reads first against the genome of the virus in question using Bowtie/Bowtie2 and store the unmapped reads as FASTQ files, then align these unmapped FASTQ files against hg38 with HISAT or STAR, and quantify.

(OR)

I was also just reading about BBSplit from BBMap. Should I use this instead?

Thank you very much for the help.

Toufiq

STAR Alignment BBMAP RNA-Seq Bowtie • 2.7k views

ADD COMMENT • link 2.6 years ago by mohammedtoufiq91 ▴ 270

0

"I am working with an RNA-Seq dataset to study the impact of liquid culture in response to different doses of a virus in human samples."

Are the viral transcripts ending up in the final dataset? If so, you may want to check whether their abundance correlates with the initial dosing. So while you could simply split off and remove the viral reads, doing what ATpoint suggests may be the way to go.

ADD REPLY • link 2.6 years ago by GenoMax 153k

score 2 · Answer 1 · 2023-02-09

"One option: identify the reads mapping to the virus and exclude them from the analysis. Then extract the unaligned reads and map them against the hg38 genome"

Not like this. As ATpoint said, you want to make a combined virus + human genome and align to that. You never want to align to a partial reference, because the aligner may force reads onto one sequence when they would align better to another. A combined reference makes sure that all the reads end up aligned where they really belong.

RSEM is smarter about read counting than featureCounts, so all else being equal, use RSEM. STAR's transcriptome output (--quantMode TranscriptomeSAM) was designed to be compatible with RSEM, so I would use STAR for aligning.
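
As a rough sketch of that STAR-to-RSEM hand-off, the helpers below only build the command lines and do not run anything. They assume paired-end, uncompressed FASTQ input, a STAR index built from the combined reference, and an RSEM reference prepared separately with rsem-prepare-reference from the same FASTA and GTF. The function names and all paths are hypothetical placeholders.

```python
import shlex


def star_align_cmd(genome_dir, fastq_1, fastq_2, out_prefix, threads=8):
    """Build a STAR command that also writes transcriptome-coordinate alignments."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_1, fastq_2,  # gzipped input would also need --readFilesCommand zcat
        "--outSAMtype", "BAM", "Unsorted",
        "--quantMode", "TranscriptomeSAM",  # writes <prefix>Aligned.toTranscriptome.out.bam
        "--outFileNamePrefix", out_prefix,
    ]


def rsem_quant_cmd(out_prefix, rsem_reference, sample_name, threads=8):
    """Build an RSEM command that quantifies from STAR's transcriptome BAM."""
    return [
        "rsem-calculate-expression",
        "--alignments", "--paired-end",
        "-p", str(threads),
        out_prefix + "Aligned.toTranscriptome.out.bam",
        rsem_reference,
        sample_name,
    ]


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(shlex.join(star_align_cmd("star_index", "s1_R1.fastq", "s1_R2.fastq", "s1_")))
    print(shlex.join(rsem_quant_cmd("s1_", "rsem_ref/combined", "s1")))
```

Older RSEM releases used --bam where newer ones use --alignments, so check the help output of your installed version.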

score 2 · Answer 2 · 2023-02-09

2

2.6 years ago

Rajendra KC ▴ 20

I think you could use salmon and add the viral genome as a decoy.
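
A minimal sketch of how the decoy inputs could be prepared, assuming uncompressed FASTA files: salmon's decoy-aware index takes one FASTA with the transcripts first and the decoy sequences after them, plus a text file listing the decoy sequence names. `build_gentrome` is a hypothetical helper name, not part of salmon.

```python
from pathlib import Path


def build_gentrome(transcripts_fa, decoy_fastas, gentrome_out, decoys_out):
    """Write transcripts followed by decoy sequences, plus the decoy name list.

    decoy_fastas can hold the viral genome alone or, for example, the human
    genome and the viral genome together. Returns the decoy names written.
    """
    decoy_names = []
    with open(gentrome_out, "w") as out:
        # Transcripts go first; their names are not decoys.
        with open(transcripts_fa) as handle:
            for line in handle:
                out.write(line if line.endswith("\n") else line + "\n")
        # Decoy sequences follow; record the first word of each header.
        for path in decoy_fastas:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        decoy_names.append(line[1:].split()[0])
                    out.write(line if line.endswith("\n") else line + "\n")
    Path(decoys_out).write_text("\n".join(decoy_names) + "\n")
    return decoy_names
```

The two output files can then be passed to salmon, for example `salmon index -t gentrome.fa -d decoys.txt -i salmon_index`. Note that decoys only absorb reads; if you want viral read counts, the viral sequences would need to be included as quantifiable targets instead.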

ADD COMMENT • link 2.6 years ago by Rajendra KC ▴ 20

1

Rajendra KC, thank you for the suggestion. Yes, I have this pipeline too. I first have to work out how to add the viral genome as a decoy alongside the human genome; I have not done this before.

ADD REPLY • link 2.6 years ago by mohammedtoufiq91 ▴ 270

ATpoint · Answer 3 · 2023-02-09

1

2.6 years ago

ATpoint 89k

I don't see why a standard analysis would not work. Just add the viral genome to the hg38 FASTA file, index with STAR as usual, then align and quantify. The viral sequence will act as a decoy for any viral reads, while human reads are still aligned accurately against the human genome.

ADD COMMENT • link 2.6 years ago by ATpoint 89k

0

Thank you for the suggestions, ATpoint. I will try it this way. Have you come across any example or resource on how to build a combined index (viral genome plus hg38) using STAR? That would help me build one.

ADD REPLY • link 2.6 years ago by mohammedtoufiq91 ▴ 270

2

You can just cat the reference fastas together. You can just cat the gtfs together (except for the header). Then rebuild the index.
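
The steps above can be sketched as follows, assuming uncompressed input files, sequence names that are unique across the two FASTA files, and GTF header lines that start with "#". The helper names are hypothetical, and `star_index_cmd` only builds the STAR command line without running it.

```python
def concat_fastas(fasta_paths, out_path):
    """Concatenate FASTA files, making sure each file ends with a newline."""
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    out.write(line if line.endswith("\n") else line + "\n")


def concat_gtfs(gtf_paths, out_path):
    """Concatenate GTF files, keeping '#' header lines only from the first file."""
    with open(out_path, "w") as out:
        for index, path in enumerate(gtf_paths):
            with open(path) as handle:
                for line in handle:
                    if index > 0 and line.startswith("#"):
                        continue  # drop headers/comments of later files
                    out.write(line if line.endswith("\n") else line + "\n")


def star_index_cmd(genome_dir, fasta, gtf, read_length=100, threads=8):
    """Build the STAR genomeGenerate command for the combined reference."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(read_length - 1),  # read length minus one
    ]
```

The names in the first column of the combined GTF have to match the sequence names in the combined FASTA, so it is worth checking both files before indexing.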

ADD REPLY • link updated 2.6 years ago by ATpoint 89k • written 2.6 years ago by swbarnes2 15k

0

swbarnes2 and ATpoint, thank you very much. I will try this as suggested. Until now I was using a STAR hg38 index built by a collaborator, so I will work out how to build one for the combined genome.

ADD REPLY • link 2.6 years ago by mohammedtoufiq91 ▴ 270

0

swbarnes2, ATpoint, and GenoMax: I figured out how to cat the human and viral FASTA files together. My question now is about the GTF files. There is no GTF file for the virus I am working on, only FASTA files. How should I deal with this? Thank you.

PS: I am using the human reference and GTF from NCBI.

ADD REPLY • link 2.6 years ago by mohammedtoufiq91 ▴ 270