RNA-seq alignment against globin genes (HBA1, HBA2, and HBB)
5.9 years ago

Hi,

We have generated a set of RNA-seq samples from blood that were not globin-depleted. I want to align the reads against a set of highly abundant globin genes (HBA1, HBA2, and HBB), identify the percentage of reads mapping to these genes, and exclude those reads from the analysis. After that, I would extract the unaligned reads, map them to the hg19 genome, and perform quantification to identify the number of genes or transcripts. Please let me know the most appropriate tools and methods for this.

RNA-Seq alignment • 2.6k views

Why not just do the alignment / pseudo-alignment as standard and then check the expression of these genes, as I did here (using transformed, normalised counts):

[Figure not preserved: expression of the globin genes across samples from different studies]

These are different studies. For the third study, we knew there was a mixture of globin-depleted and non-depleted samples. The samples in red are the ones suspected of being non-depleted.

I feel that, by actually aligning, filtering out reads, and then re-aligning, you will be introducing bias into your data.

Edit: to complete my comment: after you do this, you can selectively exclude the raw count data from the globin genes prior to normalisation. There are likely many ways of dealing with this issue, though.
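As a rough illustration of that last step, here is a minimal Python sketch that takes a raw count table, reports the percentage of counted reads assigned to the globin genes in each sample, and drops those genes before normalisation. The table layout (gene symbol mapped to a list of per-sample counts) and all numbers are made up for the example; a real count table will be keyed by whatever gene identifiers your annotation uses.

```python
# Gene symbols from the question; adjust to the identifiers in your annotation.
GLOBIN_GENES = {"HBA1", "HBA2", "HBB"}


def globin_percentage(counts):
    """Return, per sample, the percentage of counted reads assigned to globin genes."""
    n_samples = len(next(iter(counts.values())))
    totals = [0] * n_samples
    globin = [0] * n_samples
    for gene, row in counts.items():
        for i, c in enumerate(row):
            totals[i] += c
            if gene in GLOBIN_GENES:
                globin[i] += c
    return [100.0 * g / t if t else 0.0 for g, t in zip(globin, totals)]


def drop_globin(counts):
    """Return a copy of the count table without the globin genes."""
    return {gene: row for gene, row in counts.items() if gene not in GLOBIN_GENES}


# Hypothetical raw counts for two samples (non-depleted, depleted).
raw_counts = {
    "HBA1": [300, 10],
    "HBA2": [200, 5],
    "HBB": [400, 15],
    "ACTB": [60, 500],
    "GAPDH": [40, 470],
}

print(globin_percentage(raw_counts))        # [90.0, 3.0]
print(sorted(drop_globin(raw_counts)))      # ['ACTB', 'GAPDH']
```

The filtered table is what would then go into normalisation and differential expression.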


Thank you Kevin. This was helpful.

From the comments, I understand that I first need to align my non-globin-depleted samples against the whole hg19 genome build, then perform quantification and obtain the expression of these genes. Could you please point me to any material or publication on this? Thank you.


A good overview of all the steps involved in a typical RNA-seq analysis can be found at Bioconductor, for example in Michael Love's tutorial.


Thank you all for the comments. It was helpful.


mohammedtoufiq91 : If a specific comment in this thread was helpful in solving your question let us know and we can move it to an answer so you can accept it.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
5.9 years ago
michael.ante ★ 3.9k

̶Y̶o̶u̶ ̶c̶a̶n̶ ̶m̶a̶p̶ ̶t̶h̶e̶ ̶r̶e̶a̶d̶s̶ ̶f̶i̶r̶s̶t̶ ̶a̶g̶a̶i̶n̶s̶t̶ ̶t̶h̶e̶ ̶g̶l̶o̶b̶i̶n̶ ̶g̶e̶n̶e̶s̶ ̶a̶n̶d̶ ̶s̶t̶o̶r̶e̶ ̶t̶h̶e̶ ̶u̶n̶m̶a̶p̶p̶e̶d̶ ̶r̶e̶a̶d̶s̶ ̶a̶s̶ ̶f̶a̶s̶t̶q̶-̶f̶i̶l̶e̶s̶.̶ ̶B̶o̶w̶t̶i̶e̶2̶ ̶a̶n̶d̶ ̶b̶b̶m̶a̶p̶ ̶c̶a̶n̶ ̶d̶o̶ ̶t̶h̶a̶t̶ ̶e̶a̶s̶i̶l̶y̶.̶ ̶ ̶T̶h̶e̶ ̶a̶l̶i̶g̶n̶m̶e̶n̶t̶-̶r̶a̶t̶e̶ ̶i̶n̶ ̶t̶h̶i̶s̶ ̶f̶i̶r̶s̶t̶ ̶s̶t̶e̶p̶ ̶w̶o̶u̶l̶d̶ ̶b̶e̶ ̶y̶o̶u̶r̶ ̶e̶q̶u̶a̶l̶ ̶t̶o̶ ̶y̶o̶u̶r̶ ̶g̶l̶o̶b̶i̶n̶ ̶c̶o̶n̶t̶e̶n̶t̶.̶

̶T̶h̶e̶ ̶u̶n̶m̶a̶p̶p̶e̶d̶ ̶r̶e̶a̶d̶s̶ ̶c̶a̶n̶ ̶b̶e̶ ̶u̶s̶e̶d̶ ̶a̶s̶ ̶i̶n̶p̶u̶t̶ ̶f̶o̶r̶ ̶S̶T̶A̶R̶ ̶o̶r̶ ̶H̶I̶S̶A̶T̶2̶.̶

̶I̶n̶ ̶c̶a̶s̶e̶ ̶y̶o̶u̶ ̶o̶n̶l̶y̶ ̶h̶a̶v̶e̶ ̶u̶n̶d̶e̶p̶l̶e̶t̶e̶d̶ ̶b̶l̶o̶o̶d̶ ̶s̶a̶m̶p̶l̶e̶s̶,̶ ̶I̶'̶v̶e̶ ̶t̶o̶ ̶d̶i̶s̶a̶g̶r̶e̶e̶ ̶w̶i̶t̶h̶ ̶K̶e̶v̶i̶n̶'̶s̶ ̶c̶o̶m̶m̶e̶n̶t̶.̶ ̶T̶h̶e̶ ̶g̶l̶o̶b̶i̶n̶-̶c̶o̶n̶t̶e̶n̶t̶ ̶i̶t̶s̶e̶l̶f̶ ̶i̶s̶ ̶a̶ ̶m̶a̶j̶o̶r̶ ̶b̶i̶a̶s̶ ̶w̶h̶i̶c̶h̶ ̶i̶n̶f̶l̶u̶e̶n̶c̶e̶s̶ ̶t̶h̶e̶ ̶g̶e̶n̶e̶ ̶d̶e̶t̶e̶c̶t̶i̶o̶n̶ ̶o̶f̶ ̶n̶o̶n̶-̶g̶l̶o̶b̶i̶n̶ ̶g̶e̶n̶e̶s̶.̶ ̶ ̶I̶n̶ ̶c̶a̶s̶e̶ ̶y̶o̶u̶ ̶h̶a̶v̶e̶ ̶a̶ ̶m̶i̶x̶t̶u̶r̶e̶ ̶o̶f̶ ̶d̶e̶p̶l̶e̶t̶e̶d̶ ̶a̶n̶d̶ ̶n̶o̶n̶-̶d̶e̶p̶l̶e̶t̶e̶d̶ ̶s̶a̶m̶p̶l̶e̶s̶,̶ ̶I̶'̶d̶ ̶f̶o̶l̶l̶o̶w̶ ̶K̶e̶v̶i̶n̶'̶s̶ ̶a̶p̶p̶r̶o̶a̶c̶h̶.̶

[Edited because of the discussion].

Since you also asked for appropriate tools:

You can map RNA-Seq reads against the genome with STAR or HISAT2. For both aligners, it is good to include the gene annotation when generating the index. For gene counting, you can use featureCounts.
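A minimal sketch of how the STAR and featureCounts steps could be chained, written as Python helpers that only build the command lines. All file and directory names are hypothetical, paired-end gzipped FASTQ input is assumed, and strandedness is left at the featureCounts default (unstranded). The flags used are the commonly documented ones, but please check them against the manuals of your installed versions.

```python
import shlex


def star_index_cmd(genome_dir, fasta, gtf, read_length, threads=8):
    """STAR index generation with the gene annotation included."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(read_length - 1),  # read length minus 1
        "--runThreadN", str(threads),
    ]


def star_align_cmd(genome_dir, fastq_r1, fastq_r2, out_prefix, threads=8):
    """STAR alignment of one paired-end sample to a coordinate-sorted BAM."""
    return [
        "STAR", "--genomeDir", genome_dir,
        "--readFilesIn", fastq_r1, fastq_r2,
        "--readFilesCommand", "zcat",  # input assumed to be gzipped
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
        "--runThreadN", str(threads),
    ]


def featurecounts_cmd(gtf, bam_files, out_file, threads=8):
    """Gene-level counting; -p marks the input as paired-end.

    Newer Subread releases (2.0.2 onwards) additionally need
    --countReadPairs to count fragments rather than reads.
    """
    return ["featureCounts", "-p", "-T", str(threads),
            "-a", gtf, "-o", out_file, *bam_files]


# Hypothetical paths, for illustration only.
commands = [
    star_index_cmd("star_hg19", "hg19.fa", "genes.gtf", read_length=100),
    star_align_cmd("star_hg19", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1_"),
    featurecounts_cmd("genes.gtf", ["s1_Aligned.sortedByCoord.out.bam"], "counts.txt"),
]
for cmd in commands:
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

The sketch prints the commands instead of running them, so it can be inspected without STAR or Subread installed.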

Cheers,

Michael


In case you only have undepleted blood samples, I have to disagree with Kevin's comment. The globin content itself is a major bias that influences the detection of non-globin genes.

Wouldn't it then make most sense to align against the genome, and remove the globin genes from the count table prior to differential expression? It is commonly not recommended to align against a subset of the genome, as the aligner will try to find the best spot for a read, which is not necessarily the correct spot. I don't know about globin genes, but reads from similar genes could end up aligning to the globins, while in an alignment to the genome they would have been aligned to the correct gene.
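To make the point about the "best spot" concrete, here is a toy Python sketch. It is not how a real aligner works (there is no scoring scheme, no gaps, and no mismatch threshold), and all sequences and gene names are invented; it only shows that a best-hit search over a reduced reference can assign a read to a globin sequence when the read's true origin is simply absent from the reference.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, references):
    """Name of the reference sequence with the fewest mismatches to the read."""
    return min(references, key=lambda name: mismatches(read, references[name]))


# Invented 20-mers standing in for gene sequences.
globin_only = {
    "HBB_toy": "ACGTACGTACGTACGTACGT",
    "HBA_toy": "TTGACCGGTATTGACCGGTA",
}
full_reference = dict(globin_only)
full_reference["SIMILAR_GENE_toy"] = "ACGTACGTACCTACGTACGA"  # 2 differences from HBB_toy

# A read that truly originates from the similar, non-globin gene.
read = "ACGTACGTACCTACGTACGA"

print(best_hit(read, globin_only))     # HBB_toy
print(best_hit(read, full_reference))  # SIMILAR_GENE_toy
```

With the reduced reference, the read would be counted as globin and removed; with the full reference, it goes to the gene it came from.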


Wouldn't it then make most sense to align against the genome, and remove the globin genes from the count table prior to differential expression?

Yes, that is what we did, specifically to avoid bias in the alignment.


Hmm, you have a point there, especially regarding the mapping of similar reads. My suggestion was inspired by removing rRNA reads before alignment; rRNA interferes more with the alignment than globin reads from non-depleted blood samples might.


Probably many ways to do this, I imagine, and most more or less ending up with similar results. What researchers actually do is likely buried in supplementary methods, too. Any logical approach should be fine, I think.


When did the strike-through button disappear?


Will ask. I think that you can just wrap with <strike> </strike> tags

striked out text


I used this for now.
