Differential Expression based on Sequences from lncRNA
9.7 years ago
Joel TM ▴ 60

Hi, first of all, thank you for existing! I am somewhat new to bioinformatics, but I've learned a lot in the last year. I am familiar with running RNA-seq pipelines to get differentially expressed genes using DESeq, HTSeq, Cufflinks, edgeR, etc. But I am facing something new: I have total RNA-seq data and would like to see whether some long non-coding RNAs are differentially expressed in my patients. I have their positions and their sequences, but they are not annotated in any database, so they are not part of the gene reference file.

My question is: is it possible at all to get differential expression based on sequences?

Would I have to manually edit my gene.gtf? Any help would be welcome.

Thank you very much

J.

RNA-Seq sequence • 2.7k views

You can create a bed file with lncRNA coordinates and count how many reads mapped to each lncRNA using bedtools multicov and perform DESeq/edgeR analysis.
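As a rough illustration of the BED step, here is a minimal Python sketch. The lncRNA names, coordinates and file names are hypothetical placeholders, and it assumes your coordinates are 1-based and inclusive (as in a GTF), so the start is shifted by one because BED is 0-based with an exclusive end.

```python
# Hypothetical lncRNA coordinates, 1-based inclusive (as in a GTF or genome browser).
lncrnas = [
    ("chr1", 100, 200, "lnc_A"),
    ("chr1", 500, 600, "lnc_B"),
]

def to_bed_line(chrom, start, end, name):
    """Convert 1-based inclusive coordinates to a BED line (0-based start, exclusive end)."""
    return f"{chrom}\t{start - 1}\t{end}\t{name}"

bed_lines = [to_bed_line(*rec) for rec in lncrnas]

# Write the regions to a BED file (file name is a placeholder).
with open("lncRNA.bed", "w") as handle:
    handle.write("\n".join(bed_lines) + "\n")

# Then, on the command line (BAM files must be coordinate-sorted and indexed;
# the BAM names below are placeholders):
#   bedtools multicov -bams patient1.bam control1.bam -bed lncRNA.bed > lncRNA_counts.txt
# The output repeats each BED line and appends one count column per BAM file.
```

The count columns from all samples can then be assembled into the count matrix that DESeq/edgeR expect.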


Ok thank you, I'll be trying that asap.

[EDIT] Works like a charm. Thank you very much

9.7 years ago

You can create a BED file with the lncRNA coordinates, count how many reads map to each lncRNA using bedtools multicov, and then run a DESeq/edgeR analysis.

A small note: keep in mind that bedtools does not handle paired-end data specially, i.e. it counts reads per region rather than fragments per region.
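To make the reads-versus-fragments point concrete, here is a toy Python sketch. The alignments and the region are made up for illustration and it does not use bedtools itself; it only shows how the two counting conventions differ.

```python
# Toy alignments: (fragment_id, mate, start, end), 1-based inclusive, same chromosome.
alignments = [
    ("frag1", 1, 90, 140),
    ("frag1", 2, 150, 200),   # both mates overlap the region
    ("frag2", 1, 180, 230),
    ("frag2", 2, 260, 310),   # only mate 1 overlaps the region
    ("frag3", 1, 300, 350),
    ("frag3", 2, 400, 450),   # neither mate overlaps the region
]
region = (100, 200)  # hypothetical lncRNA interval

def overlaps(start, end, region):
    """True if [start, end] shares at least one base with the region."""
    return start <= region[1] and end >= region[0]

hits = [a for a in alignments if overlaps(a[2], a[3], region)]

read_count = len(hits)                      # per-read counting: each mate counts once
fragment_count = len({a[0] for a in hits})  # per-fragment counting: each pair counts once

print(read_count, fragment_count)  # 3 2
```

Here frag1 is counted twice per read but only once per fragment, which is the inflation to be aware of with paired-end data.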


Thanks for the detail! I read something about that too. I was too impatient, so I tried it with the TopHat output (a single .bam file). Our data is indeed paired-end, though. I understand the results could differ from reality, but I don't know to what extent. Would using only the forward strand be better?


There would not be much difference; it's just something to keep in mind. Alternatively, you can create a dummy GTF file with your coordinates and use htseq-count. Something like:

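chr1    source    lncRNA    100    200    .    +    .    lncRNA_id "chr1:100-200";
chr1    source    lncRNA    500    600    .    +    .    lncRNA_id "chr1:500-600";

(A GTF line needs nine tab-separated columns. The score, strand and frame columns above are placeholders, so put in the real strand if you know it.)

Now try htseq-count with -t lncRNA -i lncRNA_id. The -t option is needed because htseq-count only counts "exon" features by default.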

As long as you know exactly what htseq-count is doing, a dummy GFF/GTF like this lets you get fragment-level counts.
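If you have many regions, the dummy GTF can be generated rather than typed by hand. Below is a minimal Python sketch under some assumptions: the coordinates are the made-up ones from the example, the "custom" source label and the file names are placeholders, and the strand defaults to "+" only because the real strand is not known here.

```python
# Hypothetical regions: (chromosome, start, end), 1-based inclusive as GTF expects.
regions = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
]

def to_gtf_line(chrom, start, end, strand="+"):
    """Build one 9-column GTF line with feature type 'lncRNA'.

    Score and frame are '.', strand is a placeholder unless supplied.
    """
    attributes = f'lncRNA_id "{chrom}:{start}-{end}";'
    fields = [chrom, "custom", "lncRNA", str(start), str(end),
              ".", strand, ".", attributes]
    return "\t".join(fields)

gtf_lines = [to_gtf_line(*r) for r in regions]

# Write the annotation (file name is a placeholder).
with open("lncRNA.gtf", "w") as handle:
    handle.write("\n".join(gtf_lines) + "\n")

# Then, on the command line (sample.bam is a placeholder; -s no assumes an
# unstranded library or unknown feature strand, -r pos assumes a
# coordinate-sorted BAM):
#   htseq-count -f bam -r pos -s no -t lncRNA -i lncRNA_id sample.bam lncRNA.gtf > counts.txt
```

Running this once per sample gives one counts file each, which can be merged into a count matrix for DESeq/edgeR.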


That is good info, thank you for all your help! :)
