Aligning RNA-seq data against just a small set of transcripts of interest (and counting)
8.5 years ago

Would it be possible to align a whole RNA-seq dataset against just a particular small set of transcripts, rather than a whole transcriptome? For example, just Hox genes, or just Wnt genes.

I am asking because I am working with non-model organisms, with no genome or decent transcriptome available. After several attempts at de novo transcriptome assembly, there was no way to get complete genes for most of those I am interested in, and there were far too many chimeric genes. But after manual curation and tons of PCR, I now have very reliable sequences for my set of transcripts. I would like to get expression-level measures for this set of 43 genes across 8 developmental stages, so although qPCR is possible, I would rather first try to use the RNA-seq data I have.

I thought of doing something similar to this: Create GFF from de novo assembly to input on htseq-counts

Align the RNA-seq datasets against the 43 genes (really low % of alignment expected), count the tags and calculate TPM myself. I just need the TPMs to then standardize (z-score) the data by gene.
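As a rough illustration of the "calculate TPM myself" step, here is a minimal sketch, assuming you already have one raw read count and one transcript length (in bp) per gene. The function name `tpm` and the gene names are hypothetical, not part of any tool mentioned in this thread.

```python
def tpm(counts, lengths):
    """Return TPM per gene from raw read counts and transcript lengths (bp).

    counts, lengths: dicts keyed by gene name (names here are hypothetical).
    """
    # Reads per kilobase: divide each count by the transcript length in kb.
    rpk = {gene: counts[gene] / (lengths[gene] / 1000.0) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        # No reads at all: avoid dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Scale so that the values sum to one million across the genes supplied.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Hypothetical example with two genes of different lengths.
example = tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000})
print(example)
```

Note that the values are scaled to sum to one million over whatever genes are passed in. With only 43 genes in the reference, each TPM is therefore relative to that small set, not to the whole transcriptome, so the absolute numbers will look very large.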

Would that make sense?

Edit: edited title. I want to align and count.

RNA-Seq alignment transcriptomics • 2.0k views
8.5 years ago

I have the impression that you are mixing up two things. Do you want to align only against a small set of transcripts (as your title says) or do you want to perform counting only for a certain set (as indicated by the custom gff)? To what will you align if you don't have a reference genome available?

Sorry if I didn't explain well. I want to align and count.

So, I have a multi-FASTA of 43 genes whose sequences I have manually curated, and now I want some measure of their expression levels at different developmental stages. What I plan to do is align the RNA-seq data against those 43 genes, let's say using bowtie, then count the aligned reads, for example using samtools, and then calculate TPMs.
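One way the counting step could be sketched is by parsing the tab-separated text that `samtools idxstats` prints for an indexed BAM: reference name, sequence length, mapped read-segments, unmapped read-segments. The function name `parse_idxstats` and the sample text below are made up for illustration.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` output into (counts, lengths) dicts.

    Each line is: reference name, length, mapped segments, unmapped segments.
    """
    counts, lengths = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The "*" row reports reads with no reference assigned; skip it.
            continue
        lengths[name] = int(length)
        counts[name] = int(mapped)
    return counts, lengths


# Hypothetical idxstats output for two curated transcripts.
sample = "geneA\t1000\t100\t0\ngeneB\t3000\t300\t0\n*\t0\t0\t99600\n"
counts, lengths = parse_idxstats(sample)
print(counts, lengths)
```

Keep in mind that idxstats counts mapped read-segments, so for paired-end data each mate is counted separately.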

bowtie --> samtools --> TPM --> z-scores
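The last arrow in that pipeline, standardizing each gene across the developmental stages, could look roughly like this. It is only a sketch: the function name `zscore_by_gene` is hypothetical, and it assumes the population standard deviation; use `statistics.stdev` instead if you prefer the sample version.

```python
from statistics import mean, pstdev


def zscore_by_gene(tpm_by_gene):
    """Standardize each gene across stages.

    tpm_by_gene: {gene: [TPM at stage 1, ..., TPM at stage n]}
    """
    scaled = {}
    for gene, values in tpm_by_gene.items():
        mu = mean(values)
        sd = pstdev(values)  # assumption: population SD
        if sd == 0:
            # A gene with constant expression has no defined z-score; use zeros.
            scaled[gene] = [0.0] * len(values)
        else:
            scaled[gene] = [(v - mu) / sd for v in values]
    return scaled


# Hypothetical TPMs for two genes over three stages.
print(zscore_by_gene({"geneA": [1.0, 2.0, 3.0], "geneB": [5.0, 5.0, 5.0]}))
```

Because each gene is centered and scaled on its own, the z-scores describe the shape of each gene's expression profile across stages, not differences in level between genes.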

The post I cite is just similar to my question, but I don't need the GTF. I was citing it only because the answer led to something similar to my problem. But while the whole transcriptome assembly is used there, I wonder if using just a small set would be OK, since all the methods I've seen align against the whole transcriptome. As a matter of fact, I ran RSEM against just this set of 43 genes and obtained insanely high levels of expression (which are not true), so I was wondering whether what I intend to do is flawed somehow.

As for your last question, you can align directly against a transcriptome.

Your proposed method will lead to incorrectly high counts. The reads come from the whole transcriptome, but bowtie only has a few genes to align them against, so it will produce more false positives: reads from other transcripts can end up assigned to your 43 genes. Use salmon or kallisto to get counts against the entire transcriptome, and then subset the results to whatever you need.
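The "quantify everything, then subset" step might be sketched as follows, assuming salmon's `quant.sf` output, a tab-separated table with the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`. The function name `subset_quant` and the transcript names are hypothetical.

```python
import csv
import io


def subset_quant(quant_text, wanted):
    """Return {transcript name: TPM} for the transcripts listed in `wanted`.

    quant_text: contents of a salmon quant.sf file as a string.
    """
    reader = csv.DictReader(io.StringIO(quant_text), delimiter="\t")
    return {
        row["Name"]: float(row["TPM"])
        for row in reader
        if row["Name"] in wanted
    }


# Hypothetical quant.sf contents with three transcripts.
quant = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "geneA\t1000\t850.0\t12.5\t100\n"
    "other1\t2000\t1850.0\t900.0\t15000\n"
    "geneB\t3000\t2850.0\t3.0\t80\n"
)
print(subset_quant(quant, {"geneA", "geneB"}))
```

The TPMs kept this way stay on the whole-transcriptome scale, which is what makes them comparable across samples before any per-gene standardization.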
