Question: Counting ERCC spike-ins in RNA-seq data

I used ERCC spike-ins in my RNA-seq experiment. I have aligned my data and now have BAM files. To count reads per gene I used htseq-count (which needs a GTF file). I also need to count the ERCCs (I have 98 spike-ins), and I have a FASTA file of the ERCC sequences. Do you know how I can count them?

RNA-Seq · asked 8.3 years ago by alirezamomeni707 · updated by Charles Plessy

Ideally, you would have appended the ERCC FASTA to the genome before aligning your data. Since you have not done that, you will need to create a new "genome" (and a GTF file to go with it), then realign and count.

GenoMax · 8.3 years ago
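
A minimal shell sketch of building that combined reference, assuming hypothetical file names: genome.fa and genes.gtf for the reference you already used, ERCC.fa for the spike-ins, and an ERCC.gtf that you write yourself. The function only concatenates the files; indexing and realigning still have to be done with whatever aligner you used originally.

```shell
#!/usr/bin/env bash
# Sketch: build a "genome + ERCC" reference and a matching annotation.
# All file names are hypothetical placeholders; substitute your own.
make_combined_reference() {
  local genome_fa=$1 genes_gtf=$2 ercc_fa=$3 ercc_gtf=$4 outdir=$5
  mkdir -p "$outdir"
  # 'awk 1' re-prints every line followed by a newline, so a file that lacks
  # a final newline cannot fuse its last line with the next file's first line.
  awk 1 "$genome_fa" "$ercc_fa" > "$outdir/genome_plus_ERCC.fa"
  awk 1 "$genes_gtf" "$ercc_gtf" > "$outdir/genes_plus_ERCC.gtf"
}
```

For example, `make_combined_reference genome.fa genes.gtf ERCC.fa ERCC.gtf combined` writes combined/genome_plus_ERCC.fa and combined/genes_plus_ERCC.gtf. After indexing and realigning against the combined FASTA, one htseq-count run such as `htseq-count -f bam sample.bam combined/genes_plus_ERCC.gtf > counts.txt` would report genes and ERCCs together (add `-r pos` for coordinate-sorted paired-end BAMs). Check first that no ERCC sequence name clashes with a chromosome name in the genome.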

Is it not possible to align to the ERCC FASTA and count with its GTF after the fact, using the already aligned BAM (i.e. a BAM aligned to a reference other than the ERCCs)?

cpad0112 · 8.3 years ago

You can filter them out and quantify them with BBMap's Seal, using the aligned BAM directly:

seal.sh in=aligned.bam ref=ERCC.fa out=filtered.bam stats=stats.txt k=31

Brian Bushnell · 8.3 years ago

Since the ERCC sequences should be completely different from the genome sequence, it should not matter which approach you use.

GenoMax · 8.3 years ago

Thanks. I have actually aligned to the ERCCs (I built an index from the FASTA file), but I do not have a GTF file for them. That is the main problem.

alirezamomeni707 · 8.3 years ago

You make one up yourself.

GenoMax · 8.3 years ago
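
One way to make it yourself is to treat each ERCC sequence as a single-exon gene spanning its full length. This sketch assumes that the first word of each FASTA header (e.g. ERCC-00002) is the sequence name you indexed, so it matches the reference names in your BAM; the source label "ERCC" and the "+" strand are arbitrary choices made here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write one GTF "exon" line per FASTA record, covering 1..length.
# gene_id and transcript_id are both set to the sequence name (first header word).
fasta_to_gtf() {
  awk 'BEGIN { OFS = "\t" }
    function emit() {
      if (id == "") return
      attrs = "gene_id \"" id "\"; transcript_id \"" id "\";"
      print id, "ERCC", "exon", 1, len, ".", "+", ".", attrs
    }
    # header line: flush the previous record, start a new one
    /^>/ { emit(); id = substr($1, 2); len = 0; next }
    # sequence line: drop stray whitespace/CR, add its length
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { emit() }
  ' "$1"
}
```

Running `fasta_to_gtf ERCC.fa > ERCC.gtf` gives one line per spike-in. Because the features are exon lines carrying gene_id, htseq-count's defaults (`-t exon`, `-i gene_id`) pick them up. Set `-s` to match your library's strandedness: htseq-count assumes `-s yes` by default, so reads on the opposite strand to the "+" features would not be counted.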

Answer · 2017-08-05

You can filter out and count the spike-in sequences (as well as rRNA and linker sequences) before alignment with TagDust 2. In the following document on GitHub, I used it to detect ArrayControl spikes, but it also works with ERCC ones. For maximal accuracy, make sure to use the translated sequences available from NIST, not the plasmid insert sequences (see "Patch ERCC spike sequences to get their real 5-prime ends." for the long story). In any case, if you use the TagDust approach, make sure your sequences do not share common segments such as polyA tails.