I have a large Illumina RNA-Seq dataset, which I have already mapped to the reference genome with STAR and quantified. Now I want to look at the expression of GFP, which is not native to the species (this is a transgenic mouse).
I imagine the 'proper' way to do this is to create a new reference genome with the GFP gene added as an extra chromosome. But this would then require a lot of duplicated work, space, and time.
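For anyone wanting to try the "extra chromosome" route, here is a minimal Python sketch of the two pieces that would be appended to copies of the genome FASTA and annotation GTF. The function names, the `transgene` source label, and the placeholder sequence are all hypothetical; the real GFP sequence from your construct would go in its place, and the single-exon annotation assumes GFP is expressed on its own.

```python
def fasta_record(name, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"


def gtf_exon_line(contig, length, gene_id, strand="+"):
    """Build a single-exon GTF line spanning the whole contig.

    GTF coordinates are 1-based and inclusive, so the exon runs 1..length.
    """
    attributes = f'gene_id "{gene_id}"; transcript_id "{gene_id}_tx";'
    fields = [contig, "transgene", "exon", "1", str(length),
              ".", strand, ".", attributes]
    return "\t".join(fields) + "\n"


# Placeholder sequence for illustration only, NOT the real GFP sequence.
placeholder_seq = "ACGT" * 20
extra_fasta = fasta_record("GFP", placeholder_seq)
extra_gtf = gtf_exon_line("GFP", len(placeholder_seq), "GFP")
```

The two strings would then be appended to copies of the reference files (making sure the originals end with a newline) before rebuilding the index, which is the step that costs the time and space mentioned above.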
What I tried to do is create a new reference index containing only the GFP gene and then align against that, but STAR creates a 1.5 GB index for this single gene, and what if I want to do this with more genes? This seems to be using STAR outside the type of work it was originally designed for. Or is this in fact the correct approach?
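A note on the index size: the STAR manual advises scaling `--genomeSAindexNbases` down for small genomes, to min(14, log2(GenomeLength)/2 - 1), and leaving it at the default of 14 is the likely reason a single gene produces such a large index. A small Python sketch of that calculation follows; the helper names are made up here, and rounding down (plus the floor of 1) is my own conservative assumption, not something the manual specifies.

```python
import math


def fasta_total_length(lines):
    """Sum the sequence characters over all records in a FASTA file."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def sa_index_nbases(genome_length):
    """Scale STAR's --genomeSAindexNbases as min(14, log2(length)/2 - 1).

    Rounding down and clamping to at least 1 are assumptions of this sketch.
    """
    if genome_length < 1:
        raise ValueError("genome_length must be a positive integer")
    scaled = math.floor(math.log2(genome_length) / 2 - 1)
    return max(1, min(14, scaled))


# Hypothetical 720-base single-gene reference held in memory.
example_fasta = [">GFP\n"] + ["ACGT" * 15 + "\n"] * 12
length = fasta_total_length(example_fasta)
print(f"--genomeSAindexNbases {sa_index_nbases(length)}")
```

The resulting value would be passed to STAR when generating the index for the small reference.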
EDIT:
Am I missing anything obvious here, like using BLAST or BLAT (I don't have any experience with these older tools)? Thanks.
Is GFP fused to something or is it being expressed by itself? You might just try bowtie2 or bwa, which should have smaller indexes and be fast enough for your purposes.
BTW, do you have the unmapped reads (this is an option for STAR)?
Expressed by itself. Does that make a difference? And no, I didn't save the unmapped reads from the original mapping.
Only in that if it were fused to something else then you might get somewhat better results by putting the fusion protein in. Otherwise, no, that doesn't matter too much. Too bad you didn't save the unmapped reads, that would have made life simple :)
Wouldn't that affect the alignment rate, so the counts from the native genes wouldn't be comparable to the GFP counts?
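One way to keep the numbers on the same scale, as a rough sketch: normalize both the native gene counts and the GFP count against the same library size for the sample, rather than against the reads aligned in each separate run. The function name and the numbers below are invented for illustration, and the choice of denominator (for example, total input reads) is an assumption you would need to make consistently.

```python
def counts_per_million(count, library_size):
    """Scale a raw read count to counts per million of a shared library size."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return count * 1_000_000 / library_size


# Made-up numbers: one sample, one shared denominator for both alignments.
library_size = 10_000_000   # e.g. total input reads for the sample
native_gene_count = 1_200   # from the original STAR run
gfp_count = 50              # from the separate GFP alignment

native_cpm = counts_per_million(native_gene_count, library_size)
gfp_cpm = counts_per_million(gfp_count, library_size)
print(native_cpm, gfp_cpm)
```

This does not remove the risk of spurious hits when aligning against GFP alone; it only puts the two counts on a common denominator.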
Hi,
I have a similar question. I have a TE FASTA file (which I got from bedtools) that looks like this:
How can I index this 'genome' with STAR?
I would like to map reads to it. The TEs are present in the original complete FASTA file, so maybe identifying them after mapping to the whole genome is a better way?
Cheers,
Mathieu
Please post things like this as new questions.
I would recommend that you do the following:
Doing it that way will produce fewer false positives and a higher overall alignment rate.
Why All The Capitals haha ;-) ?