Question

Deleted gene appears in RNA-seq analysis?

2

Entering edit mode

10.9 years ago

Parham ★ 1.6k

Hi,

I have RNAseq data of two strains. In the WT I can see top1 gene present with FPKM of average 90. the problem is in the mutated one where top1 is deleted I still see top1 with an FPKM of average 4. Well that's too less comparing to WT but I want to know if that's possible or my experiment went wrong at some point? Please let me know if you want more information.

RNA-seq Deleted-Gene • 6.0k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Parham ★ 1.6k

0

Entering edit mode

How were the FPKM values calculated. If you're using multimappers then that could be the problem.

ADD REPLY • link 10.9 years ago by Devon Ryan 105k

0

Entering edit mode

I use TopHat, Trapnel et al workflow. I guess TopHat uses multimappers as a fast search I did just now. Is that correct? And is that fine if I get it?

ADD REPLY • link 10.9 years ago by Parham ★ 1.6k

1

Entering edit mode

Yeah, it'd be good to recalculate things without any multimappers. Also, if there's high enough similarity between top1 and any of the other topoisomerases then their FPKM may be spilling over onto this one (again, due to multimappers).

ADD REPLY • link 10.9 years ago by Devon Ryan 105k

0

Entering edit mode

I did a NCBI BLAST. The highest similarity was 13 exact base pairs and then 25 somewhat similar base pairs. Would that be enough for their FPKM to spill over each other?

ADD REPLY • link 10.9 years ago by Parham ★ 1.6k

0

Entering edit mode

Perhaps. From the IGV screenshot you posted, it looks like the entirety of the signal is coming from the 3' most end. What sort of edit distance and MAPQ do the reads that map their have? If they have a number of mismatches then it's likely that you're seeing them there due to how tophat works (i.e., it maps to the transcriptome first, so it can produce somewhat biased alignments in cases like this).

BTW, was the KO made with something like Cre-Lox such that there could be a little residual RNA floating around from prior to the excision event?

ADD REPLY • link 10.9 years ago by Devon Ryan 105k

0

Entering edit mode

How do I look up these MAPQ and edit distance for mapped reads? Some help would be appreciated.

For KO we used a two step PCR method, since we work on S. pombe.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Parham ★ 1.6k

0

Entering edit mode

You should be able to just samtools view accepted_hits.bam II:2941900-2942000 | less, or something like that with more appropriate coordinates.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Devon Ryan 105k

0

Entering edit mode

OK I did samtools view accepted_hits.bam II:2941980-2941734 | less > 1.txt and it produced a 2GB file! Is it supposed to be that big?! It includes data as:

HISEQ:114:C3N2PACXX:5:1310:16022:25263    272    AB325691    101    3    50M    *    0    0    ATGGTTTCAATTCTTGCTATGTTAATAAATTACACATAAAATTCACGGGT    JJJJJJJIJJJJJJJIJJJJJJJJJIJJJJIJHHJJJHHHHHFFDFFCCC    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    NH:i:2    CC:Z:II    CP:i:78690    XS:A:-    HI:i:0
HISEQ:114:C3N2PACXX:5:1315:8531:43636    272    AB325691    103    3    50M    *    0    0    GGTTTCAATTCTTGCTATGTTAATAAATTACACATAAAATTCACGGGTAT    JJJJJJJIJJJJJIJIJJJJJJJJJJJJJHHHJIJJJHHHHHDFFFFCCC    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    NH:i:2    CC:Z:II    CP:i:78692    XS:A:-    HI:i:0
HISEQ:114:C3N2PACXX:5:2102:7306:38903    0    AB325691    113    3    50M    *    0    0    CTTGCTATGTTAATAAATTACACATAAAATTCACGGGTATAATATTAATC    BCCFFFFFBHHHHJJJIIIJIIIIGJJIJJJJIIIGIHHIHGIIJJJIII    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    NH:i:2    CC:Z:II    CP:i:78702    XS:A:+    HI:i:0
HISEQ:114:C3N2PACXX:5:1301:14352:21722    0    AB325691    164    3    50M    *    0    0    ACAAAGAGCGTAAACTTTTGTTACAGTTGTAATCATTAAGGTGATATGCT    CCCFFFFFGHHGGIIIJIJJHJDIHHJIJJIIJJJJJJJIGGHIJJIIII    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    NH:i:2    CC:Z:II    CP:i:78753    XS:A:+    HI:i:0
HISEQ:114:C3N2PACXX:5:1301:14373:21754    256    AB325691    164    3    50M    *    0    0    ACAAAGAGCGTAAACTTTTGTTACAGTTGTAATCATTAAGGTGATATGCT    CCCFFFFFGHHHHJJJIJJJIJJJJJJJJJHIJIIJJJJJIFGHGIIJJJ    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    NH:i:2    CC:Z:II    CP:i:78753    XS:A:+    HI:i:0
HISEQ:114:C3N2PACXX:5:1109:4473:58639    16    AB325691    315    3    50M    *    0    0    GGTATTTAGATATTCTACATTAGTAAGCGAAAGTCAATTATATCTCATCT    JJJJJJJIJJJJJJJJIJJJJJJJJIJJJJJJJJJJJHHHGHFFFFFCCC    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    NH:i:2    CC:Z:II    CP:i:78904    XS:A:-    HI:i:0
HISEQ:114:C3N2PACXX:5:1213:4358:94861    16    AB325691    315    3    50M    *    0    0    GGTATTTAGATATTCTACATTAGTAAGCGAAAGTCAATTATATCTCATCT    JJJJJJIIJJJIIIHGGIJJJJJJIHIIJJJJJJJJJHGHHHFFFFFCCC    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    NH:i:2    CC:Z:II    CP:i:78904    XS:A:-    HI:i:0

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Parham ★ 1.6k

0

Entering edit mode

That seems wrong, though I'm surprised that anything was written to 1.txt. Try instead

samtools view accepted_hits.bam II:2941980-2941734 > 1.txt

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Devon Ryan 105k

0

Entering edit mode

Still a 2.5 GB txt file. How can it be that huge?! Any suggestion?

One more question, how do you run TopHat without multimappers? I couldn't find out by looking at the options. Thanks!

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Parham ★ 1.6k

0

Entering edit mode

That does seem large, it would appear that something went wrong. You should be able to discern what from the file's contents.

You can't exclude multimappers, you just filter them afterward.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Devon Ryan 105k

0

Entering edit mode

Devon I don't know what you mean by "You should be able to discern what from the file's contents." Please explain more.

And would be very kind of you if you could show me how to remove multimappers from my data. Would it be and advantage for the rest of analysis? I mean would it be more accurate then?

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Parham ★ 1.6k

0

Entering edit mode

You're trying to extract alignments that cover a specific region, so if the ones in the file don't cover that region then something went amiss. Given the size of that file, my guess would be that its contents aren't really what you want.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Devon Ryan 105k

Ram · Answer 1 · 2014-07-22

1

Entering edit mode

10.9 years ago

Vivek ★ 2.7k

Was the whole gene deleted or were a few exons knocked out? That might explain the low FPKM. I'd look at the alignments in a viewer like IGV to see what the mapping looks like. Otherwise these could just be low map quality reads aligning there by error.

ADD COMMENT • link 10.9 years ago by Vivek ★ 2.7k

0

Entering edit mode

Doesnt have to be by error, if the region has similar sequence to a non-deleted region. Particularly since Tophat looks for partial matches with indels and splicing, you'll find plenty of reads fit in plenty of places.

ADD REPLY • link 10.9 years ago by karl.stamm 4.1k

0

Entering edit mode

I might be wrong here but I don't think tophat aligns across Indels or partial alignments, that was one of the arguments used in favor of other rna-seq aligners like star, g-snap etc

ADD REPLY • link 10.9 years ago by Vivek ★ 2.7k

0

Entering edit mode

Thanks for suggestion, but first of all I get an error for opening IGV in the console. Do you know what is that?

log4j:ERROR Could not find value for key log4j.appender.R
log4j:ERROR Could not instantiate appender named "R"

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Parham ★ 1.6k

0

Entering edit mode

That looks like an X11 error if you are running it from a HPC cluster. Try installing IGV on your local machine, its a java package, you shouldn't have any issues with it that way.

ADD REPLY • link 10.9 years ago by Vivek ★ 2.7k

0

Entering edit mode

Ok, I did check and there are some mapped sequences in the deleted experiment. I have an screen shot of IGV here. What do you think about it? And by the way YES the whole gene is deleted.

ADD REPLY • link 10.9 years ago by Parham ★ 1.6k

1

Entering edit mode

If the top BAM is the knockout of top1 gene and the bottom is the wildtype, it looks pretty good to me. The reads mapping to the last exon in the knockout sample do not differ much from the reads mapping into the intergenic region.

ADD REPLY • link 10.9 years ago by Vivek ★ 2.7k

0

Entering edit mode

Yes the bottom is wildtype! OK then thanks for checking it.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Parham ★ 1.6k