The original authors wrote that "transcript reads were aligned to the RefSeq transcriptome (downloaded March 2013) using Tophat". Please help me get that 2013 transcriptome and its GFF annotation
2.7 years ago

Greetings all,

I have RNA-Seq data generated with the CEL-Seq protocol. I want to replicate the transcriptome mapping of a previous study. They used the RefSeq transcriptome (downloaded March 2013) for mapping, so I need to do the same.

Please guide me on how to get the same RefSeq transcriptome version and the corresponding annotation.

Thanks in advance

RNA-Seq RefSeq

Take a look at: http://ftp.ensembl.org/pub/ Probably around release-70 or so?

Check the dates of the files inside the release folders to confirm.


Thanks for your reply. But they actually used RefSeq data built on hg19, so I cannot use Ensembl to replicate it.


Archival versions of RefSeq data are not available, so in theory this exercise is not possible.

That said, "RefSeq transcriptome" is a vague term, since you surely don't want to align the data against all of RefSeq. Is this for a specific genome?
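For a single organism, one common reading of "RefSeq transcriptome" is the set of curated transcripts, whose accessions start with NM_ (mRNA) or NR_ (non-coding RNA), as opposed to the model-predicted XM_/XR_ records. The minimal Python sketch below filters a FASTA file that way. It assumes the accession is the first whitespace-separated token of each header; the function names are hypothetical, and the toy records use made-up accessions. It does not recover the March 2013 snapshot; it only shows what "the RefSeq transcriptome" might concretely refer to.

```python
# Curated RefSeq transcript prefixes: NM_ (mRNA) and NR_ (non-coding RNA).
# Model-predicted records (XM_ / XR_) are dropped by this filter.
CURATED_PREFIXES = ("NM_", "NR_")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_curated(lines):
    """Yield (accession, sequence) for curated records only.

    Assumption: the accession is the first whitespace-separated token of the header.
    """
    for header, seq in read_fasta(lines):
        fields = header.split()
        accession = fields[0] if fields else ""
        if accession.startswith(CURATED_PREFIXES):
            yield accession, seq


# Toy records with made-up accessions, not real RefSeq entries
example = """>NM_999991.1 toy curated mRNA
ACGT
ACGT
>XM_999992.1 toy predicted mRNA
GGGG
>NR_999993.1 toy curated ncRNA
TTTT
""".splitlines()

kept = dict(filter_curated(example))
print(sorted(kept))  # ['NM_999991.1', 'NR_999993.1']
```

On a real file, the same generator can be fed an open file handle, e.g. `filter_curated(open(path))` with a path of your choosing.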


It's human data; they only mentioned that the genome build is hg19.

2.7 years ago

there is no need to go back to an early transcriptome; you should replicate the process with the most up-to-date reference data

and with the most up-to-date tools; TopHat is not acceptable software anymore

the most common misconception in bioinformatics is that one should use earlier versions of software if that is how the original authors did it;

that is just a misunderstanding of what replication means. Do your best to replicate the findings with the methods that produce the same insights.


Yeah, I agree with that, Istvan. But I am doing it to confirm that my UMI collapsing analysis is correct. Let me elaborate: I have UMIs in my single-cell RNA-seq data. After deduplication I couldn't get the same number of genes and reads as the original study. I used the latest GRCh38 primary assembly from Ensembl as my reference and aligned with STAR, but I could only cover 58.7% of the previously reported genes, and my UMI read counts were also lower than theirs.

I used UMI-tools for deduplication, but they used another tool, which they didn't mention.
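Different deduplication rules can report quite different molecule counts from identical alignments. The Python sketch below compares naive exact-match UMI counting with a simplified version of the "directional" network rule described for UMI-tools: an edge is drawn from UMI a to UMI b when they differ at one position and count(a) >= 2 * count(b) - 1, and everything reachable from a root counts as one molecule. This is an illustrative re-implementation, not UMI-tools' own code, and the reads are made up.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts):
    """Group UMIs with a simplified 'directional' rule (sketch, not UMI-tools code).

    Edge a -> b when a and b differ at exactly one position and
    count(a) >= 2 * count(b) - 1. Each group reachable from a root is one molecule.
    """
    # Visit UMIs from most to least abundant (ties broken alphabetically)
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    adjacency = {
        a: [b for b in umis
            if b != a and len(a) == len(b) and hamming(a, b) == 1
            and umi_counts[a] >= 2 * umi_counts[b] - 1]
        for a in umis
    }
    seen, groups = set(), []
    for root in umis:
        if root in seen:
            continue
        # Collect every UMI reachable from this root along directed edges
        group, stack = [], [root]
        seen.add(root)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups


# Made-up reads for one cell and one gene: the UMI sequence of each read
reads = ["AAAA"] * 10 + ["AAAT"] * 2 + ["AATT"] + ["CCCC"] * 5
counts = Counter(reads)

print("unique UMIs:", len(counts))                                # unique UMIs: 4
print("directional molecules:", len(directional_groups(counts)))  # directional molecules: 2
```

Here AAAT and AATT are treated as sequencing errors of AAAA, so the same reads give 4 molecules under one rule and 2 under the other. An unnamed deduplication tool in the original study could shift counts in either direction in the same way.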


here is the conundrum:

you get more cooperation from the authors if you leave them with the plausible deniability that it "must have been the transcriptome" that led to the different results

the most likely explanation is that they did some things incorrectly, while using an approach that is considered inappropriate today anyway

usually I would not lose too much sleep; I would check a few clear-cut examples. For example, in a recent paper we redid the alignments, and a gene that was claimed to be differentially expressed and was part of the main story turned out to have no coverage at all in any sample! We double-checked the alignments and processes. Our conclusion is that the entire paper (a Nature publication, FWIW) is suspect; the authors have moved on with their careers or retired.

Now what? I think that paper is not worth spending any effort on, or checking anything else whatsoever, other than perhaps learning how to sell a paper :-/


Your words are a great relief :-)

One more thing: after the transcriptome alignment with TopHat, they aligned the unmapped reads to the genome with Bowtie and GSNAP. Could that be the reason for the larger number of genes and reads?

I also tried to replicate their whole workflow: I first mapped the reads to the genome with Bowtie and then mapped the unmapped reads with STAR, but nothing changed. Only then did I finally decide to try their whole process from the start with TopHat, including the same RefSeq transcriptome.

Thanks in advance


I would suggest reproducing the main findings and outcomes with the newest, more accurate, and simpler methods.

The numbers individually will not match - perhaps even radically so - the main findings may or may not still be the same. If I were a betting man, I'd say perhaps half of the findings will validate.
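Checking at the level of findings, for example what fraction of the originally reported genes the new analysis recovers, is more informative than matching raw counts. A figure like the 58.7% gene coverage mentioned earlier also depends on how identifiers are matched: the original study reports RefSeq-era gene names, while an Ensembl-based run yields Ensembl gene IDs. The Python sketch below computes the recovered fraction with an optional ID translation table. The function name and toy gene lists are hypothetical; in practice the mapping would come from an annotation resource of your choice.

```python
def recovered_fraction(original, replicated, id_map=None):
    """Fraction of originally reported genes that also appear in the replication.

    id_map optionally translates replication IDs (e.g. Ensembl gene IDs)
    into the naming used by the original study (e.g. gene symbols).
    """
    id_map = id_map or {}
    translated = {id_map.get(g, g) for g in replicated}
    original = set(original)
    if not original:
        return 0.0
    return len(original & translated) / len(original)


# Toy lists: the original study used symbols, the replication mostly Ensembl IDs
original_genes = ["TP53", "GAPDH", "ACTB", "MYC"]
replicated_ids = ["ENSG00000141510", "ENSG00000111640", "ACTB"]
# Illustrative translation table (TP53 and GAPDH Ensembl gene IDs)
id_map = {"ENSG00000141510": "TP53", "ENSG00000111640": "GAPDH"}

print(recovered_fraction(original_genes, replicated_ids))          # 0.25
print(recovered_fraction(original_genes, replicated_ids, id_map))  # 0.75
```

Without harmonized identifiers the overlap looks much worse than it is, so it is worth ruling that out before attributing a low percentage to alignment or deduplication.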


Thanks a lot for all your insights, Istvan


"Again, one more thing they did: after the transcriptome alignment with TopHat, they aligned the unmapped reads to the genome with Bowtie and GSNAP. So could that be the reason for the larger number of genes and reads?"

That would likely have identified new isoforms or transcripts that were not in the original transcriptome. How many of those were real and are still around is an open question, though.
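The two-pass logic described above can be sketched in a few lines: the first pass assigns reads to annotated transcripts, and only the leftover reads go to a genome aligner (Bowtie and GSNAP in the original study; here a hypothetical stand-in function). Reads rescued in the second pass can land in loci absent from the transcriptome, which raises both the read total and the number of features. Everything below is a toy illustration with made-up read and feature names.

```python
def merge_two_stage(all_reads, stage1_hits, stage2_align):
    """Combine a transcriptome pass with a genome 'rescue' pass.

    stage1_hits: dict read_id -> feature assigned by the transcriptome alignment.
    stage2_align: callable run only on reads left unmapped by stage 1;
                  returns a feature/locus label or None.
    """
    assignments = dict(stage1_hits)
    for read_id in all_reads:
        if read_id in assignments:
            continue  # already placed by the transcriptome pass
        hit = stage2_align(read_id)
        if hit is not None:
            assignments[read_id] = hit
    return assignments


# Made-up example: five reads, three placed on annotated transcripts in stage 1
reads = ["r1", "r2", "r3", "r4", "r5"]
stage1 = {"r1": "GENE_A", "r2": "GENE_A", "r3": "GENE_B"}
# Hypothetical stand-in for the genome aligner: r4 hits an unannotated locus, r5 stays unmapped
genome_hits = {"r4": "chr1:1000-1100 (unannotated)"}

merged = merge_two_stage(reads, stage1, genome_hits.get)
print(len(stage1), len(set(stage1.values())))  # 3 2
print(len(merged), len(set(merged.values())))  # 4 3
```

Whether the extra features from the rescue pass are real transcripts or alignment artifacts is exactly the open question raised above; the merge itself cannot tell them apart.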

The current human transcriptome is likely in much better shape than it was in 2013, so if you have results from the current dataset, you should stick with those.


Re-reading this, it suddenly seems odd that they would use TopHat to align against a transcriptome. Frankly, that seems like a misuse of the tool.

TopHat is a splice-aware aligner that works on a genome, not a transcriptome; it is not able to redistribute reads to the correct locations. TopHat can take a transcriptome annotation file to assist with detecting splice junctions, but not to drive the alignment process.

To build a new transcriptome, they should have assembled the TopHat-aligned reads.
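Placing reads at "the correct location" comes down to this: a hit on a transcript sequence only becomes a genomic alignment once its transcript coordinates are projected back through that transcript's exons, and a read crossing an exon boundary becomes a split (spliced) alignment. The Python sketch below shows the projection with hypothetical exon coordinates (0-based, half-open). The single-position function handles both strands; the interval function handles the plus strand only. Real tools must also deal with CIGAR strings and reads that match several isoforms or genes, which this ignores.

```python
def transcript_to_genome(tx_pos, exons, strand="+"):
    """Map a 0-based transcript position to a 0-based genomic position.

    exons: list of (start, end) genomic intervals, 0-based half-open.
    On the minus strand, transcript position 0 is the last base of the
    highest-coordinate exon.
    """
    if tx_pos < 0:
        raise ValueError("negative transcript position")
    ordered = sorted(exons)
    if strand == "-":
        ordered = ordered[::-1]
    offset = tx_pos
    for start, end in ordered:
        length = end - start
        if offset < length:
            return start + offset if strand == "+" else end - 1 - offset
        offset -= length
    raise ValueError("position beyond transcript end")


def project_interval(tx_start, tx_end, exons):
    """Project a plus-strand transcript interval [tx_start, tx_end) onto genomic blocks."""
    blocks, offset = [], 0
    for start, end in sorted(exons):
        length = end - start
        lo = max(tx_start, offset)
        hi = min(tx_end, offset + length)
        if lo < hi:  # part of the interval falls inside this exon
            blocks.append((start + lo - offset, start + hi - offset))
        offset += length
    return blocks


# Hypothetical three-exon transcript, 200 bases long in total
exons = [(100, 150), (300, 350), (500, 600)]

print(transcript_to_genome(50, exons))       # 300
print(transcript_to_genome(0, exons, "-"))   # 599
print(project_interval(40, 60, exons))       # [(140, 150), (300, 310)]
```

The last line shows a 20-base read that is contiguous on the transcript turning into two genomic blocks separated by an intron.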


Again, thanks a lot, Istvan, for your valuable insights


Thanks a lot, GenoMax, for your valuable input
