SRA: realistic data from reference transcript
2.2 years ago
shinyjj ▴ 50

Hi all,

Is there a way to download realistic RNA-seq data generated from the reference transcripts (https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml#:~:text=gff3-,RefSeq%20Transcripts,-Fasta) from SRA? If so, is there an accession number?

reference SRA transcript • 973 views
2.2 years ago
GenoMax 148k

a realistic rna-seq data generated from the reference transcript

RNA-seq data is not generated from reference transcripts. Reference transcripts have been built up over time from sequence data accumulated over the last two decades.

Any dataset in SRA that is not simulated represents real data. The quality of experiments may vary, but that is par for the course. What do you intend to use the data for?

If you want a lot of samples, the dataset used by the SEQC consortium is here: https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=study&acc=SRP025982


Hello GenoMax, thank you for replying. What I want to do is generate realistic RNA-seq data from the reference transcripts.

I tried using polyester to generate realistic RNA-seq data from the reference transcripts. Ideally the simulated reads would follow an Illumina error profile and include hexamer bias, positional bias, and sequence variants (SNPs and indels). If there is a better tool than polyester for generating RNA-seq reads from reference transcripts, that would be great too. I am just not sure how many of these parameters I need to include to make the simulated data realistic enough. There is another tool, ART, but it is a fairly old package, released almost a decade ago.

I originally thought it would be great if realistic RNA-seq data generated from the reference transcripts already existed.
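
To make concrete what I mean by generating reads from reference transcripts with known counts, here is a minimal toy sketch in Python. It is not polyester or ART: the function name `simulate_reads` and its parameters are made up for illustration, it only draws reads uniformly along each transcript and adds independent substitution errors, and it models no hexamer bias, positional bias, or indels.

```python
import random


def simulate_reads(transcripts, true_counts, read_len=50, error_rate=0.01, seed=0):
    """Toy read simulator (hypothetical helper, not a real tool's API).

    transcripts: dict mapping transcript name -> sequence (A/C/G/T)
    true_counts: dict mapping transcript name -> number of reads to draw
    Returns a list of (read_name, read_sequence) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    reads = []
    for name, seq in transcripts.items():
        n_reads = true_counts.get(name, 0)
        if len(seq) < read_len:
            # Transcript too short to yield a full-length read: skip it.
            continue
        for i in range(n_reads):
            # Uniform start position (no positional bias is modelled).
            start = rng.randint(0, len(seq) - read_len)
            read = list(seq[start:start + read_len])
            # Independent substitution errors at a flat per-base rate.
            for j, base in enumerate(read):
                if rng.random() < error_rate:
                    read[j] = rng.choice([b for b in "ACGT" if b != base])
            reads.append((f"{name}_read{i}", "".join(read)))
    return reads


if __name__ == "__main__":
    # Made-up sequences and counts, for illustration only.
    toy_transcripts = {"tx1": "ACGT" * 50, "tx2": "GGCA" * 40}
    toy_counts = {"tx1": 5, "tx2": 3}
    for read_name, read_seq in simulate_reads(toy_transcripts, toy_counts):
        print(f">{read_name}\n{read_seq}")
```

Because the counts are chosen up front, the true number of reads per transcript is known exactly, which is the property I am after.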


I see. You are looking to find simulated data. I don't think there is going to be much (if any) in SRA. You can try searching with https://sra-explorer.info/

Not sure what your need is, but at this point there is so much real data out there that people don't need to simulate. That is why there are no new tools.


I would like to assess accuracy in a situation where the true expression levels are known, so I have been using polyester to generate synthetic datasets from the reference transcripts. Are you familiar with using polyester to generate realistic RNA-seq data? I am asking just in case.
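
For context, the comparison I have in mind looks roughly like the sketch below. It assumes the true counts from the simulation and the estimated counts from whatever quantifier is being tested are available as two lists in the same transcript order. The function names and the choice of metrics (Spearman correlation and mean absolute relative difference) are my own illustration, not something polyester provides.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(true_counts, est_counts):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(true_counts), average_ranks(est_counts))


def mean_abs_relative_diff(true_counts, est_counts):
    """Mean of |t - e| / (t + e), counting a transcript as 0 when both are zero."""
    total = 0.0
    for t, e in zip(true_counts, est_counts):
        total += 0.0 if t + e == 0 else abs(t - e) / (t + e)
    return total / len(true_counts)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    truth = [100, 50, 10, 0]
    estimate = [90, 60, 12, 0]
    print("Spearman:", spearman(truth, estimate))
    print("MARD:", mean_abs_relative_diff(truth, estimate))
```

A higher Spearman correlation and a lower mean absolute relative difference would both indicate estimates closer to the known truth.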
