Dear friends, hi. (I'm not a native English speaker, so please be ready for some possible language flaws.)
As you know, there are 481 ultra-conserved elements that are very similar among human, rat, mouse and some fishes (http://www.ncbi.nlm.nih.gov/pubmed/15131266 ).
I have RNA-seq data from a non-model vertebrate (so there is no reference genome for it) and a de novo transcriptome assembly of it, and I want to check:
1- first, whether these ultra-conserved elements (200 bp or longer) exist in this species, too.
2- and secondly, what the percentage similarity is between the related sequence in my species and the sequence of each element in human.
Can the RNA-seq data (and the de novo assembly) be used for these purposes? How?
Thank you in advance
You would not be able to get the ones that are located in introns (and at a cursory glance the paper seems to include some). You can align your data to the genomes of those species, but be ready for a lot of legwork.
It depends on what you want to achieve (you will find something, since the original paper does say that they found conservation to some extent with fish) and whether you have the time to invest in this.
Hi, dear genomax2,
I want to check whether they exist in my species and how conserved they are.
I am not being rude, but a valid question is: what will you do next?
We know from the paper that there are going to be some, but their significance will be harder to tackle in your species, since all you have to go on is the transcriptome.
When you extract the sequences from that paper remember that the coordinates are going to be from 2004. The ECR browser may be a better option if you can get at the data directly without having to deal with that browser UI.
If you click on the ECR link at the top left, a new window opens where you can get the sequences of the ECRs. Base Genome allows you to select the genome you are looking at.
Dear genomax2,
Maybe I did not get the point correctly, but first I want to check whether these elements exist in my species, as it is evolutionarily much older than the other fishes and vertebrates that were used in this paper (and other similar papers).
Then, if I find some exact matches, since I am using transcript data, they are ultra-conserved genes (I will miss the introns, and maybe they are not so important for me).
If I find some that are similar but with some SNPs or mismatches, then maybe I can investigate the cause of such mismatches.
Is that what you were asking with "a valid question is then what will you do?" Please correct me if my assumption is wrong.
All that is great. Sounds like you have funding to do some basic research without having to justify the end first :-)
This page lists the conserved elements from the paper you originally linked above. They refer to the hg16 genome build, but you should be able to get the sequences and lift them over to the current assembly, if needed.
Dear genomax2,
thank you for your valuable time, and especially for your ultra-helpful link!
Dear genomax2, hi
I took the sequences from the page you provided and ran blastn with them against my transcriptome assembly.
There are about 81 hits (out of 481 ultra-conserved elements), and interestingly about 19 of them are "non-exonic" elements!
Do you have any explanation for this?
Thanks
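For anyone repeating this, here is a rough sketch of how the blastn result could be summarised. It assumes the output was saved in tabular format (`-outfmt 6`) with the UCEs as queries and the Trinity contigs as subjects. The function name, IDs and numbers in the example rows are made up for illustration; only the column order is the standard one.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {UCE id: best-scoring hit} from BLAST tabular rows."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        # Keep only the highest bitscore per UCE.
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up rows standing in for a real BLAST output file.
example = (
    "uc.1\tTRINITY_DN1_c0_g1_i1\t98.50\t200\t3\t0\t1\t200\t50\t249\t1e-95\t350\n"
    "uc.1\tTRINITY_DN2_c0_g1_i1\t90.00\t120\t12\t0\t1\t120\t10\t129\t1e-40\t150\n"
    "uc.7\tTRINITY_DN3_c0_g1_i1\t100.00\t210\t0\t0\t1\t210\t5\t214\t1e-105\t388\n"
)
hits = best_hits(io.StringIO(example))
for uce, hit in sorted(hits.items()):
    print(f"{uce}\t{hit['sseqid']}\t{hit['pident']:.1f}% over {hit['length']} bp")
```

The number of keys in `hits` is the number of UCEs with at least one hit, and `pident` on the best hit is the percent identity. Note that it only describes the aligned part, so compare `length` with the full UCE length before calling an element fully conserved.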
How was your RNA-seq performed? Ribodepletion or poly-A selection? If the former, it might be that some nascent RNA (unprocessed/unspliced) is present, containing introns.
Alternatively, it's also possible that elements which were annotated as "non-exonic" are actually transcribed but not properly annotated as such. You should have a look at what's there, perhaps long non-coding RNAs.
Dear WouterDeCoster, hi (nice picture you have for your profile!)
It was sequenced on an Illumina HiSeq 2000 about 3 years ago, and I think it was poly-A selection.
Can you suggest any websites where I can check my selected transcriptome sequences in this regard?
This is one of my sequences that shows a hit with the non-exonic UCEs.
I didn't really need feedback on my profile pic, but thanks, I guess.
I checked the fragment you provided in the mouse and human genome and there is no trace of anything coding. But there are indeed blocks of conserved sequences. Perhaps you are onto something new ;)
Was your RNA treated with DNase or is genomic contamination possible? It's important to know the background of the library prep if you are working on the data, years later.
" Perhaps you are onto something new ;) " was very valuable for me !
Yes, we used DNase treatment, and a TruSeq kit was used for cDNA library construction.
And this is the e-value of the BLAST hit: 1.14e-136
What was non-exonic in the hg16 build could have changed since (did you check in the current assembly whether those sequences are still non-exonic)?
Hi, no!
I only have the sequence of the UCE from the link you provided (the original sequence) and the sequence from my fish transcriptome that shows blastn hits with the UCE sequences.
And I do not know how to check whether the old UCE sequence has become exonic in the new human genome.
This is the original UCE sequence for the non-exonic example I provided previously:
If you check it, please kindly teach me how, too. Thanks
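One way to sketch such a check, assuming the UCE coordinates have already been lifted to the current assembly and the exon coordinates were downloaded as BED-style intervals (0-based start, exclusive end). The function name and all coordinates below are hypothetical and only illustrate the overlap test.

```python
def overlaps_exon(element, exons):
    """Return True if the element overlaps any exon on the same chromosome.

    element and each exon are (chrom, start, end) tuples using
    BED-style coordinates: 0-based start, exclusive end.
    """
    chrom, start, end = element
    return any(
        exon_chrom == chrom and exon_start < end and start < exon_end
        for exon_chrom, exon_start, exon_end in exons
    )


# Hypothetical exon annotation and UCE positions, for illustration only.
exons = [("chr1", 1000, 1200), ("chr1", 5000, 5300), ("chr2", 1000, 1200)]
uce_in_intron = ("chr1", 2000, 2250)
uce_in_exon = ("chr1", 5100, 5350)

print(overlaps_exon(uce_in_intron, exons))
print(overlaps_exon(uce_in_exon, exons))
```

An element for which this returns False in the current annotation is still non-exonic; it does not say whether it is intronic or intergenic, which needs gene boundaries as well.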
This UCE still appears to be non-exonic (intergenic) and highly conserved in many species (including zebrafish). I am not sure if you can see this link. It may last for a few days.
The reason I brought that up was that the sequence you posted above had a Trinity ID, and I thought that you had pulled out a sequence from your transcriptome using a non-exonic human sequence.
Your data could have some trace contamination of DNA. If you ever aligned your own data to zebrafish you may be able to see the reads that hit this UCE.
Hi genomax2,
I have posted both my "Trinity transcriptome sequence" and the "UCE original sequence" of the non-exonic element for you.
I have not aligned my data to the zebrafish genome yet, but I think it is possible to align just this "TRINITY_DN76988" sequence to the zebrafish genome and see what it is.
Am I right?
Here is the alignment of the TRINITY piece to Zebrafish genome. It is in intergenic/non-exonic region. Zoom out to get a broader view.
This link has both the UCE and trinity piece. The hits overlap.
Edit: The links have expired.
Given that it's intronic, it's not impossible that it's an alternative exon. Only lab work can tell us what is really going on.
I really appreciate all the time and effort you have spent on me :)
So, there is a sequence that is non-exonic (it is intronic) in human and zebrafish, BUT it is present in my RNA-seq assembly (so it is an expressed mRNA = transcript).
What hypothesis can we offer in this regard (without lab work, of course)?
(My species is evolutionarily much older than zebrafish.)
Could be genomic contamination, could be a gene that was lost in evolution, could be an alternative exon that is very rarely present or present only in a specific tissue type.
What do you mean by genomic contamination?
1- contamination with DNA of the fish individuals themselves?
2- or contamination with genomic DNA from the humans who prepared the samples and libraries?
There may still be a bit of DNA contamination (hopefully from your fish and not humans) left in your RNA prep that went into the library.
This is where you can go back to your alignments of original reads to the transcriptome you built and check how many reads support/align to this TRINITY transcript. If there are a lot then ...
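A minimal sketch of that read-support check, assuming the original reads were aligned back to the Trinity assembly and the per-contig counts were taken from `samtools idxstats` (tab-separated: contig name, contig length, mapped reads, unmapped reads). The function name, the full contig IDs and the counts in the example are made up.

```python
def mapped_read_counts(idxstats_text):
    """Parse `samtools idxstats` text into {contig name: mapped read count}."""
    counts = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # Skip malformed lines and the final "*" line (reads with no contig).
        if len(fields) != 4 or fields[0] == "*":
            continue
        name, _length, mapped, _unmapped = fields
        counts[name] = int(mapped)
    return counts


# Made-up idxstats output, for illustration only.
example = (
    "TRINITY_DN76988_c0_g1_i1\t412\t6\t0\n"
    "TRINITY_DN100_c0_g1_i1\t1850\t2311\t4\n"
    "*\t0\t0\t15230\n"
)
counts = mapped_read_counts(example)
for contig, n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{contig}\t{n} mapped reads")
```

A contig supported by only a handful of reads is more consistent with trace DNA contamination or a very rare transcript than one supported by many reads, but raw counts should be read against contig length and sequencing depth.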
They are also present in Fugu and Minke Whale (to some extent). Unless the UCE has a known function this is an observation (without any specific hypothesis).
Yes! I guess these ultra-conserved elements are conserved in all vertebrates (and maybe invertebrates), so they are also present in Fugu and Minke Whale.
And is there nothing important in the fact that this non-exonic element is present in the transcriptomic data?