Dear Biostars friends, hi (I'm not a native English speaker, so be ready for some possible language flaws).
Is there any way to find out whether a gene is duplicated in a non-model animal from its RNA-seq data?
For example, the SRGAP2 gene has 3 copies in human. I want to check how many of them exist in a non-model fish (yes, that means there is no genome available, and zebrafish is only a distant relative) for which I have a de novo transcriptome assembly created with Trinity.
Thank you in advance.
A general comment. Based on your many posts here, if you are truly interested in this genome, why don't you do some additional WGS to take steps towards getting a real genome put together for this fish?
Hi, my dear friend,
because it is expensive!
I never said anything about the cost :-)
But look at it this way. You can slice and dice the RNAseq data you have in hand only so many ways. At some point without additional real WGS it is not going to be possible to get a clear picture of what this genome actually looks like and what transcripts/genes are real.
Yes, you are 100% correct, but the only barrier here for me is still the COST :)
Your PI needs to decide at some point how much time is being spent doing things the long/hard way when having a roughly assembled genome would make things easier. It's an expenditure of resources either way.
This is one of the invoices I have received:
<de novo whole genome sequencing of fish>
Sample #: 2 samples (male and female)
Sample: Fish (1.9G genome size)
Suggested depth: 50X; 95Gb/sample

(1) Approach 1 – random fragmentation library construction
Library Construction: Truseq PCR-free 350bp, $100/sample; $200/2 samples
Hiseq2000 100bp Paired End: $3000/lane, 3 lanes required
Throughput: ~35G/lane; ~100Gb/3 lanes * 2 samples
Sequencing price: $3000 * 6 lanes = $18,000
Subtotal: $18,200(USD)

(2) Approach 2 – Mate pair library construction
Library Construction: 3kb, 5kb, 8kb Mate pair (MP) library, $1000/each MP; $6000/6 MP for 2 samples
Hiseq2000 100bp Paired End: $3000/lane
Targeted throughput: ~50G/each MP
Hiseq2000 100bp Paired End setting: $30,000/(50G*6MP=300G throughput)
Subtotal: $36,000(USD)
The grand total: $54,200(USD)
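For anyone who wants to check the numbers in the quote, the arithmetic can be reproduced in a few lines of Python. This is only a rough sketch: the variable names are made up for illustration, all figures are copied from the quote above, and the mate-pair sequencing is taken as the flat $30,000 that was quoted rather than derived per lane.

```python
# Figures copied from the sequencing quote; variable names are illustrative only.
genome_size_gb = 1.9   # estimated genome size in Gb
depth = 50             # suggested coverage (50X)
samples = 2            # one male, one female

# Data needed per sample = genome size * coverage depth
data_per_sample_gb = genome_size_gb * depth

# Approach 1: PCR-free fragment libraries
library_price = 100        # USD per sample
lane_price = 3000          # USD per HiSeq2000 lane
lanes_per_sample = 3
lane_yield_gb = 35         # approximate yield per lane
yield_per_sample_gb = lane_yield_gb * lanes_per_sample
approach1 = library_price * samples + lane_price * lanes_per_sample * samples

# Approach 2: mate-pair libraries (3kb, 5kb, 8kb for each sample)
mp_libraries = 3 * samples
mp_library_price = 1000    # USD per mate-pair library
mp_sequencing = 30000      # flat price quoted for ~300G of throughput
approach2 = mp_library_price * mp_libraries + mp_sequencing

grand_total = approach1 + approach2

print(f"Data needed per sample: {data_per_sample_gb:.0f} Gb")
print(f"Yield from {lanes_per_sample} lanes: ~{yield_per_sample_gb} Gb")
print(f"Approach 1: ${approach1:,}")
print(f"Approach 2: ${approach2:,}")
print(f"Grand total: ${grand_total:,}")
```

The three lanes per sample give roughly 105 Gb, which covers the 95 Gb needed for 50X, and the two subtotals add up to the quoted grand total.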
Just wait a bit until some labs (including mine) have their PromethION up and running ;-)
Is your organism of any biological/ecological/industrial/... importance? That would help to raise some money to get it done...
Hi, and thank you for your kind invitation.
I long to visit you in your lab.
No need to visit. Just send the sample and back comes the sequence in a few days.
Something else may come along before PromethION becomes real.
Ha Ha Ha,
I was taking Wouter's suggestion as a unique post-doc opportunity!
And by the way, do you have any thoughts on the "TransDecoder, capture all the resulting proteins, non-redundify them, then re-cluster . . . " approach I have provided below?
I think you are missing a link for that tool below.
Genia is also an interesting platform to keep an eye on. I'm not sure how far along their development is, but the first PromethIONs are being shipped.
Ghe, I'm a PhD student, so I'm not really in a position to invite people for a post-doc!
Most likely obvious to you, but I would say it's better to say that the gene has 6 copies because humans are diploid :-)