8.3 years ago
Farbod
Dear friends, Hi
I want to map my de novo transcriptome assembly to a reference genome using BLAT or GMAP, and then look at the distribution of intron lengths that can be inferred from those alignments.
The background is that Trinity needs a --genome_guided_max_intron parameter for its genome-guided mode, and its manual suggests: "use a maximum intron length that makes most sense given your targeted organism".
So I need your help with the script(s) for mapping a de novo assembly to a genome: must I index the genome? Must I install BLAT locally, the same way as a local NCBI BLAST?
Thank you in advance
Hi. As you said, use GMAP to map the transcriptome to your reference genome. You will obtain a GFF3 file that has the exon locations of each transcript within the scaffold/contig. Use that information to compute the intron lengths.
The other option is to use a tool called "GAG": you provide the genome (FASTA file) and its GFF3 file (which you can obtain from GMAP), and it reports summary statistics of the genome features, including minimum, maximum, and mean intron length.
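For the first option, here is a minimal sketch of how intron lengths could be derived from the exon coordinates in a GFF3 file. It assumes that exon rows have "exon" in the third column and carry a Parent attribute naming their transcript (check this against your own GMAP output); the function name and the example records are made up for illustration.

```python
from collections import defaultdict


def intron_lengths_from_gff3(lines):
    """Return intron lengths inferred from exon features in GFF3 lines.

    Assumption: each exon row has a Parent=<transcript id> attribute.
    """
    exons = defaultdict(list)  # (seqid, parent id) -> [(start, end), ...]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        exons[(cols[0], parent)].append((int(cols[3]), int(cols[4])))

    lengths = []
    for spans in exons.values():
        spans.sort()  # order exons by genomic start, regardless of strand
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # GFF3 coordinates are 1-based and inclusive, so the intron
            # covers prev_end + 1 .. next_start - 1.
            gap = next_start - prev_end - 1
            if gap > 0:
                lengths.append(gap)
    return lengths


# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "chr1\tGMAP\texon\t100\t200\t.\t+\t.\tID=t1.exon1;Parent=t1.mrna1",
    "chr1\tGMAP\texon\t301\t400\t.\t+\t.\tID=t1.exon2;Parent=t1.mrna1",
    "chr1\tGMAP\texon\t1401\t1500\t.\t+\t.\tID=t1.exon3;Parent=t1.mrna1",
]
print(intron_lengths_from_gff3(example))  # [100, 1000]
```

On a real file you would pass an open file handle instead of the list, e.g. `intron_lengths_from_gff3(open("transcripts.gff3"))`, where the file name is a placeholder.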
Dear Tom, Hi. Very nice answer, thank you.
I don't think the number has to be exact. You can use a number that falls in the right ballpark (one from zebrafish may be fine in this case).
Dear genomax2, Hi
I used "10000", as written on the Trinity website, and the result was only about 500 transcripts, but in the de novo assembly I have more than 500,000 transcripts!
So I think that either this number is very critical, or zebrafish and my species are very distinct from each other.
Do you have any idea what this number (intron length) is for zebrafish?
According to this paper that number may need to be ~1000 for zebrafish.
You have predictions for half a million transcripts. There is no independent evidence that they are real, as yet.
Thank you for the paper you have provided, and the time you have spent.
I really appreciate that.
Dear Genomax2,
In Table 1 of your paper, the "maximum intron size" is about 378,145 for zebrafish, but you have suggested ~1000. Have I misunderstood something?
Mean intron length is ~3000 and median is ~1000 (378K is an outlier). You could try running with a couple of different values (1000 and 3000).
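To illustrate why a single very long intron should not drive the choice, here is a rough sketch that summarizes a list of intron lengths with the median, mean, upper percentiles and maximum. The function name is hypothetical and the toy values are invented; only the 378145 echoes the maximum quoted above. A high percentile of your own distribution may be a more useful starting point than the maximum, though that is a judgment call rather than a rule.

```python
import statistics


def summarize_intron_lengths(lengths):
    """Return robust summary statistics for a list of intron lengths."""
    ordered = sorted(lengths)
    # 99 cut points; index 94 is the 95th percentile, index 98 the 99th.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": ordered[-1],
    }


# Invented toy distribution: mostly short introns plus one extreme outlier.
toy_lengths = [500] * 50 + [1000] * 40 + [5000] * 9 + [378145]
summary = summarize_intron_lengths(toy_lengths)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this toy case the one outlier pulls the mean far above the median, while the percentiles stay close to where most introns actually lie.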
My Dear Friend, Genomax, Hi.
I have run the Trinity genome-guided approach with different "maximum intron size" values, and the number of transcripts in the resulting FASTA file was as follows:
Maximum intron size      No. of transcripts
378145                   568
3000                     567
1000                     566
10000                    567
De novo assembly         ~600,000
Do you have any idea how to explain these results?
My fish is a sturgeon, and I mapped its reads to the zebrafish genome (using STAR) because there is no genome available from a species close to mine.