Hi,
I want to calculate the mean intron length of each gene in mouse (mm10) or human (hg19 or hg38).
The rationale is to categorize genes into 4 or 5 bins based on their intron length.
A previously published paper (https://www.ncbi.nlm.nih.gov/pubmed/21358643) does this in Table 1: the authors classified genes into 4 groups by intron length (0-1kb, 1-10kb, 10-100kb, >100kb) and counted the number of genes in each class. Could anyone suggest how to do this?
Thank you for reading my message.
If there is anything I should have mentioned but didn't, please let me know.
I am pretty much a rookie in bioinformatic analysis, and I am ready to take any suggestion or opinion.
Thank you both for such good solutions.
Give me some time and I will try both of them.
I am not familiar with Java or Perl, but I will try my best.
Thank you all again, I did not expect such a fast response.
This will produce a file called <prefix>_all_introns.bed.gz, which contains all introns and the corresponding gene names in BED format. You could then simply use groupby from bedtools to get whatever metric you need.
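If bedtools is not at hand, the same summary can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not the method used in the paper: it assumes the intron BED file is tab-separated with the gene name in the 4th column (check this against your own file), it counts an intron shared by several transcripts of a gene only once, and it treats the class boundaries as half-open (for example, exactly 1000 bp falls into 1-10kb). The function names are made up for illustration.

```python
import gzip
from collections import Counter, defaultdict


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def mean_intron_length_per_gene(bed_path):
    """Return {gene: mean intron length in bp}.

    Assumption: columns are chrom, start, end, gene name (tab-separated).
    BED coordinates are 0-based and half-open, so length = end - start.
    """
    introns = defaultdict(set)  # gene -> set of unique (chrom, start, end)
    with open_maybe_gzip(bed_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue  # skip blank lines and header lines
            fields = line.rstrip("\n").split("\t")
            chrom, start, end, gene = fields[0], int(fields[1]), int(fields[2]), fields[3]
            introns[gene].add((chrom, start, end))
    return {
        gene: sum(end - start for _, start, end in coords) / len(coords)
        for gene, coords in introns.items()
    }


def length_class(mean_bp):
    """Assign a mean intron length to one of the four classes from the question."""
    if mean_bp < 1_000:
        return "0-1kb"
    if mean_bp < 10_000:
        return "1-10kb"
    if mean_bp < 100_000:
        return "10-100kb"
    return ">100kb"


def count_genes_per_class(means):
    """Count how many genes fall into each intron length class."""
    return Counter(length_class(mean_bp) for mean_bp in means.values())
```

To use it, call mean_intron_length_per_gene() with the path to your intron BED file and pass the result to count_genes_per_class(). Genes without introns never appear in the intron file, so they are not counted in any class.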
Dear Pierre Lindenbaum,
Since I am not a bioinformatician, I apologize in advance if the following questions are annoying.
If I understand your code correctly, you run it under Linux.
The first line of code downloads the GTF file.
The second line runs Jvarkit with Java, and the rest of the code calculates the intron lengths.
But between these two lines, how do I import the GTF into bioalcidaejdk?
And if possible, is there a manual or protocol that I could follow step by step?
Sorry for asking such a basic question, but I could not find anyone nearby to ask.
Sincerely