6.5 years ago
plebaninora
Hello,
I'm new to bioinformatics, and we have to do a project for the exam. It is about computing statistics from GTF files, such as:
- Total number of genes.
- Avg number of alternative transcripts per gene
- Avg number of introns/exons per gene
- Avg length of CDS, 5' and 3' UTR
Could you send me some links, or really anything I can use to learn about these things? I would really appreciate it.
Thank you in advance,
Best regards Nora
What have you tried?
I can hardly imagine you've been given this assignment without any background, no?
Why not "Calculate genomic statistics from gtf" as the title? In fact, if you search for "genomic statistics from gtf" or "genomic statistics from gtf site:biostars.org", you will find plenty of answers.
oooh, i didnt expect answers this fast. thx a lot. yes we did a course, abt fasta, x2, hash variables, scalars, rand, etc. but nth abt gtf. so i searched, but always the same things are written, and i have no clue what i should do and where i should start. so i downloaded the annotation file that i should do the project on from gencode, and its not sorted. i tried to sort it, but everything i find is for gff or gff3, and idk if it is acceptable to do the project in gff3 and then convert it to gtf. so i need some materials to study. i dont want to ask the prof, coz i dont want it to affect my grade. yes, i searched abt statistics, til now nth. if u have a book to suggest, pdf, link, anything that i can start from, that would be awesome. thx a lot again for ur response.
Please use the ADD COMMENT button when addressing comments / answers. And please, you are dealing with many non-native English speakers (like myself) who have a hard time understanding abbreviations and slang, so write in a more formal manner.
Let's start with your question, which is about how to work with GTF data.
Instead of trying to work with a full dataset from GENCODE, I'd suggest stepping back and first reading a little about the GTF format, such as at this link:
http://mblab.wustl.edu/GTF22.html
That link includes some very short example snippets, with explanations of the attributes of GTF files, including features.
Features are what make each line of a GTF file important.
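To make the idea of a feature concrete, here is a minimal sketch in Python of splitting one GTF line into its nine tab-separated columns, with the feature type in column 3 and the attributes in column 9. The function name parse_gtf_line and the example line (including the identifiers G1 and T1) are made up for illustration, and the attribute handling is deliberately simple, so treat it as a starting point rather than a complete parser.

```python
def parse_gtf_line(line):
    """Split one GTF line into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Column 9 holds 'key "value";' pairs. This simple split assumes that no
    # value contains a semicolon, and a repeated key keeps only its last value.
    attributes = {}
    for item in attr_text.split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,      # e.g. gene, transcript, exon, CDS
        "start": int(start),     # 1-based, inclusive
        "end": int(end),         # 1-based, inclusive
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# Hypothetical example line; the identifiers G1 and T1 are made up.
example = 'chr1\texample\texon\t100\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"], record["attributes"])
```

Because start and end are both inclusive, the length of a feature is end - start + 1, which is what you would sum when computing CDS or UTR lengths.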
Once you read about features, it can then become a little easier to think about how to do counting exercises, such as counting genes, average transcripts per gene, etc.
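As a rough illustration of such a counting exercise, the sketch below collects the distinct transcript_id values seen for each gene_id and reports the number of genes and the average number of transcripts per gene. It assumes every non-comment line carries a gene_id attribute in column 9, as in GENCODE files; the function name gene_and_transcript_stats and the tiny example records (G1, G2, T1 to T3) are hypothetical.

```python
from collections import defaultdict


def gene_and_transcript_stats(lines):
    """Return (number of genes, average transcripts per gene)."""
    transcripts_per_gene = defaultdict(set)
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record

        # Parse 'key "value";' pairs (assumes no semicolons inside values).
        attrs = {}
        for item in fields[8].split(";"):
            key, _, value = item.strip().partition(" ")
            if key:
                attrs[key] = value.strip().strip('"')

        gene_id = attrs.get("gene_id")
        if not gene_id:
            continue
        transcripts = transcripts_per_gene[gene_id]  # registers the gene
        transcript_id = attrs.get("transcript_id")
        if transcript_id:  # gene-level lines may have none or an empty one
            transcripts.add(transcript_id)

    n_genes = len(transcripts_per_gene)
    n_transcripts = sum(len(t) for t in transcripts_per_gene.values())
    average = n_transcripts / n_genes if n_genes else 0.0
    return n_genes, average


# Hypothetical miniature annotation: two genes, three transcripts.
example_lines = [
    "##description: made-up example",
    'chr1\tex\tgene\t100\t900\t.\t+\t.\tgene_id "G1";',
    'chr1\tex\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tex\texon\t100\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tex\ttranscript\t300\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr2\tex\tgene\t50\t400\t.\t-\t.\tgene_id "G2";',
    'chr2\tex\ttranscript\t50\t400\t.\t-\t.\tgene_id "G2"; transcript_id "T3";',
]
n_genes, avg_transcripts = gene_and_transcript_stats(example_lines)
print(n_genes, avg_transcripts)
```

To run it on a real file, pass an open file handle instead of the list, for example with open("annotation.gtf") as handle (the file name here is only a placeholder). Since the function keeps identifiers in sets, the input does not need to be sorted.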
When you're at that point, read the links found by searching the keywords in h.mon's comment. There you will find links to tools that help with reading in GTF-formatted files and doing those counting exercises on the data within.
You might also check out the GFF3 format, which is an important extension of the GFF2/GTF format. For both formats there are parsers in many languages: BioPerl for example, and of course Python has extensions as well, for example gffutils.
My apology for disrupting the flow, this was supposed to become a comment based on Alex' earlier comment rather than a separate answer.
oooh sorry, i didnt realize that. (im non native too, my bad)
thanks for the link, i have already studied it. i read perlmonks, perldoc, every tutorial link that i found, but nothing so far.
i need something to start from, to work, to get familiar with then can do the project.
when you all started to work with gtf, where did you start from, study from?
my apology again, its my bad habit but fast one. :p ;) thank you so much for your response. you are so available. :)
edit: i couldnt add a comment, it gave me some errors, so i submitted it as an answer and it worked.
You should be able to edit your comment and make changes there. Please see posts under http://biostars.org/t/how-to for step by step guidelines.
Also, h.mon requested that your comments be formal - that means professional as well. I'd recommend avoiding emojis such as :p and ;), as they're not strictly professional, and it's better not to give a playful vibe on online scientific forums.