7.8 years ago
Fatima
I need to find the number of genes in each transcription unit (TU) based on mapping results, but I don't know how to get this information. I have the start and end coordinates of each transcription unit, as well as its strand. Can I use a SAM or BAM file to extract the number of genes in each TU? If so, how?
It would probably help to add to your post which organism you are working on and with which technology.
What is a transcript unit for you? For me it is the very definition of a gene...
Hi, a transcription unit is the sequence between the sites of initiation and termination by RNA polymerase; it may include more than one gene.
That makes sense, thanks for clarifying.
Do you have annotations in the form of a GTF/GFF file?
Yes, I do. Should I use them? I thought I needed to use the mapping results!
If you have the start and stop of each transcription unit, then you could use something like bedtools intersect (with your GTF file) to identify genes that fall within those boundaries. If I understand this correctly, you would not need the alignment file (unless you want to verify/count only genes that you are actually seeing in your data).
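To make the idea concrete, here is a minimal plain-Python sketch of the same interval logic (a stand-in for bedtools intersect, not a replacement for it). It rests on several assumptions that are not stated in the thread: the GTF contains features of type "gene", the TU coordinates are 1-based and inclusive like GTF coordinates, and a gene only counts if it lies entirely inside the TU on the same strand. The function names and the example records are made up for illustration.

```python
def parse_gtf_genes(gtf_text):
    """Return (chrom, start, end, strand) for every 'gene' feature in GTF text.

    GTF coordinates are 1-based and inclusive; comment lines start with '#'.
    """
    genes = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        genes.append((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return genes


def count_genes_per_tu(tus, genes):
    """Count genes fully contained in each TU on the same chromosome and strand.

    `tus` is a list of (name, chrom, start, end, strand) tuples (assumed layout).
    """
    counts = {}
    for name, chrom, start, end, strand in tus:
        counts[name] = sum(
            1
            for g_chrom, g_start, g_end, g_strand in genes
            if g_chrom == chrom
            and g_strand == strand
            and g_start >= start
            and g_end <= end
        )
    return counts


# Hypothetical annotation records, built with explicit tabs to keep the columns valid.
EXAMPLE_GTF = "\n".join(
    "\t".join(cols)
    for cols in [
        ["chr1", "src", "gene", "100", "500", ".", "+", ".", 'gene_id "geneA";'],
        ["chr1", "src", "CDS", "100", "500", ".", "+", "0", 'gene_id "geneA";'],
        ["chr1", "src", "gene", "600", "900", ".", "+", ".", 'gene_id "geneB";'],
        ["chr1", "src", "gene", "950", "1200", ".", "-", ".", 'gene_id "geneC";'],
        ["chr1", "src", "gene", "1500", "2000", ".", "+", ".", 'gene_id "geneD";'],
    ]
)

# Hypothetical transcription units: (name, chrom, start, end, strand).
EXAMPLE_TUS = [
    ("TU1", "chr1", 50, 1000, "+"),
    ("TU2", "chr1", 900, 1300, "-"),
    ("TU3", "chr1", 1600, 2500, "+"),
]
```

Calling `count_genes_per_tu(EXAMPLE_TUS, parse_gtf_genes(EXAMPLE_GTF))` gives one count per TU name. If genes that only partially overlap a TU should also count, the containment test would need to be relaxed to an overlap test; which rule is right depends on how the TU boundaries were defined.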