Hello,
I have BAM files for 8 samples (4 normal and 4 diseased), produced by aligning small-RNA sequencing data with Novoalign. I excluded the miRNA reads from the BAM files using the following command:
bedtools intersect -v -a sample.bam -b hg19_mirna.gff3 > output.bam
I excluded the miRNA reads from all the samples in this way. Now I want to find the UTR (3' and 5') sequences present in the resulting BAM files, along with the read count for each UTR sequence.
Could anyone suggest a way to do this?
You can try featureCounts, counting reads at the "UTR" feature level.
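As a rough sketch of what that could look like: the wrapper below passes "UTR" as the feature type (-t) and groups counts by gene_id (-g). The function name count_utr_reads and the file names are made up for illustration, and the feature type string is an assumption; check column 3 of your own GTF, since some annotation releases use different labels for UTR features.

```shell
# Hypothetical helper: count reads overlapping UTR features with featureCounts.
# Usage: count_utr_reads <annotation.gtf> <output.txt> <sample1.bam> [sample2.bam ...]
count_utr_reads() {
    annotation=$1
    output=$2
    shift 2
    # -F GTF : annotation format
    # -t UTR : feature type to count (assumed label; verify against column 3 of the GTF)
    # -g gene_id : attribute used to group features into one count per gene
    featureCounts -F GTF -t UTR -g gene_id -a "$annotation" -o "$output" "$@"
}

# Example call (file names are placeholders):
# count_utr_reads annotation.gtf utr_counts.txt normal1.bam diseased1.bam
```

All BAM files can be passed in one call, which gives a single count table with one column per sample.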
Thank you. What I actually want is a GTF file of the UTR sequences, so that I can count the reads with htseq-count. But I am unable to find GTF files for the 3' and 5' UTRs. Do you know where I could get them?
Ensembl provides GTF files for many organisms. They contain "UTR" annotations (I know this for human and mouse).
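If you want a UTR-only GTF from such a file, something along these lines might work. It is only a sketch: the function name extract_utr_features and the file names are invented for the example, and it assumes the UTR rows can be recognised by "utr" appearing (in any case) in the feature column, which you should confirm on your own file.

```shell
# Hypothetical helper: keep header lines and rows whose feature column
# (column 3 of a tab-separated GTF) contains "utr", case-insensitively.
# Usage: extract_utr_features <annotation.gtf> > utr_only.gtf
extract_utr_features() {
    awk -F '\t' '/^#/ || tolower($3) ~ /utr/' "$1"
}

# Example call (file names are placeholders):
# extract_utr_features annotation.gtf > utr_only.gtf
```

Matching on a lowercase substring is deliberate, so that labels such as UTR, five_prime_utr and three_prime_utr are all kept.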
Hi, I am currently using the hg19 assembly (GRCh37). I am unable to find GTF/GFF3 files of the UTRs for this version. Could you please link me to them? Thanks a lot.
I recommend you upgrade to hg38! Otherwise, search the archives for the older release.