Dear BioStars,
I have gotten a lot of help from BioStars, but this is my first time posting a question here, so I am a little nervous.
Starting from a SAM file, I want to group small RNA-seq reads that map within a certain distance (e.g. 100 nt) of each other into clusters, and then rank the clusters by their number of reads.
Are there any tools that can do this? Thanks for your help!
I am guessing:
You need to search for something like "generate equally sized genomic bins", "read binning" or "generate equally sized genomic intervals". There are several ways to do this, either in Bedtools or in R.
See:
If I read the question correctly, I think the OP is looking for something like piRNA clusters: regions enriched for certain types of smRNA. That is a slightly different approach, because the regions/clusters would have variable lengths and most of the genome would be free of them. It is more like an aggregation operation, I think.
I think it is very hard to tell what is really wanted here.
You are right. I am looking for small RNA clusters enriched in certain genomic regions, just like piRNA clusters. As Michael said, I need to bin small RNA-seq reads that lie within, for example, 100 nt of each other. However, I want to keep the alignment information of the reads rather than converting to BED format, because I need to look at the reads in each cluster again to see where they come from (e.g. intergenic or coding regions). Thanks
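To make the requirement above concrete, here is a minimal Python sketch of one way to do it without leaving the SAM format. It is only an illustration under several assumptions: "within 100 nt of each other" is taken to mean that a read starts no more than max_gap bases after the current end of the cluster, strand is ignored, and the function name cluster_reads and the max_gap parameter are made up for this example. The original SAM lines are kept in each cluster so they can be annotated later.

```python
import re
from collections import defaultdict

# Matches one CIGAR operation, e.g. "20M" -> ("20", "M")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def ref_span(cigar, seq):
    """Number of reference bases covered by an alignment."""
    if cigar == "*":
        # No CIGAR available: fall back to the read length (an approximation)
        return len(seq)
    # Only M, D, N, = and X consume reference positions
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")

def cluster_reads(sam_lines, max_gap=100):
    """Group mapped reads into clusters and rank clusters by read count.

    Hypothetical helper: a read joins the current cluster when its start
    is at most max_gap bases past the cluster's current end.
    """
    by_chrom = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11 or int(f[1]) & 0x4:  # skip malformed or unmapped reads
            continue
        start = int(f[3])                  # 1-based leftmost position
        end = start + ref_span(f[5], f[9]) - 1
        by_chrom[f[2]].append((start, end, line.rstrip("\n")))

    clusters = []
    for chrom, reads in by_chrom.items():
        reads.sort(key=lambda r: (r[0], r[1]))
        current = None
        for start, end, record in reads:
            if current is not None and start - current["end"] <= max_gap:
                current["end"] = max(current["end"], end)
                current["reads"].append(record)   # keep the full SAM line
            else:
                current = {"chrom": chrom, "start": start,
                           "end": end, "reads": [record]}
                clusters.append(current)

    # Rank clusters by number of reads, largest first
    clusters.sort(key=lambda c: len(c["reads"]), reverse=True)
    return clusters
```

Called as cluster_reads(open("reads.sam")) (the file name is a placeholder), this returns a list of dictionaries, each holding the cluster coordinates and the untouched SAM records, so the reads of any cluster can be written back out as SAM for further annotation.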
With samtools (http://www.htslib.org/doc/samtools.html), you first need to sort the SAM file (and convert it to an indexed BAM), then use samtools view to extract specific regions.
You can try the MACS tool and then extract the results you need from its output file. Below is the command
Hope this solves your problem.