Question

small RNA-seq reads grouping by adjacency

0

8.3 years ago

apbiomol • 0

Dear BioStars,

I have gotten a lot of help from BioStars, and this is my first time posting a question here, so I am a little nervous.

Starting from a SAM file, I want to group small RNA-seq reads that map within a certain interval (e.g. 100 nt) of each other into clusters, and then rank the clusters by number of reads.

Are there any tools that can do this? Thanks for your help!

RNA-Seq alignment • 2.2k views

ADD COMMENT • link updated 8.3 years ago by A. Domingues ★ 2.7k • written 8.3 years ago by apbiomol • 0

1

I am guessing:

you want to bin your reads into bins of constant width over the whole genome?
summarize each bin by number of reads

You need to search for something like "generate equally sized genomic bins", "read binning", or "generate equally sized genomic intervals". There are several ways to do this, in either Bedtools or R.

See:

ADD REPLY • link 8.3 years ago by Michael 56k

1

If I read the question correctly, I think the OP is looking for something like piRNA clusters: regions enriched for certain types of smRNA. That is a slightly different approach, because the regions/clusters would have variable lengths and most of the genome would be free of them. It is more like an aggregation operation, I think.

ADD REPLY • link 8.3 years ago by A. Domingues ★ 2.7k

1

I think it is very hard to tell what is really wanted here.

ADD REPLY • link 8.3 years ago by Michael 56k

0

You are right. I am looking for small RNA clusters enriched in certain genomic regions, just like piRNA clusters. As Michael said, I need to bin small RNA-seq reads that lie within, for example, 100 nt of each other. However, I want to keep the alignment information of the reads rather than converting to BED format, because I need to map the reads in each cluster again to see where they come from (e.g. intergenic or coding regions). Thanks

ADD REPLY • link 8.3 years ago by apbiomol • 0

0

Use samtools (http://www.htslib.org/doc/samtools.html): first sort the SAM file, then use samtools view to extract specific regions.

ADD REPLY • link 8.3 years ago by syrttgump ▴ 50

0

You can try the MACS tool and then extract what you need from its output file. The command is below:

/tool/MACS/MACS-1.4.2/bin/sam2bed input.sam output.bed

Hope this solves your problem.

ADD REPLY • link 8.3 years ago by mks002 ▴ 220

score 0 · Answer 1 · 2017-05-04

I suggest using bedtools merge or cluster, depending on the final goal. For instance, using merge:

## code untested
# Convert BAM to BED (keeps the read ID in column 4, which is used for counting),
# sort by chromosome and start, then merge reads within 100 bp of each other
# (-d 100) and count the reads in each merged interval.
bamToBed -i my.bam \
   | sort -k1,1 -k2,2n \
   | mergeBed -i stdin -d 100 -c 4 -o count \
   | head # peek at the results before saving

Keep in mind that this will not account for the strandedness of the reads. Use the options -s or -S for that. It is also worth reading the tool documentation for fine tuning.
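To rank the merged intervals by read count, as the question asks, the merge output can be sorted on the count column. This is a minimal sketch, assuming the merge step was saved as a four-column file (chrom, start, end, read count). The file names merged.bed and ranked.bed and the values in them are made up for illustration.

```shell
# Hypothetical merge output: chrom, start, end, read count (made-up values)
printf 'chr1\t100\t250\t12\nchr1\t900\t1100\t57\nchr2\t40\t90\t3\n' > merged.bed

# Rank clusters by read count (column 4), highest first
sort -k4,4nr merged.bed > ranked.bed
head ranked.bed
```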

Using cluster should also work, but it would require a little more work, and a merge anyway. The only advantage I see over merge is that it would allow you to keep the read IDs for each cluster.
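The idea of labelling each read with a cluster ID and then summarising per cluster can be sketched with plain awk. This only approximates the behaviour described above and is not bedtools itself. It assumes a coordinate-sorted BED file with the read ID in column 4. The file names, the read IDs, and the 100 nt threshold (variable d) are illustrative assumptions.

```shell
# Made-up, coordinate-sorted reads in BED format: chrom, start, end, read ID
printf 'chr1\t100\t122\tr1\nchr1\t130\t151\tr2\nchr1\t240\t262\tr3\nchr1\t500\t521\tr4\nchr2\t50\t72\tr5\n' > reads.bed

# Start a new cluster when the chromosome changes or the gap to the
# furthest end seen so far exceeds d nt; append the cluster ID as column 5.
awk -v d=100 'BEGIN { FS = OFS = "\t" }
{
    if ($1 != chrom || $2 - end > d) { id++; end = $3 }
    else if ($3 > end) { end = $3 }
    chrom = $1
    print $0, id
}' reads.bed > clustered.bed

# Summarise each cluster (ID, read count, comma-separated read IDs)
# and rank the clusters by read count, highest first.
awk 'BEGIN { FS = OFS = "\t" }
{
    n[$5]++
    ids[$5] = (ids[$5] == "" ? $4 : ids[$5] "," $4)
}
END { for (c in n) print c, n[c], ids[c] }' clustered.bed \
    | sort -k2,2nr > cluster_counts.tsv
cat cluster_counts.tsv
```

Because every read keeps its ID and cluster label in clustered.bed, the reads of any cluster can later be matched back to their original alignments by read ID.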