quantify counts from sorted bam files using a bed file containing regions of interest
2
0
Entering edit mode
2.7 years ago
bioyas ▴ 20

Hi everyone,

I have generated sorted bam files for multiple samples. I also have generated a particular BED file (tab limited file with Chr, start and end regions of interest). I would like to quantify reads mapped to these regions and generate a count matrix.

I used the following command to generate bam files for the desired regions stored in my.bed file (using original sample.bam)

samtools view -h -L my.bed sample.bam 

The above command generates the new bam files. I sorted the bam file using sam sort. Now I would like to quantify the amount of reads mapped to each desired region as counts from the new bam file. I tried to use featureCount from subRead as below:

featureCounts -A my.Bed -o SampleCounts.txt sample.Sorted.bam

This command does not generate the counts. Am I supposed to only pass gtf file as annotation to feature count? Or my steps are wrong? Any idea is really appreciated.

Thanks

counts featureCount bam Bed • 2.7k views
ADD COMMENT
1
Entering edit mode
2.7 years ago
Jeffin Rockey ★ 1.3k

For overlap and counting purposes based on a 3 column bed,

Bedtools intersect should help. Link

Please do see the different options like -c, -u, -wa, -wb which are very helpful under various contexts.

Also related are

https://bedtools.readthedocs.io/en/latest/content/tools/multicov.html https://bedtools.readthedocs.io/en/latest/content/tools/coverage.html

But if the purpose is differential expression input count or so, I believe it is recommended use featureCounts or HTSeq etc.

And on featurecounts query, I am not sure if gtf is 'mandatory' but mostly gtf is seen used with feature counts.To slightly detail, in use cases like RNA-Seq the exon-gene mapping info from gtf are quite very relevant with respect to the counting.

ADD COMMENT
0
Entering edit mode
2.7 years ago
Malcolm.Cook ★ 1.5k

featureCounts does not work with .bed files.

If you are intent upon using some of the capabilities of featureCounts, you would want to first create a SAF or GFF formatted version of your .bed data.

This is pretty much a rearrangement of columns, with the possible addition of a unique first column to be the identifier of the region.

If your bed file file also has a name column (sounds like yours does not, but I thought I'd check), you might want to use it as the identifier. The unix cut command can easily reorder columns.

If your bed file does NOT already have a unique identifier, you need an approach to creating one. Converting from BED to SAF/GFF gives some approaches for doing this. If I were to use the awk based approach, the only thing I would change about it would be to use a colon to separate the chromosome from the start position instead of a period, yielding identifiers that you can use to navigate your favorite genome browser (e.g. IGV, UCSC).

ADD COMMENT

Login before adding your answer.

Traffic: 2714 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6