Can or should we perform feature counting with `bedtools intersect -c`?
1
2
Entering edit mode
7.2 years ago
gundalav ▴ 380

I have the following BAM file and a annotation file. The BAM is a paired end alignment. What I want to do is to calculate reads (as fragment) count in I can do it with the following three methods.

Below is the code

#!/bin/bash
INPUT_BAM=my_paired_end_alignment.bam
ANNNOT_BED=peak_annot.bed
ANNNOT_GTF=peak_annot.gtf

# Temporary file
ALIGNMENT_BED=my_paired_end_alignment.bed

#-----------
# Method 1
#-----------
FINAL_COUNT_METHOD1=output_method1.txt
# convert BAM to BED
bedtools bamtobed -bedpe -i $INPUT_BAM | cut -f1,2,6,7,8,9 > $ALIGNMENT_BED
# Count fragments from BAM on ANNOT_BED
bedtools intersect -a $ANNOT_BED -b $ALIGNMENT_BED -c > $FINAL_COUNT_METHOD1

#-----------
# Method 2
#-----------
FINAL_COUNT_METHOD2=output_method2.txt
bedtools multicov -bams $INPUT_BAM -bed $ANNOT_BED > $FINAL_COUNT_METHOD2

#-----------
# Method 3
#-----------
FINAL_COUNT_METHOD3=output_method3.txt
featureCounts  -a $ANNOT_GTF -o $FINAL_COUNT_METHOD3 $INPUT_BAM

The result given by Method1 is so different from Method2 (totally uncorrelated), whereas Method2 is highly correlated with Method3.

My question Is Method1 the right way to count feature? If not why? Should we use Method2 or Method3 instead?

Speed is less of a concern at the moment, but rather the correctness of Method1. I need to be convinced if Method1 is correct or not.

bedtools rna-seq featureCounts • 3.6k views
ADD COMMENT
3
Entering edit mode
7.2 years ago
ATpoint 85k

Method1: Not advisable, as it simply takes too much time to make a BED and consumes unnecessary space as the BED is not binary. Still, if you want to do that, pipe the bedpe output directly into:

cut -f1,2,6,7,8,9

to get a proper representation of your fragment that adequately covers the interval between the reads (the ISIZE basically). The cut will make a normal bed with the start of the fwd and the end of the rev read. Still as I said, not really efficient for your purpose.

Method 2: I think multicov still cannot handle paired-end reads and counts both mates independently.

Method 3: Method of choice, fast and efficient with many parameters to finetune for your purpose, but use the -p option for paired-end!

ADD COMMENT
0
Entering edit mode

@ATPoint Thanks for your reply. Speed is less of a concern at the moment, but rather the correctness of Method1. I tried using cut -f1,2,6,7,8,9. But the Method1 is totally uncorrelated (different) from Method2 and Method3. I need to be convinced why Method1 is incorrect. Your enlightenment will be greatly appreciated.

ADD REPLY

Login before adding your answer.

Traffic: 1877 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6