Hello everyone,
I have some ATAC-seq BAM files and a reference peak set (genomic regions) in text format. I am trying to get a count matrix from the BAM files (i.e. for each BAM file I want to calculate how many reads fall in each peak), so I am trying to use featureCounts for this purpose, which needs annotations in SAF or GTF format. 1) In my case, is the annotation the reference peak set? 2) If so, how do I convert my peak set to SAF or GTF format?
Here are a few lines of the peak set:
seqnames start end name score annotation percentGC
chr1 906012 906513 ACC_10 7.171192997 Intron 0.612774451
chr2 112541661 112542162 ACC_10008 22.03057903 Promoter 0.55489022
chr1 21673421 21673922 ACC_1001 6.459954383 Distal 0.508982036
chr2 112584205 112584706 ACC_10013 43.20855549 Promoter 0.586826347
chr2 112596243 112596744 ACC_10016 5.428209077 Intron 0.491017964
chr1 21725692 21726193 ACC_1002 5.201272875 Intron 0.405189621
Thank you very much.
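A minimal sketch of the conversion asked about above, assuming the peak file is whitespace-delimited with one header line and the column order shown (seqnames, start, end, name, ...). SAF is a tab-delimited table with the columns GeneID, Chr, Start, End and Strand. The function name peaks_to_saf is hypothetical, and the start coordinate is copied unchanged on the assumption that it is already 1-based; if the peaks are BED-style (0-based start), add 1 to it.

```shell
# peaks_to_saf PEAKS: print a SAF table to stdout (hypothetical helper).
# Assumes column 1 = chromosome, 2 = start, 3 = end, 4 = peak name,
# and that the first line of PEAKS is a header.
peaks_to_saf() {
    awk 'BEGIN { OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
         # Peaks have no strand; "+" is a placeholder that is not used
         # when featureCounts runs in its default unstranded mode.
         NR > 1 { print $4, $1, $2, $3, "+" }' "$1"
}
```

It could then be used along these lines (peaks.txt, peaks.saf, counts.txt and sample.bam are placeholder names): peaks_to_saf peaks.txt > peaks.saf, followed by featureCounts -F SAF -a peaks.saf -o counts.txt sample.bam. Paired-end data probably also needs -p; check the options for your Subread version.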
One follow-up question: my BAM files are not in the same directory, so I created a text file containing all the BAM file locations:
acc_bamFiles.txt
Here are a few lines:
However, when I try:
I get this error message:
So should I just run featureCounts for each BAM file separately and then append the count matrices together, or is there an easier way?
I do not think that this can be a text file. I usually make symbolic links in the directory where the SAF file is. Say you are in the SAF directory, use
ln -s /path/to/bam .
for all BAMs and then use *.bam as the input.
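The symlink approach can be scripted from the existing list file. This is only a sketch: link_bams is a hypothetical helper, and it assumes the list holds one absolute BAM path per line and that no two BAMs share a file name.

```shell
# link_bams LIST DEST: symlink every path listed in LIST into DEST.
# Assumes absolute paths (relative ones would give broken links)
# and unique file names across the source directories.
link_bams() {
    list=$1
    dest=$2
    mkdir -p "$dest"
    while IFS= read -r bam; do
        # Skip empty lines in the list.
        [ -n "$bam" ] || continue
        ln -s -f "$bam" "$dest/"
    done < "$list"
}
```

For example, link_bams acc_bamFiles.txt . run in the SAF directory, followed by featureCounts -F SAF -a peaks.saf -o counts.txt *.bam (peaks.saf and counts.txt are placeholder names). Since featureCounts takes any number of input files as trailing arguments, a single run produces one matrix with a column per BAM.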
Thank you very much!
Hi ATpoint,
in my data, macs2 returns multiple peaks per region:
which returns (I believe) the sum of the read counts for that region:
Do you know if this is a 'correct' way to pool all the peaks of the same region and perform differential expression? Or are there better options?
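For context, featureCounts by default summarises at the meta-feature level, so SAF rows that share a GeneID are reported as one row, and the -f option switches to one row per feature (per peak). A small sketch to check which IDs would be pooled, assuming a tab-delimited SAF file with a header line; dup_ids is a hypothetical helper name.

```shell
# dup_ids SAF: print each GeneID that occurs on more than one row,
# followed by a tab and its number of rows (hypothetical helper).
dup_ids() {
    awk -F '\t' 'NR > 1 { n[$1]++ }
                 END { for (id in n) if (n[id] > 1) print id "\t" n[id] }' "$1" | sort
}
```

Running dup_ids peaks.saf (placeholder file name) lists the regions whose peaks would be combined in a default run. Empty output means every peak already has its own ID.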