How to create SAF from text file for FeatureCounts
3.6 years ago
pt.taklifi ▴ 60

Hello everyone,

I have some ATAC-seq BAM files and a reference peak set (genomic regions) in text format. I am trying to get a count matrix from the BAM files (i.e., for each BAM file I want to calculate how many reads fall in each peak), so I am trying to use featureCounts, which needs annotations in SAF or GTF format. 1) In my case, is the annotation the reference peak set? 2) If so, how do I convert my peak set to SAF or GTF format?

Here are a few lines of the peak set:

seqnames    start   end name    score   annotation  percentGC
chr1    906012  906513  ACC_10  7.171192997 Intron  0.612774451
chr2    112541661   112542162   ACC_10008   22.03057903 Promoter    0.55489022
chr1    21673421    21673922    ACC_1001    6.459954383 Distal  0.508982036
chr2    112584205   112584706   ACC_10013   43.20855549 Promoter    0.586826347
chr2    112596243   112596744   ACC_10016   5.428209077 Intron  0.491017964
chr1    21725692    21726193    ACC_1002    5.201272875 Intron  0.405189621
featureCounts ATAC-seq • 6.0k views
3.6 years ago
ATpoint 85k

Converting from BED to SAF/GFF

If your file has a header, you will need to skip the first line; in awk that is done with NR>1. You can convert the file you have there to SAF and then run something like:

featureCounts -a your.saf -F SAF -o counts.txt *.bam
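For the conversion step itself, here is a minimal sketch with awk. It assumes the peak file is tab-separated with the column order shown in the question (seqnames, start, end, name, ...); the file names peaks.txt and peaks.saf are placeholders. The start coordinate is copied unchanged, which is an assumption about the coordinate system: if the peaks are 0-based (BED-style), use $2+1 instead. The strand is set to "+" because peaks have no strand.

```shell
# Tiny stand-in for the peak set (tab-separated, with a header line).
printf 'seqnames\tstart\tend\tname\tscore\tannotation\tpercentGC\n' > peaks.txt
printf 'chr1\t906012\t906513\tACC_10\t7.171192997\tIntron\t0.612774451\n' >> peaks.txt
printf 'chr2\t112541661\t112542162\tACC_10008\t22.03057903\tPromoter\t0.55489022\n' >> peaks.txt

# SAF needs five tab-separated columns: GeneID, Chr, Start, End, Strand.
# NR > 1 skips the header; the peak name (column 4) is used as GeneID.
# Start is copied as-is; switch to $2+1 if the input is 0-based.
awk 'BEGIN { FS = OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
     NR > 1 { print $4, $1, $2, $3, "+" }' peaks.txt > peaks.saf
```

The resulting peaks.saf can then be passed to featureCounts with -a peaks.saf -F SAF as in the command above.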

thank you very much.


One follow-up question: my BAM files are not in the same directory, so I created a text file, acc_bamFiles.txt, containing all the BAM file locations. Here are a few lines:

SRR10984460/bam/SRR10984460.dedup.bam
SRR10984461/bam/SRR10984461.dedup.bam
SRR10984462/bam/SRR10984462.dedup.bam

However, when I try:

featureCounts -a PanCancer_PeakSet.saf -F SAF -o counts.txt acc_bamFiles.txt

I get this error message:

ERROR: invalid parameter: 'acc_bamFiles.txt'

So should I just run featureCounts on each BAM file separately and then append the count matrices together, or is there an easier way?


I do not think the input can be a text file. I usually make symbolic links in the directory where the SAF file is. Say you are in the SAF directory: run ln -s /path/to/bam . for each BAM and then use *.bam.
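The linking step can be scripted from the list file. This is only a sketch: it creates empty placeholder files so it runs anywhere, and the directory name bam_links is made up for illustration. The featureCounts calls are left as comments because they need real BAM files.

```shell
# Stand-in setup: empty placeholder "BAM" files plus a list file
# like acc_bamFiles.txt from the question.
mkdir -p SRR10984460/bam SRR10984461/bam bam_links
touch SRR10984460/bam/SRR10984460.dedup.bam SRR10984461/bam/SRR10984461.dedup.bam
printf '%s\n' SRR10984460/bam/SRR10984460.dedup.bam \
              SRR10984461/bam/SRR10984461.dedup.bam > acc_bamFiles.txt

# One symlink per listed BAM, collected in bam_links/ (hypothetical name).
# The listed paths are relative, so they are made absolute with $PWD.
while IFS= read -r bam; do
  ln -sf "$PWD/$bam" "bam_links/$(basename "$bam")"
done < acc_bamFiles.txt

# featureCounts could then be pointed at the links (not run here):
#   featureCounts -a PanCancer_PeakSet.saf -F SAF -o counts.txt bam_links/*.bam
# Alternative without links: let the shell expand the list into arguments,
# which works as long as no path contains whitespace:
#   featureCounts -a PanCancer_PeakSet.saf -F SAF -o counts.txt $(cat acc_bamFiles.txt)
```

Either way featureCounts receives the BAM paths as separate arguments and writes one count column per file, so there is no need to run it per sample and merge afterwards.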


thank you very much!


Hi ATpoint,

In my data, MACS2 returns multiple peaks per region:

chr     start   end     length  abs_summit      pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
1       826531  828140  1610    826808  1730    1328.65 14.7141 1326.59 peak_all_peak_1a
1       826531  828140  1610    827539  4563    5338.88 38.7957 5336.28 peak_all_peak_1b
1       826531  828140  1610    827986  698     292.385 5.94175 290.704 peak_all_peak_1c
1       831302  831572  271     831439  329     57.8574 2.80512 56.4633 peak_all_peak_2
1       832037  832505  469     832340  290     41.2257 2.47361 39.8811 peak_all_peak_3
1       844126  846414  2289    844345  297     44.053  2.53311 42.6991 peak_all_peak_4a
1       844126  846414  2289    844868  886     448.757 7.53982 446.986 peak_all_peak_4b

which gives (I believe) the summed read count for each region:

1.826531.828140    1;1;1    826531;826531;826531    828140;828140;828140    +;+;+    1610    323    485    506    419    193    275    264    441    479    390    488    548    266    383    471    527    417    283    445    470    606    612    575    471
1.831302.831572    1    831302    831572    +    271    5    6    13    11    7    8    5    9    15    14    11    11    7    10    10    12    6    7    8    13    13    9    16    15
1.832037.832505    1    832037    832505    +    469    10    9    16    22    6    5    7    23    24    33    31    28    8    13    15    15    14    7    15    9    19    22    27    19
1.844126.846414    1;1;1;1;1;1    844126;844126;844126;844126;844126;844126    846414;846414;846414;846414;846414;846414    +;+;+;+;+;+    2289    170    271    242    253    106    195    144    82    68    94    84    87    148    197    226    225    185    134    174    260    254    291    122    113

Do you know if this is a 'correct' way to pool all the peaks of the same region and perform differential expression? Or are there better options?
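One possible way to sidestep the question of how duplicated rows are handled is to collapse sub-peaks that share the same coordinates before building the SAF, so that each region appears exactly once. This is only a sketch, not a statement about what the best pooling strategy is. It assumes a tab-separated MACS2 table with chr, start and end in the first three columns as shown above; macs2_peaks.tsv and regions.saf are placeholder file names.

```shell
# Stand-in for the MACS2 table above (three rows, two sharing one region).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr start end length abs_summit pileup '-log10(pvalue)' fold_enrichment '-log10(qvalue)' name \
  1 826531 828140 1610 826808 1730 1328.65 14.7141 1326.59 peak_all_peak_1a \
  1 826531 828140 1610 827539 4563 5338.88 38.7957 5336.28 peak_all_peak_1b \
  1 831302 831572 271 831439 329 57.8574 2.80512 56.4633 peak_all_peak_2 > macs2_peaks.tsv

# Keep only the first row seen for each chr/start/end combination and
# write it as one SAF feature named chr.start.end.
awk 'BEGIN { FS = OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
     NR > 1 && !seen[$1, $2, $3]++ { print $1 "." $2 "." $3, $1, $2, $3, "+" }' \
    macs2_peaks.tsv > regions.saf
```

With one row per region, each feature is listed once in the featureCounts output instead of as 1;1;1.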
