Hi all,
I am trying to analyze the deposition of a histone modification in ChIP-seq. So I got the union peak bed file, it looks like this:
chr1 5162780 5163619
chr1 6205640 6214319
chr1 6215300 6215579
chr1 6217540 6218379
to count the fragments on the peaks, I want to use featureCounts. FeatureCounts only accept the annotation file as gtf. I added the 4th column to bed file so that it can pass bedToGenePred
$ awk '{$4="." print $0}' union_peaks.bed > union_peaks_1.bed
I get
chr1 5162780 5163619 .
chr1 6205640 6214319 .
chr1 6215300 6215579 .
chr1 6217540 6218379 .
Then I convert the bed file to gtf by
$ bedToGenePred union_peaks_1.bed stdout | genePredToGtf file stdin union_peaks.gtf
But when I check the column number of this gtf file
$ awk '{print NF}' union_peaks.gtf
it shows 16.
Then I found out the features are created as separate columns instead of one column 9, for instance, column 9 is gene_id, column 10 is "._5", column 11 is transcript_id, and so on and so forth.
chr1 stdin exon 4823701 4824259 . + . gene_id "._5"; transcript_id "._5"; exon_number "1"; exon_id "._5.1";
chr1 stdin CDS 4823701 4824259 . + 0 gene_id "._5"; transcript_id "._5"; exon_number "1"; exon_id "._5.1";
chr1 stdin start_codon 4823701 4823703 . + 0 gene_id "._5"; transcript_id "._5"; exon_number "1"; exon_id "._5.1";
chr1 stdin transcript 4829161 4831959 . + . gene_id "._6"; transcript_id "._6";
chr1 stdin exon 4829161 4831959 . + . gene_id "._6"; transcript_id "._6"; exon_number "1"; exon_id "._6.1";
chr1 stdin CDS 4829161 4831956 . + 0 gene_id "._6"; transcript_id "._6"; exon_number "1"; exon_id "._6.1";
chr1 stdin start_codon 4829161 4829163 . + 0 gene_id "._6"; transcript_id "._6"; exon_number "1"; exon_id "._6.1";
chr1 stdin stop_codon 4831957 4831959 . + 0 gene_id "._6"; transcript_id "._6"; exon_number "1"; exon_id "._6.1";
chr1 stdin transcript 4857301 4891039 . + . gene_id "._7"; transcript_id "._7";
Can you help me find out what is wrong here, and how to solve the problem?
Thank you for your answer, h.mon. It saved a lot of time for me! It is smart to use SAF instead of GTF, and it works now!
I just want to add for other people that featureCounts has very stringent requirement to SAF feature (first column), first column with pure number, "peak_NR", "peak.NR", were all not accepted, so
would do the job in the end.
Many thanks again!