Hello, I'm trying to intersect one RNA-seq file with one ChIP-seq file.
My ChIP-seq file looks like this:
chrX 118904350 118904500
chrX 169994450 169994600
chrX 170004550 170004700
chrX 170005250 170005400
chr13 44869500 44869750
chr13 112578250 112578400
chr12 41777950 41778100
whereas my RNA-seq file looks like this:
Chrom Start End Strand Length Symbol
chr1 3214482;3421702;3670552 3216968;3421901;3671498 - 3634 Xkr4
chr1 3647309;3658847 3650509;3658904 - 3259 Gm19938
chr1 3680155 3681788 + 1634 Gm10568
chr1 4290846;4343507;4351910;4352202;4360200;4409170 4293012;4350091;4352081;4352837;4360314;4409241 - 9747 Rp1
chr1 4490928;4493100;4493772;4495136;4496291 4492668;4493466;4493863;4495942;4496413 - 3130 Sox17
chr1 4773198;4777525;4782568;4783951;4785573 4776801;4777648;4782733;4784105;4785726 - 4203 Mrpl15
Bedtools intersect doesn't seem to be working and I get an error message regarding the RNA-seq file. I think it happens because more than 1 genomic region for Start and End is assigned for each gene. Any idea how to collapse the ";" and end up with something equivalent to my ChIP-seq file?
Thanks
Your RNAseq file is a hacked version of a bed12 file, so just download the real bed12 file and use that.
What do you mean by hacked? Plus I didn't download the file from anywhere. I generated everything - This is my DE file
"Hacked" means "modified" in this context. If you generated everything then just generate it again in a more useful format. You can certainly convert it to something more useful with awk (as nicely demonstrated by Joseph Pearson ), but in the long term it's simpler to stick to more standardized formats.
Link doesn't work
Not sure how to reformat the file. I add the whole annotation file in R right after I call featureCounts, and this is the final output.
Fixed the link, that was just a biostars bug. It would have been helpful had you said you were using featureCounts. Either write a script to parse this into a modified BED format or just intersect the ChIP results with a BED file containing gene coordinates, cut out the gene name, and grep the appropriate lines from the output you posted.
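For the "write a script" option, here is a minimal Python sketch (not the awk version mentioned earlier in the thread) that expands one row into one BED line per Start/End pair. It assumes whitespace-separated columns in the order Chrom, Start, End, Strand, Length, Symbol, and that the coordinates are 1-based inclusive, which is consistent with the Length column in the posted rows. The function name `exons_to_bed` and the placeholder score of 0 are illustrative choices, not part of featureCounts or bedtools.

```python
def exons_to_bed(line):
    """Turn one annotation row into BED6 lines, one per Start/End pair."""
    chrom, starts, ends, strand, _length, symbol = line.split()
    bed_lines = []
    for start, end in zip(starts.split(";"), ends.split(";")):
        # Assumption: input is 1-based inclusive; BED starts are 0-based,
        # so subtract 1 from the start and leave the end unchanged.
        bed_start = int(start) - 1
        # Score column is a placeholder 0 (hypothetical choice).
        bed_lines.append(f"{chrom}\t{bed_start}\t{end}\t{symbol}\t0\t{strand}")
    return bed_lines


# Example row taken from the question.
row = "chr1 3647309;3658847 3650509;3658904 - 3259 Gm19938"
for bed_line in exons_to_bed(row):
    print(bed_line)
```

Running this over every data row (skipping the header) and writing the lines to a file should give something bedtools intersect can read alongside the ChIP-seq file.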
This is what I use for the Chrom column
but I have no clue how to reformat the Start/End
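If one interval per gene is enough, one possible way to collapse the ";"-separated Start/End values is to take the smallest start and the largest end. The sketch below is in Python rather than R, and the helper name `gene_span` is hypothetical. It assumes the coordinates are 1-based inclusive and converts the start to the 0-based convention that BED uses.

```python
def gene_span(starts, ends):
    """Collapse ';'-separated starts and ends into one 0-based BED interval."""
    # Smallest start and largest end cover every listed region of the gene.
    start = min(int(s) for s in starts.split(";"))
    end = max(int(e) for e in ends.split(";"))
    # Assumption: input is 1-based inclusive, so shift the start for BED.
    return start - 1, end


# Xkr4 values from the question.
start, end = gene_span("3214482;3421702;3670552", "3216968;3421901;3671498")
print(f"chr1\t{start}\t{end}\tXkr4")
```

Note that the collapsed interval also covers the introns, so a ChIP-seq peak in an intron would count as overlapping the gene.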