I generated a bed file form some splicing data and attempted to use both bed tools intersect and bedtools annotate with a master list from UCSC. I have not been able to get any annotations. Below are examples of the data.
chr21 40801057 40801137 0 - 40801057 40801137
chr21 34953608 34953806 0 - 34953608 34953806
chr21 45370568 45370755 0 + 45370568 45370755
chr21 37692487 37692601 0 + 37692487 37692601
chr21 38528957 38529208 0 + 38528957 38529208
chr21 35191545 35191627 0 + 35191545 35191627
chr21 30565815 30566581 0 + 30565815 30566581
From UCSC
chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,
chr1 11873 14409 uc010nxr.1 0 + 11873 11873 0 3 354,52,1189, 0,772,1347,
chr1 11873 14409 uc010nxq.1 0 + 12189 13639 0 3 354,127,1007, 0,721,1529,
chr1 14361 16765 uc009vis.3 0 - 14361 14361 0 4 468,69,147,159, 0,608,1434,2245,
chr1 16857 17751 uc009vjc.1 0 - 16857 16857 0 2 198,519, 0,375,
Can you just select the data and press
Ctrl+k
Why not? maybe the data doesn't intersect! Also, what is the full command that you used?
likely possible that your project data might not correspond to that of the UCSC data but for others to give an idea please try to format the question as venu said and then we can play with the to bed format files. But having said that are both the bed files generated from the same master source? Please mention what is the first bed input and the second bed input after formatting and then if they are properly tab delimited the intersect should run fine and give some output unless there is no intersection.
Both are from UCSC hg19. The first was manually created using output data from MISO, specifically events name column. This was then separated into tab columns that conformed with bed format. I primarily need to add gene symbols to this data because we are only provided with the exons that underwent splicing and not which genes were splice regulated.
First of all your input file examples will not have any intersection as the bed intersect is dependent on first 3 tab columns of both input files and in your example I do not see any match as the data is having
chr 21
in one case and in the other you havechr 1
, try to see in both the files with a grep if you have all chromosomes or not and then do the bedtools intersect using-wa -wb
to get the other desired columns. However you do not have gene symbols in any of the file so you cannot have them withbedtools intersect
, in any case once you have the intersection you can annotate the intersected bed file with ensemble or you can download the hg19 UCSC bed file with gene symbols and awk the file havingchr, start, end , refseq_id, gene symbols
and then intersect with bedtools of the first file and see if there is any intersection or not.What was the output of bedtools intersect? An empty file, or intersection without the annotation?
Run the dos2unix utility on both the files. Make sure there is no header in the file downloaded from UCSC. Then try using bedtools intersect.