Hi, I have a BED file for the baits and I want to get a list of the targeted genes. I downloaded a BED file from UCSC with the settings below and used bedtools to intersect the two. However, I end up with >8000 genes (the panel has 353 genes), and GNB1, for example, is not in the panel. Could you tell me what I did wrong, please? Thank you!
Clade: Mammals
assembly: hg38
group: Genes and Gene Predictions
track: NCBI RefSeq
table: RefSeq All (ncbiRefSeq)
output format: select from primary table (chromosomes, cdsStart, cdsEnd, name2)
Then I used:
bedtools intersect -a bait.bed -b UCSC.bed -wa -wb > intersect.bed
head intersect.bed
chr1 1819822 1820002 chr1:1819775-1819875 chr1 1787330 1825453 GNB1
chr1 1819822 1820002 chr1:1819775-1819875 chr1 1787330 1825453 GNB1
chr1 1819822 1820002 chr1:1819775-1819875 chr1 1787330 1825453 GNB1
chr1 1819822 1820002 chr1:1819775-1819875 chr1 1787330 1825453 GNB1
chr1 1819822 1820002 chr1:1819775-1819875 chr1 1787330 1825453 GNB1
chr1 1819822 1820002 chr1:1819775-1819875 chr1 1787330 1825453 GNB1
chr1 3683000 3683240 chr1:3683057-3683185 chr1 3682365 3732762 TP73
chr1 3683000 3683240 chr1:3683057-3683185 chr1 3682365 3733079 TP73
chr1 3683000 3683240 chr1:3683057-3683185 chr1 3682365 3733079 TP73
chr1 3683000 3683240 chr1:3683057-3683185 chr1 3682365 3731555 TP73
chr1 3683000 3683240 chr1:3683057-3683185 chr1 3682365 3732762 TP73
Check that you have no duplicates in your input BED files.
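One possible way to run that check is sketched below. It assumes a tab-separated BED file and looks only for lines that are identical in every column; the helper name `duplicate_lines` is hypothetical, and the sample rows are taken from columns 5-8 of the intersect output above.

```shell
# Hypothetical helper: print each line that occurs more than once in a file.
# Only exact, whole-line duplicates are reported.
duplicate_lines() {
    sort "$1" | uniq -d
}

# Small sample built from the UCSC columns of the intersect output above
# (fields separated by tabs). The GNB1 row appears twice.
dup_sample=$(mktemp)
printf 'chr1\t1787330\t1825453\tGNB1\n' >  "$dup_sample"
printf 'chr1\t1787330\t1825453\tGNB1\n' >> "$dup_sample"
printf 'chr1\t3682365\t3732762\tTP73\n' >> "$dup_sample"
printf 'chr1\t3682365\t3733079\tTP73\n' >> "$dup_sample"

# Prints the duplicated GNB1 row once; the two TP73 rows differ in their
# end coordinate, so they are not reported.
duplicate_lines "$dup_sample"
```

Running `duplicate_lines bait.bed` and `duplicate_lines UCSC.bed` would apply the same check to the real files. An empty result means there are no exact duplicate lines.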
Hi Pierre,
Thank you for the reply. Please correct me if I am wrong, but it doesn't look like it has duplicates. I wanted to upload the BED file to UCSC, but there is a limit of 1000 entries and there are ~9000 lines in the bait.bed file.
head -20 bait.bed
head UCSC.bed
Sorry, I missed that. But even if there are duplicates in the UCSC BED file, why do genes that are not targeted by the panel appear after intersecting? I counted the number of genes based on the unique gene names, not the number of lines, so I imagine removing the duplicates will not change the number of unique genes in the BED file after intersecting? Thanks again.
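For reference, the count by unique gene name described here can be sketched as follows. This is only an illustration: it assumes the intersect output is tab-separated with the gene symbol (name2) in column 8, as in the intersect.bed excerpt above, and the helper name `unique_genes` is hypothetical.

```shell
# Hypothetical helper: list the distinct gene symbols in the output of
# `bedtools intersect -wa -wb`, assuming the symbol is in column 8.
unique_genes() {
    cut -f8 "$1" | sort -u
}

# Small sample built from rows of the intersect.bed excerpt above
# (fields separated by tabs).
gene_sample=$(mktemp)
printf 'chr1\t1819822\t1820002\tchr1:1819775-1819875\tchr1\t1787330\t1825453\tGNB1\n' >  "$gene_sample"
printf 'chr1\t1819822\t1820002\tchr1:1819775-1819875\tchr1\t1787330\t1825453\tGNB1\n' >> "$gene_sample"
printf 'chr1\t3683000\t3683240\tchr1:3683057-3683185\tchr1\t3682365\t3732762\tTP73\n' >> "$gene_sample"
printf 'chr1\t3683000\t3683240\tchr1:3683057-3683185\tchr1\t3682365\t3733079\tTP73\n' >> "$gene_sample"

unique_genes "$gene_sample"            # one line per distinct gene symbol
unique_genes "$gene_sample" | wc -l    # number of distinct gene symbols
```

Because repeated rows collapse to one symbol, the count is the same whether or not duplicate lines are removed first, which matches the reasoning in the post.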