Hello,
I have converted a gene annotation file using bedtools
and used another bed file containing regions under selective sweeps. My goal is the find how many genes are inside the interval of the selective sweep regions.
I have a converted gene annotation file called gene_annot.bed
and another file containing intervals in each scaffold called sweep_interval.bed
. I first sorted both the annotation and the selective sweep interval files in bedtools
and then used the following code to find how many genes are overlapping with the intervals.
bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa > results.txt
Using this code, I got 915 genes. But if I used the following code:
bedtools intersect -a gene_annot.bed -b sweep_interval.bed -u -wa > results_1.txt
Then I got 905 genes.
Can you please tell me if the code I am using especially -a
and -b
is right and if should I use the -u
option or not?
Thank you.
I am counting with the following command:
So, each line in the output text file is a different gene. Am I doing something wrong?
This command would result something similar to running bedtools with
u
. But these commands just write output the file. How are you getting the numbers?Also, If you like to see what is different between these files
diff
command would help.So, I just count the lines of the output file. Because I see every line is a unique gene ID. Isn't it the approach? or there is any other method?
There will be lots of uncertainties. No one can say you're doing it wrong or not without seeing the data. If you're unsure about it or have questions, create small positive and negative control data for yourself and see the output. Compare the differences and try to produce examples to reproduce these differences. Check if you're not seeing what you're not supposed to see, and you're seeing what you supposed to see.
Thank you. I will do that.