Bedtools give different output
1
0
Entering edit mode
22 months ago

Hello,

I have converted a gene annotation file using bedtools and used another bed file containing regions under selective sweeps. My goal is the find how many genes are inside the interval of the selective sweep regions.

I have a converted gene annotation file called gene_annot.bed and another file containing intervals in each scaffold called sweep_interval.bed. I first sorted both the annotation and the selective sweep interval files in bedtools and then used the following code to find how many genes are overlapping with the intervals.

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa > results.txt

Using this code, I got 915 genes. But if I used the following code:

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -u -wa > results_1.txt

Then I got 905 genes.

Can you please tell me if the code I am using especially -a and -b is right and if should I use the -u option or not?

Thank you.

bedtools annotation genome SNP bash • 1.2k views
ADD COMMENT
1
Entering edit mode
22 months ago
barslmn ★ 2.3k

Using this code, I got 915 genes. But if I used the following code:

Then I got 905 genes.

How are you counting the genes?

My goal is the find how many genes are inside the interval of the selective sweep regions.

If your gene_annot.bed contains the unique gene intervals, and you just want to count the genes use the -u which returns unique overlapping regions on the -a file.

https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html#u-unique-reporting-the-mere-presence-of-any-overlapping-features

ADD COMMENT
0
Entering edit mode

I am counting with the following command:

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa | sort | uniq -c > results_new.txt

So, each line in the output text file is a different gene. Am I doing something wrong?

ADD REPLY
0
Entering edit mode
bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa | sort | uniq -c > results_new.txt

This command would result something similar to running bedtools with u. But these commands just write output the file. How are you getting the numbers?

Also, If you like to see what is different between these files diff command would help.

ADD REPLY
0
Entering edit mode

So, I just count the lines of the output file. Because I see every line is a unique gene ID. Isn't it the approach? or there is any other method?

ADD REPLY
1
Entering edit mode

There will be lots of uncertainties. No one can say you're doing it wrong or not without seeing the data. If you're unsure about it or have questions, create small positive and negative control data for yourself and see the output. Compare the differences and try to produce examples to reproduce these differences. Check if you're not seeing what you're not supposed to see, and you're seeing what you supposed to see.

ADD REPLY
0
Entering edit mode

Thank you. I will do that.

ADD REPLY

Login before adding your answer.

Traffic: 2809 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6