Question

Bedtools give different output

0

Entering edit mode

22 months ago

anikcropscience ▴ 260

Hello,

I have converted a gene annotation file using bedtools and used another bed file containing regions under selective sweeps. My goal is the find how many genes are inside the interval of the selective sweep regions.

I have a converted gene annotation file called gene_annot.bed and another file containing intervals in each scaffold called sweep_interval.bed. I first sorted both the annotation and the selective sweep interval files in bedtools and then used the following code to find how many genes are overlapping with the intervals.

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa > results.txt

Using this code, I got 915 genes. But if I used the following code:

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -u -wa > results_1.txt

Then I got 905 genes.

Can you please tell me if the code I am using especially -a and -b is right and if should I use the -u option or not?

Thank you.

bedtools annotation genome SNP bash • 1.2k views

ADD COMMENT • link 22 months ago by anikcropscience ▴ 260

score 1 · Accepted Answer · 2023-01-25

1

Entering edit mode

22 months ago

barslmn ★ 2.3k

Using this code, I got 915 genes. But if I used the following code:

Then I got 905 genes.

How are you counting the genes?

My goal is the find how many genes are inside the interval of the selective sweep regions.

If your gene_annot.bed contains the unique gene intervals, and you just want to count the genes use the -u which returns unique overlapping regions on the -a file.

https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html#u-unique-reporting-the-mere-presence-of-any-overlapping-features

ADD COMMENT • link 22 months ago by barslmn ★ 2.3k

0

Entering edit mode

I am counting with the following command:

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa | sort | uniq -c > results_new.txt

So, each line in the output text file is a different gene. Am I doing something wrong?

ADD REPLY • link 22 months ago by anikcropscience ▴ 260

0

Entering edit mode

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa | sort | uniq -c > results_new.txt

This command would result something similar to running bedtools with u. But these commands just write output the file. How are you getting the numbers?

Also, If you like to see what is different between these files diff command would help.

ADD REPLY • link 22 months ago by barslmn ★ 2.3k

0

Entering edit mode

So, I just count the lines of the output file. Because I see every line is a unique gene ID. Isn't it the approach? or there is any other method?

ADD REPLY • link 22 months ago by anikcropscience ▴ 260

1

Entering edit mode

There will be lots of uncertainties. No one can say you're doing it wrong or not without seeing the data. If you're unsure about it or have questions, create small positive and negative control data for yourself and see the output. Compare the differences and try to produce examples to reproduce these differences. Check if you're not seeing what you're not supposed to see, and you're seeing what you supposed to see.