bedtools intersect: file sorting problem
7.6 years ago

Dear all, I am running the following command and getting an error about incorrect sorting:

bedtools intersect -sorted -g [GENOME_FILE] -abam [BAM_FILE] -b [BED_FILE] > [OUTPUT_FILE]

Apparently my input BAM files have a strange chromosome order: chr1 chr2 ... chr9 X Y chr11 chr12 ...

My BED and genome files, on the other hand, are ordered: chr1 chr2 ... chr22 X Y

As I understand it, it is easier to re-sort the BED file than the BAM file. Still, I am having trouble doing this :) Could anyone advise on this? Thank you!
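
A minimal Python sketch of re-sorting the BED file, assuming the genome file already lists the chromosomes in the same order as the BAM header (one `chrom<TAB>length` line each) and the BED file is tab-separated with integer start and end columns. Each chromosome is ranked by its line position in the genome file, and records are sorted by that rank, then by start and end. The function names are hypothetical, not part of bedtools:

```python
def read_genome_order(genome_lines):
    """Map each chromosome name to its rank (line order) in a genome file."""
    order = {}
    for line in genome_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        chrom = line.split("\t")[0]
        order.setdefault(chrom, len(order))  # first occurrence fixes the rank
    return order


def sort_bed_by_genome(bed_lines, order):
    """Sort BED lines by genome-file chromosome rank, then start, then end."""
    records = []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.split("\t")
        chrom = fields[0]
        if chrom not in order:
            raise ValueError(f"chromosome {chrom!r} is missing from the genome file")
        records.append((order[chrom], int(fields[1]), int(fields[2]), line))
    records.sort()
    return [record[3] for record in records]


if __name__ == "__main__":
    # Toy genome file in the BAM's order (lengths are placeholders).
    genome = ["chr1\t1000", "chr9\t1000", "chrX\t1000", "chr10\t1000"]
    bed = ["chr10\t5\t10", "chrX\t1\t2", "chr1\t100\t200", "chr1\t50\t60"]
    for bed_line in sort_bed_by_genome(bed, read_genome_order(genome)):
        print(bed_line)
```

Because file objects iterate over lines, the same functions can be given open file handles instead of lists. The key design choice is ranking chromosomes by the genome file rather than by name, so whatever order the BAM uses is reproduced exactly.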

How did you sort the BAM file?

I did not. I sorted the BED and genome files to match the BAM file (or so I thought). Only later did I find out that the BAM file has this strange ordering, with the X and Y chromosomes in the middle, and, honestly, I don't know how to sort a BAM file :)

samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] [-T tmpprefix] [-@ threads] [in.sam|in.bam|in.cram]

This is what you're looking for if you don't know how to sort a bam file: http://www.htslib.org/doc/samtools.html
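
Note that a coordinate sort with `samtools sort` orders records by the index of their reference sequence in the BAM header (the order of the `@SQ` lines), then by position; it does not reorder chromosomes by name. Re-sorting this BAM would therefore leave X and Y between chr9 and chr10. The toy Python sketch below imitates that rule on `(chrom, pos)` tuples; it is an illustration under that assumption, not samtools' own code:

```python
def coordinate_sort(reads, header_order):
    """Order (chrom, pos) reads by the chromosome's index in the header, then by position."""
    tid = {name: index for index, name in enumerate(header_order)}
    return sorted(reads, key=lambda read: (tid[read[0]], read[1]))


if __name__ == "__main__":
    # Toy header with X and Y between chr9 and chr10, as described in this thread.
    header = ["chr1", "chr2", "chr9", "chrX", "chrY", "chr10", "chr11"]
    reads = [("chr10", 50), ("chrX", 7), ("chr1", 300), ("chrY", 2), ("chr1", 20)]
    print(coordinate_sort(reads, header))
    # -> [('chr1', 20), ('chr1', 300), ('chrX', 7), ('chrY', 2), ('chr10', 50)]
```

Changing the chromosome order in the BAM itself would require reordering the reads against a reference with a different sequence order (for example with Picard's ReorderSam), which is why adjusting the BED and genome files is usually the simpler route.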

I still don't understand how I can change the order of the chromosomes so that X and Y are not between chr9 and chr10, but at the beginning or at the end...
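
One way to get X and Y into the right place is to stop fighting the BAM order and make the genome file follow the BAM header instead. `samtools view -H` prints the header, and each `@SQ` line carries the reference name in its `SN:` tag and its length in `LN:`. A minimal Python sketch, assuming the header text has already been captured as a string (the function name is hypothetical):

```python
def genome_file_from_header(header_text):
    """Turn SAM header text into genome-file lines (name<TAB>length), keeping @SQ order."""
    genome_lines = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # only @SQ lines describe reference sequences
        # Each field after the record type is TAG:VALUE; split on the first colon only.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        genome_lines.append(f"{tags['SN']}\t{tags['LN']}")
    return genome_lines


if __name__ == "__main__":
    # Toy header; real text would come from `samtools view -H` on the BAM file.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:chr9\tLN:1000\n"
        "@SQ\tSN:chrX\tLN:2000\n"
        "@SQ\tSN:chr10\tLN:3000\n"
        "@PG\tID:bwa\n"
    )
    print("\n".join(genome_file_from_header(header)))
```

Writing these lines to a file gives a genome file for `-g` whose order matches the BAM, and the same order can then be used to re-sort the BED file.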

Depending on your error, you may need to sort the BAM file. You will need to add more information to the question if you want some help.

The error is that the genome file and the BED file are ordered "1 2 3 4 5 6 7 8 9 10 11 ... X Y", while the BAM file is ordered "1 2 3 4 5 6 7 8 9 X Y 11 12 ...".
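
To see exactly where two files disagree, one can walk a file's chromosome column and check that each chromosome forms a single contiguous block and that the blocks follow the genome-file order. A hypothetical Python helper for that check (not part of bedtools; it only checks chromosome order, not start coordinates within a chromosome):

```python
def first_order_violation(chroms, order):
    """Return (index, chrom) of the first record breaking the genome order, or None.

    `chroms` is a file's chromosome column in file order; `order` maps each
    chromosome name to its rank in the genome file. Each chromosome must form one
    contiguous block, and the blocks must appear in increasing rank.
    """
    last_rank = -1
    prev = None
    for index, chrom in enumerate(chroms):
        if chrom == prev:
            continue  # still inside the current chromosome block
        rank = order.get(chrom)
        if rank is None or rank <= last_rank:
            return index, chrom  # unknown chromosome, or one that is out of order
        last_rank = rank
        prev = chrom
    return None


if __name__ == "__main__":
    # Toy genome/BED order from the question versus the BAM header order.
    genome_order = {name: rank for rank, name in enumerate(
        ["chr1", "chr9", "chr10", "chr11", "chrX", "chrY"])}
    bam_chroms = ["chr1", "chr1", "chr9", "chrX", "chrY", "chr10", "chr11"]
    print(first_order_violation(bam_chroms, genome_order))  # -> (5, 'chr10')
```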

You need to sort your bed_file and your genome_file appropriately. Bedtools has a 'sort' function that's rather slow, but you can do this to fix the problem:

LC_COLLATE=C sort -k 1,1 -k2,2n a.bed
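
For reference, `LC_COLLATE=C` makes `sort` compare column 1 byte by byte, so chromosome names are ordered lexicographically rather than numerically: chr10 sorts before chr2, and because the bytes for `X` and `Y` come after every digit, chrX and chrY follow chr9 and all other numbered chromosomes. The Python sketch below approximates that ordering for well-formed, tab-separated BED lines; it is an illustration, not a replacement for GNU sort:

```python
def c_locale_bed_sort(bed_lines):
    """Approximate `LC_COLLATE=C sort -k1,1 -k2,2n` for tab-separated BED lines."""
    def sort_key(line):
        fields = line.split("\t")
        return (
            fields[0].encode(),  # column 1 compared byte by byte (C locale)
            int(fields[1]),      # column 2 compared numerically
            line.encode(),       # ties: bytewise whole-line comparison, GNU sort's default last resort
        )
    return sorted(bed_lines, key=sort_key)


if __name__ == "__main__":
    chroms = ["chr2", "chr10", "chrX", "chr1", "chr9", "chrY"]
    bed = [f"{chrom}\t0\t100" for chrom in chroms]
    print([line.split("\t")[0] for line in c_locale_bed_sort(bed)])
    # -> ['chr1', 'chr10', 'chr2', 'chr9', 'chrX', 'chrY']
```

This is why the command puts X and Y at the end: the resulting order matches neither the BAM header nor a natural numeric order, so it only helps when every file involved was sorted the same way.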

I actually sorted the BED file this way, but it puts X and Y at the end, while my BAM file has the X and Y chromosomes between the single-digit and double-digit chromosomes (i.e. between chr9 and chr10).

So you set LC_COLLATE=C?

Yes, and the X and Y chromosomes are placed at the end :)
