The below command produces a file (674 MB).
coverageBed \
-d \
-sorted \
-a /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed \
-b /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam \
> /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_output.txt
Output
chr1 955542 955763 + AGRN:exon.1 1 0
chr1 955542 955763 + AGRN:exon.1 2 0
chr1 955542 955763 + AGRN:exon.1 3 0
chr1 955542 955763 + AGRN:exon.1 4 1
My question is even-though an output file results I get the below error:
cmccabe@DTV-A5211QLM:~/Desktop/NGS$ coverageBed -d -sorted -a /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed -b /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.bam > /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_counts.txt
ERROR: chromomsome sort ordering for file /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.bam is inconsistent with other files. Record was:
chr10 68734 68755 X28LU:04418:11727 6 +
I checked the bam and it is coordinate sorted:
cmccabe@DTV-A5211QLM:~/Desktop/NGS/pool_I_090215$ samtools view -H IonXpress_008_150902_newheader.bam | grep SO
@HD VN:1.4 GO:none SO:coordinate
I just want to make sure that I can use the data output of if I am missing something. I can not find that record in the bam file, is that a sequencing error, a bug, or something unique to Ion Torrent bam files. Thank you :).
Sorted bed file
chr1 955542 955763 + AGRN:exon.1
chr1 957570 957852 + AGRN:exon.2
chr1 976034 976270 + AGRN:exon.2;AGRN:exon.3;AGRN:exon.4
I also verified the sort order in the bam
cmccabe@DTV-A5211QLM:~/Desktop/NGS/pool_I_090215$ samtools view -H IonXpress_008_150902_newheader.bam | grep SQ | cut -f 2 | awk '{ sub(/^SN:/, ""); print;}'
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY
chrM
Is it a lexical vs numerical sort thing: 1 10 .. 2 20 ..?
You've double-checked the order of your bed file too, especially the chromosome ordering, and the naming convention? The fact that the "mis-ordering" occurs relatively early in chromosome 10 makes me think of the formal possibility that one of your files (maybe the bed file?) is sorted chr1 chr10 chr11 .. chr2 chr20 .. instead of chr1 chr2 .. chr9 chr10..
I know you know awk, but one could
awk '{print $1}' aBed.bed | sort -u
to quickly checkI'm guessing your output that you showed is not comprehensive, in which case you might check the tail-end of the output file to see if it also ends on chr1, or if it covered the entire genome.
Looks like I just need to
sort -k1,1 -k2,2n