Question

Duplicate chromosome location columns in a bed file makes problem in importing them into R

2.4 years ago

minoo ▴ 10

I have converted a pair of FASTQ files to a BED file using the code below:

bowtie2 --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700 -p ${cores} -x ${ref} -1 ${fastQR1} -2 ${fastQR2} -S ${samFile} &> ${txtFile}

samtools view -bS -F 0x04 $proj/a.sam >$proj/a.bam

bedtools bamtobed -i $proj/a.bam -bedpe >$proj/a.bed

Now the head of my BED file looks like this:

chr1    242251375   242251525   chr1    242251390   242251540   NS500442:247:HKWKNAFX2:1:11101:3060:1048    17  +   -
chr9    41169424    41169574    chr9    41169494    41169644    NS500442:247:HKWKNAFX2:1:11101:14485:1058   1   +   -

It has 10 columns, but I have no idea why the chromosome location columns are duplicated, or how I can import this into R as a GRanges object. Any ideas?

granges bedtools r samtools bowtie2

updated 2.4 years ago by cmdcolin ★ 4.0k • written 2.4 years ago by minoo ▴ 10

score 2 · Accepted Answer · 2022-07-05

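Your bed file is actually a BEDPE file, because you passed the -bedpe flag to bedtools bamtobed. It has two sets of chrom, start, and end columns because it stores each pair of reads on a single line: columns 1-3 describe one mate, columns 4-6 describe the other, and columns 7-10 hold the read name, a score, and the strand of each mate.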
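
To see which field is which, here is a small sketch (assuming a POSIX shell with awk; the function name label_bedpe_fields is hypothetical, and the column names are just the BEDPE field labels, not something stored in the file) that prints each field of the first record next to its column name:

```shell
# Print each field of the first BEDPE record next to its column name.
# The labels follow the BEDPE layout written by `bedtools bamtobed -bedpe`;
# label_bedpe_fields is a hypothetical helper name.
label_bedpe_fields() {
  awk 'BEGIN {
         # BEDPE column names, in file order
         split("chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2", col, " ")
       }
       NR == 1 {
         # one "label value" line per field, then stop reading
         for (i = 1; i <= NF; i++) printf "%-8s %s\n", col[i], $i
         exit
       }'
}
```

For example, `label_bedpe_fields < $proj/a.bed` on your first line would show chrom1/chrom2 both as chr1, with the two mates' coordinates in start1/end1 and start2/end2.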

Here is some more info:

https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format

https://thesequencingcenter.com/knowledge-base/what-are-paired-end-reads/
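
If you want one interval per read pair in R, one option (a sketch of my own, not a bedtools feature) is to collapse each BEDPE line into a standard 6-column BED interval covering the whole fragment. The function name bedpe_to_fragment_bed is hypothetical; it assumes both mates map to the same chromosome, which your --no-mixed --no-discordant bowtie2 settings should ensure, and it uses mate 1's strand as the fragment strand, which is a choice rather than a standard:

```shell
# Collapse each BEDPE record (one read pair per line) into a BED6 interval
# spanning the whole fragment:
#   chrom, min(start1, start2), max(end1, end2), name, score, strand1
# Assumption: both mates are on the same chromosome; records where
# chrom1 != chrom2, or where a mate is unmapped ("."), are skipped.
# bedpe_to_fragment_bed is a hypothetical helper name.
bedpe_to_fragment_bed() {
  awk 'BEGIN { OFS = "\t" }
       $1 == $4 && $1 != "." {
         start = ($2 < $5) ? $2 : $5   # leftmost start of the two mates
         end   = ($3 > $6) ? $3 : $6   # rightmost end of the two mates
         print $1, start, end, $7, $8, $9
       }'
}
```

Something like `bedpe_to_fragment_bed < $proj/a.bed > $proj/a.fragments.bed` gives an ordinary BED file, which rtracklayer::import() in R should read as a GRanges object. Alternatively, running bedtools bamtobed without -bedpe writes one standard 6-column BED line per read (each mate separately) instead of one line per pair.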