Hi, I used awk on Linux to shift the read coordinates in the BED file.
$ awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 4, $3 + 4, $4, $5, $6; else print $1, $2 - 5, $3 - 5, $4, $5, $6}' input.bed >output.bed
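To make it easier to see what this command does, here is a small sketch on two made-up reads. It assumes a standard six-column BED file (chrom, start, end, name, score, strand), so that $6 is the strand; the file names example.bed and example.shifted.bed and the read coordinates are hypothetical.

```shell
# Two hypothetical reads, one per strand
# (columns: chrom, start, end, name, score, strand).
printf 'chr1\t100\t150\tread1\t60\t+\n' >  example.bed
printf 'chr1\t200\t250\tread2\t60\t-\n' >> example.bed

# Same shift as the command above: +4 on the plus strand, -5 on the
# minus strand, applied to both the start and the end column.
awk 'BEGIN {OFS = "\t"} {if ($6 == "+") print $1, $2 + 4, $3 + 4, $4, $5, $6; else print $1, $2 - 5, $3 - 5, $4, $5, $6}' example.bed > example.shifted.bed

cat example.shifted.bed
```

With this input, the plus-strand read moves from 100-150 to 104-154 and the minus-strand read from 200-250 to 195-245, so the read length stays the same. Note that a minus-strand read starting within 5 bp of the chromosome start would get a negative coordinate, which may need to be filtered out.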
Thank you for your reply. Is there a way I could obtain shifted reads in BAM format?
You could convert the BED file back to a BAM file using the bedtools bedtobam command.
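As a rough sketch of that conversion: bedtools bedtobam takes the BED file with -i and a genome file (chromosome name and length, tab-separated) with -g, and writes the BAM to standard output. The wrapper name bed_to_bam and all file names below are placeholders I made up, not something from the thread.

```shell
# Hypothetical helper around bedtools bedtobam.
#   $1 = shifted BED file
#   $2 = genome file (chromosome name <TAB> chromosome length)
#   $3 = output BAM file
bed_to_bam() {
    bedtools bedtobam -i "$1" -g "$2" > "$3"
}
```

For example, `bed_to_bam output.bed genome.chrom.sizes output.bam`. Since a BED file carries no sequence or base-quality data, the resulting BAM cannot contain them either; it only records the shifted intervals.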
Hi, besides the position (start/end), if I also want to extract the sequence information for each fragment, what should I do?
You can use the bedtools getfasta command:
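A minimal sketch of such a call, assuming you have the reference genome as a FASTA file whose chromosome names match the BED file. The wrapper name get_fragment_fasta and the file names are hypothetical; -fi, -bed and -fo are the getfasta options for the input FASTA, the intervals, and the output file.

```shell
# Hypothetical helper around bedtools getfasta.
#   $1 = reference genome FASTA
#   $2 = BED file with the intervals to extract
#   $3 = output FASTA file
get_fragment_fasta() {
    bedtools getfasta -fi "$1" -bed "$2" -fo "$3"
}
```

For example, `get_fragment_fasta genome.fa output.bed output.fa` writes one FASTA record per BED interval.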
Hello James,
If I understand your awk command correctly, $2 represents the 5' end of the R1 (left-most) read and $6 represents the 5' end of the R2 (right-most) read. If that is correct, then my question is: why do we add 4 to both ends? Shouldn't it be $2+4 and $6-5?
Thanks.
Hi Ravi, I have the same confusion. Have you figured it out? The first ATAC-seq paper, published in 2013, says: "For peak-calling and footprinting, we adjusted the read start sites to represent the center of the transposon binding event." The ENCODE ATAC pipeline also appears to adjust only one end of the fragment. So I'm mostly in favour of this idea. What do you think?
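To illustrate what adjusting only one end could look like, here is a sketch that moves just the read start site (the 5' end) and leaves the other end alone. It is my own illustration, not code taken from the paper or from the ENCODE pipeline, and it assumes six-column BED reads with the strand in $6 and the same +4/-5 offsets used earlier in this thread. File names and coordinates are made up.

```shell
# Two hypothetical reads, one per strand
# (columns: chrom, start, end, name, score, strand).
printf 'chr1\t100\t150\tread1\t60\t+\n' >  reads.bed
printf 'chr1\t200\t250\tread2\t60\t-\n' >> reads.bed

# Shift only the 5' end of each read:
#   plus strand  -> the 5' end is the start (column 2), moved by +4
#   minus strand -> the 5' end is the end   (column 3), moved by -5
awk 'BEGIN {OFS = "\t"} {if ($6 == "+") $2 = $2 + 4; else if ($6 == "-") $3 = $3 - 5; print}' reads.bed > reads.5prime_shifted.bed

cat reads.5prime_shifted.bed
```

Here the plus-strand read becomes 104-150 and the minus-strand read becomes 200-245, so each read gets shorter instead of sliding as a whole. For a properly paired fragment, whose left end is the 5' end of the plus-strand read and whose right end is the 5' end of the minus-strand read, the same logic would give start + 4 and end - 5.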