Question

Shifting reads for ATAC-seq alignments

4

8.6 years ago

robm9119 ▴ 180

Hi everyone,

In the original ATAC-seq paper, the authors shifted the reads by +4 bp on the + strand and by -5 bp on the - strand: http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2688.html

How can I easily shift paired-end alignments for my ATAC-seq samples as done in the paper?

Thanks!

next-gen alignment • 13k views

ADD COMMENT • link updated 3.4 years ago by Laura Z • 0 • written 8.6 years ago by robm9119 ▴ 180

score 5 · Answer 1 · 2016-04-17

5

8.6 years ago

James Ashmore ★ 3.5k

The bedtools command extracts the paired-end alignments in BEDPE format, and the awk command then shifts the fragments as needed:

bedtools bamtobed -i reads.bam -bedpe | awk -v OFS="\t" '{if($9=="+"){print $1,$2+4,$6+4}else if($9=="-"){print $1,$2-5,$6-5}}' > fragments.bed

Note that the BAM file should be sorted by read name beforehand:

samtools sort -n -T aln.sorted -o aln.sorted.bam aln.bam

ADD COMMENT • link 8.6 years ago by James Ashmore ★ 3.5k

0

Thank you for your reply. Is there a way I could obtain shifted reads in BAM format?

ADD REPLY • link 8.6 years ago by robm9119 ▴ 180

1

You could convert the BED file back to a BAM file using the bedtools bedtobam command.

ADD REPLY • link 8.6 years ago by James Ashmore ★ 3.5k

0

Hi, besides the position (start/end), what should I do if I also want to extract the sequence of each fragment?

ADD REPLY • link 7.6 years ago by Jeason Rad ▴ 30

0

You can use the bedtools getfasta command:

bedtools bamtobed -i input.bam | bedtools getfasta -fi genome.fasta -bed stdin > fragments.fasta

ADD REPLY • link 7.5 years ago by James Ashmore ★ 3.5k

0

Hello James,

If I understand your awk command correctly, $2 represents the 5' end of the R1 (left-most) read and $6 represents the 5' end of the R2 (right-most) read. If that is correct, why do we add 4 to both ends? Shouldn't it be $2+4 and $6-5?

Thanks.

ADD REPLY • link 5.4 years ago by Ravi Patel • 0
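
To make the two conventions discussed in this thread concrete, here is a small Python sketch. The function names and the example coordinates are hypothetical, and BED-style half-open intervals are assumed. It compares shifting both fragment ends in the same direction, as in the awk command above, with moving each end inward, as Ravi suggests.

```python
def shift_same_direction(start, end, strand):
    """Shift both fragment ends by +4 (+ strand) or -5 (- strand)."""
    offset = 4 if strand == "+" else -5
    return start + offset, end + offset


def shift_inward(start, end):
    """Move the left end by +4 and the right end by -5."""
    return start + 4, end - 5


# Hypothetical fragment in BED-style coordinates (an assumption for illustration).
frag_start, frag_end = 1000, 1200

same = shift_same_direction(frag_start, frag_end, "+")
inward = shift_inward(frag_start, frag_end)

print(same, same[1] - same[0])        # (1004, 1204) 200
print(inward, inward[1] - inward[0])  # (1004, 1195) 191
```

With the first convention the fragment keeps its length and simply moves; with the second it shrinks by 9 bp, so the two only agree on the left end of a + strand fragment.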

0

Hi Ravi, I had the same confusion. Have you figured it out? The first ATAC-seq paper, published in 2013, says: "For peak-calling and footprinting, we adjusted the read start sites to represent the center of the transposon binding event." The ENCODE ATAC-seq pipeline also adjusts only one end per read (see the code below), so I mostly favour that interpretation. What do you think?

cmd = 'zcat -f {} | '
cmd += 'awk \'BEGIN {{OFS = "\\t"}}'
cmd += '{{ if ($6 == "+") {{$2 = $2 + 4}} '
cmd += 'else if ($6 == "-") {{$3 = $3 - 5}} print $0}}\' | '
cmd += 'gzip -nc > {}'

ADD REPLY • link 3.4 years ago by Laura Z • 0
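
The per-read adjustment in the reply above can also be written directly in Python, which may be easier to follow than the escaped awk string. This is only a sketch under stated assumptions: the input is tab-separated BED with the strand in column 6, and the names `shift_read` and `shift_lines` are made up for illustration and are not part of any pipeline.

```python
def shift_read(fields):
    """Shift one BED record: + strand start += 4, - strand end -= 5.

    Assumes the strand is in column 6 (index 5); extra columns are kept as-is.
    """
    fields = list(fields)
    strand = fields[5]
    if strand == "+":
        fields[1] = str(int(fields[1]) + 4)
    elif strand == "-":
        fields[2] = str(int(fields[2]) - 5)
    return fields


def shift_lines(lines):
    """Yield shifted tab-separated lines, skipping empty ones."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        yield "\t".join(shift_read(line.split("\t")))


# Hypothetical example records.
example = [
    "chr1\t100\t150\tread1\t60\t+",
    "chr1\t300\t350\tread2\t60\t-",
]
for out in shift_lines(example):
    print(out)
```

Only the 5' end of each read moves here: the start for + strand reads and the end for - strand reads.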

score 0 · Answer 2 · 2016-05-07

0

8.6 years ago

liu.huand • 0

Hi, I used awk on Linux to shift the coordinates in the BED file:

awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 4, $3 + 4, $4, $5, $6; else print $1, $2 - 5, $3 - 5, $4, $5, $6}' input.bed > output.bed

ADD COMMENT • link 8.6 years ago by liu.huand • 0

score 0 · Answer 3 · 2018-05-22

To prepare BEDPE input for MACS2 from paired-end data, I believe the following works:

bedtools bamtobed -i reads.bam -bedpe | perl -lane 'if ($F[0] ne $F[3]) {print STDERR "Warnings: read pairs mapped to different chroms: $_"; next;} if ($F[8] eq "+" and $F[9] eq "-") {$F[1]+=4;$F[5]-=5; print "$F[0]\t$F[1]\t$F[5]";}elsif($F[8] eq "-" and $F[9] eq "+"){$F[4]+=4;$F[2]-=5; print "$F[0]\t$F[4]\t$F[2]";}else {print STDERR "Warnings: invalid line: $_";}' | sort -k1,1 -k2,2n -k3,3n > reads.sorted.bed

*if the mates of a pair map to different chromosomes, a warning is reported

*the + strand end is shifted by +4

*the - strand end is shifted by -5

*if both mates map to the same strand, a warning is also reported

*the output is sorted by coordinate

Please correct me if I am wrong
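
The same logic can be spelled out in Python, which makes the column handling easier to check. This is a sketch, not a tested replacement for the one-liner: it assumes the 10-column BEDPE layout written by bedtools bamtobed -bedpe (strands in columns 9 and 10), and the function names `shifted_fragment` and `shift_bedpe` are hypothetical.

```python
import sys


def shifted_fragment(fields):
    """Return (chrom, start, end) for one BEDPE record, or None if skipped.

    Assumed columns: chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2.
    """
    chrom1, start1, end1 = fields[0], int(fields[1]), int(fields[2])
    chrom2, start2, end2 = fields[3], int(fields[4]), int(fields[5])
    strand1, strand2 = fields[8], fields[9]

    if chrom1 != chrom2:
        print("Warning: mates on different chromosomes:",
              "\t".join(fields), file=sys.stderr)
        return None
    if strand1 == "+" and strand2 == "-":
        # Mate 1 is the left end of the fragment, mate 2 the right end.
        return chrom1, start1 + 4, end2 - 5
    if strand1 == "-" and strand2 == "+":
        # Mate 2 is the left end of the fragment, mate 1 the right end.
        return chrom1, start2 + 4, end1 - 5
    print("Warning: unexpected strand combination:",
          "\t".join(fields), file=sys.stderr)
    return None


def shift_bedpe(lines):
    """Shift every usable pair and return the fragments sorted by coordinate."""
    fragments = []
    for line in lines:
        frag = shifted_fragment(line.rstrip("\n").split("\t"))
        if frag is not None:
            fragments.append(frag)
    return sorted(fragments)


# Hypothetical records covering each branch.
example = [
    "chr1\t400\t450\tchr1\t200\t250\tpairB\t60\t-\t+",
    "chr1\t100\t150\tchr1\t250\t300\tpairA\t60\t+\t-",
    "chr1\t10\t60\tchr2\t10\t60\tpairC\t60\t+\t-",
    "chr1\t10\t60\tchr1\t80\t130\tpairD\t60\t+\t+",
]
for chrom, start, end in shift_bedpe(example):
    print(chrom, start, end, sep="\t")
```

Pairs on different chromosomes or with both mates on the same strand are reported on stderr and dropped, mirroring the warnings described above.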