Shifting reads for ATAC-seq alignments
3
4
Entering edit mode
8.7 years ago
robm9119 ▴ 180

Hi everyone,

In the original paper for ATAC-seq, the authors shifted the reads +4 bp for the +strand and -5 bp for the -strand: http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2688.html

How can I easily shift paired-end alignments for my ATAC-seq samples as done in the paper?

Thanks!

next-gen alignment • 13k views
ADD COMMENT
5
Entering edit mode
8.7 years ago
James Ashmore ★ 3.5k

The bedtools command should extract the paired-end alignments as bedpe format, then the awk command should shift the fragments as needed:

bedtools bamtobed -i reads.bam -bedpe | awk -v OFS="\t" '{if($9=="+"){print $1,$2+4,$6+4}else if($9=="-"){print $1,$2-5,$6-5}}' > fragments.bed

Note, the BAM file should be sorted by read name beforehand:

samtools sort -n -T aln.sorted -o aln.sorted.bam aln.bam
ADD COMMENT
0
Entering edit mode

Thank you for your reply. Is there a way I could obtain shifted reads in BAM format?

ADD REPLY
1
Entering edit mode

You could convert the BED file back to a BAM file using bedtools bedtobam command

ADD REPLY
0
Entering edit mode

Hi, besides pos/start/end, if I also want to extract the sequence inforamtion about the fragment, what should I do?

ADD REPLY
0
Entering edit mode

You can use bedtool's getfasta command:

bedtools bamtobed -i input.bam | bedtools getfasta -fi genome.fasta -bed stdin > fragments.fasta
ADD REPLY
0
Entering edit mode

Hello James,

If I understand your awk command correctly, $2 represents 5' end of R1 (/left-most) read and $6 represents 5' end of the R2 (/right-most) read. If that is correct, then my question is, why do we add 4 to both ends? Shouldn't it be $2+4 and $6-5?

Thanks.

ADD REPLY
0
Entering edit mode

Hi Ravi, I have the same confusion as yours. Have you understood it? On the first ATAC paper published in 2013, they said "For peak-calling and footprinting, we adjusted the read start sites to represent the center of the transposon binding event." And the ENCODE atac papline showed they adjust one end of the fragment.So I'm mostly in favour of this idea, what do you think?

enter code here

cmd = 'zcat -f {} | '
cmd += 'awk \'BEGIN {{OFS = "\\t"}}'
cmd += '{{ if ($6 == "+") {{$2 = $2 + 4}} '
cmd += 'else if ($6 == "-") {{$3 = $3 - 5}} print $0}}\' | '
cmd += 'gzip -nc > {}'
ADD REPLY
0
Entering edit mode
8.7 years ago
liu.huand • 0

Hi, I used awk in linux to trim the end of bed file data. $ awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 4, $3 + 4, $4, $5, $6; else print $1, $2 - 5, $3 - 5, $4, $5, $6}' input.bed >output.bed

ADD COMMENT
0
Entering edit mode
6.6 years ago

To prepare macs2 BEDPE, I believe for paired end

bedtools bamtobed -i reads.bam -bedpe | perl -lane 'if ($F[0] ne $F[3]) {print STDERR "Warnings: read pairs mapped to different chroms: $_"; next;} if ($F[7] eq "+" and $F[8] eq "-") {$F[1]+=4;$F[5]-=5; print "$F[0]\t$F[1]\t$F[5]";}elsif($F[7] eq "-" and $F[8] eq "+"){$F[4]+=4;$F[2]-=5; print "$F[0]\t$F[4]\t$F[2]";}else {print STDERR "Warnings: invalid line: $_";}' | sort -k1,1 -k2,2n -k3,3n > reads.sorted.bed

*if pairs mapped to dierent chrom, it will report Warning

*reads(+) would +4

*reads(-) would -5

*if pairs mapped to the same strand, also report Warning

*coordinates -sorted

Please correct me if I am wrong

ADD COMMENT

Login before adding your answer.

Traffic: 946 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6