How to define the open chromatin region in ATACseq? And how to filter the .bam file based on insert size?
1
1
Entering edit mode
2.4 years ago
Dan ▴ 180

Hello,

How to define the open chromatin region in ATACseq? The tutorial (https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html) saids insert size less than 100 bp is open chromatin region and insert size in (180, 240) is mono-nucleosome region, I am wondering what is the insert size in (100, 180) belong to? I know the 9th column of .bam file is the inset size, how to filter the .bam file based on insert size? I tried

samtools view $(FILE)"_mm10.bam" | awk '$9<100' | awk '$9>0' | samtools view -b > openregionbam/$(FILE)"_mm10.bam"

But get error

[E::sam_parse1] missing SAM header
[W::sam_read1] parse error at line 1
[main_samview] truncated file.

What should I do? Thanks a lot.

ATAC • 639 views
ADD COMMENT
3
Entering edit mode
2.4 years ago
ATpoint 85k

Try this:

It adds -h to the first samtools command to preserve the header (absence of header is the cause of your error), it preserves that header in the awk command (skipping lines starting with @) and it respects that alignments to the bottom DNA strand would have negative insert size so you have to filter for insert size being larger than 100 and smaller than -100.

LEN=100
samtools view -h in.bam | awk -v LEN=$LEN '{if ($9 <= LEN && $9 >= -(LEN) && $9 != 0 || $1 ~ /^@/) print $0}' | samtools view -o out.bam

Yes, 100bp is a cutoff I've seen people (and myself) using before but maybe there are (depending on your application) specialized tools that call NFRs mor reliably. NucleoATAC was one that can do it back afair it was quite a pita to use.

Pro-tip: You can replace awk by mawk to speed up the process a lot.

ADD COMMENT

Login before adding your answer.

Traffic: 1900 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6