ATAC seq analysis
1
0
Entering edit mode
2.5 years ago
aranyak111 • 0

I have been trying to get my hands on ATAC sequencing data analysis using this tutorial designed by Rockfeller University.

https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html. I have been working in a linux cluster with lots of memory.

Following the initial code block I have been able to generate the genome index successfully.

library(Rsubread)

genome <- "ATAC_Data/ATAC_Reference/hg19_Genome.fa"
indexForSubread <- gsub("\\.fa$", "", genome)

buildindex(indexForSubread, genome, indexSplit = FALSE)

The index generation log file is is given below.

 Index name : hg19_Genome                                    ||
||               Index space : base space                                     ||
||               Index split : no-split                                       ||
||          Repeat threshold : 100 repeats                                    ||
||              Gapped index : no                                             ||
||                                                                            ||
||       Free / total memory : 927.4GB / 1007.8GB                             ||
||                                                                            ||
||               Input files : 1 file in total                                ||
||                             o hg19_Genome.fa                               ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Check the integrity of provided reference sequences ...                    ||
|| No format issues were found                                                ||
|| Scan uninformative subreads in reference sequences ...                     ||
|| 676060 uninformative subreads were found.                                  ||
|| These subreads were excluded from index building.                          ||
|| Estimate the index size...                                                 ||
||    8%,   1 mins elapsed, rate=2513.4k bps/s                                ||
||   16%,   3 mins elapsed, rate=2423.9k bps/s                                ||
||   24%,   5 mins elapsed, rate=2386.6k bps/s                                ||
||   33%,   7 mins elapsed, rate=2368.8k bps/s                                ||
||   41%,   9 mins elapsed, rate=2365.4k bps/s                                ||
||   49%,  10 mins elapsed, rate=2371.8k bps/s                                ||
||   58%,  12 mins elapsed, rate=2401.5k bps/s                                ||
||   66%,  14 mins elapsed, rate=2405.9k bps/s                                ||
||   74%,  15 mins elapsed, rate=2451.3k bps/s                                ||
||   83%,  17 mins elapsed, rate=2468.2k bps/s                                ||
||   91%,  18 mins elapsed, rate=2490.8k bps/s                                ||
|| 17.4 GB of memory is needed for index building.                            ||
|| Build the index...                                                         ||
||    8%,   3 mins elapsed, rate=1157.4k bps/s                                ||
||   16%,   7 mins elapsed, rate=1108.6k bps/s                                ||
||   24%,  11 mins elapsed, rate=1102.2k bps/s                                ||
||   33%,  15 mins elapsed, rate=1101.6k bps/s                                ||
||   41%,  19 mins elapsed, rate=1111.6k bps/s                                ||
||   49%,  22 mins elapsed, rate=1122.2k bps/s                                ||
||   58%,  26 mins elapsed, rate=1139.1k bps/s                                ||
||   66%,  30 mins elapsed, rate=1144.3k bps/s                                ||
||   74%,  33 mins elapsed, rate=1171.2k bps/s                                ||
||   83%,  36 mins elapsed, rate=1186.4k bps/s                                ||
||   91%,  39 mins elapsed, rate=1203.3k bps/s                                ||
|| Save current index block...                                                ||
||  [ 0.0% finished ]                                                         ||
||  [ 10.0% finished ]                                                        ||
||  [ 20.0% finished ]                                                        ||
||  [ 30.0% finished ]                                                        ||
||  [ 40.0% finished ]                                                        ||
||  [ 50.0% finished ]                                                        ||
||  [ 60.0% finished ]                                                        ||
||  [ 70.0% finished ]                                                        ||
||  [ 80.0% finished ]                                                        ||
||  [ 90.0% finished ]                                                        ||
||  [ 100.0% finished ]                                                       ||
||                                                                            ||
||                     Total running time: 89.4 minutes.                      ||
||Index /labs/markay/Aranyak/atacseq-rockfeller/hg19_Genome was successf ...

I tried to execute the next step following the next code block

read1 <- "ATAC_Data/ATAC_FQs/SRR891269_1.fastq.gz"
read2 <- "ATAC_Data/ATAC_FQs/SRR891269_2.fastq.gz"
outBAM <- "ATAC_50K_2.bam"

align(indexForSubread, readfile1 = read1, readfile2 = read2, output_file = outBAM, 
    nthreads = 2, type = 1, unique = TRUE, maxFragLength = 2000)

When I try to run the R code directly in the cluster it gives the following error

Out of memory. If you are using Rsubread in R, please save your working environment and restart R.

Error in .load.delete.summary(output_file[i]) :
  Summary file /labs/markay/Aranyak/atacseq-rockfeller/Sorted_ATAC_50K_2.bam.summary was not generated! The program terminated wrongly!

Hence to get rid of the memory issue I wrapped the second code block as an R script so that I can allocate enough memory and run it as a batch script in the cluster with the following command.

#!/bin/bash
#SBATCH --job-name=atacseq_align
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=batch
#SBATCH --cpus-per-task=10
#SBATCH --time=6:00:00
#SBATCH --account=markay
#SBATCH --mem=500GB
#SBATCH --output=my_script_test_4.log

# load modules
ml R/4.1.2

# execute script
Rscript my_script.R

I get the following error

Error in .check_and_NormPath(index, mustWork = FALSE, opt = "index") : 
  object 'indexForSubread' not found
Calls: align -> .check_and_NormPath
Execution halted

I have also checked with the Rsubread vignette on their test data and that looks fine.

Any help with someone more experienced in ATAC seq will be greatly appreciated.

ATAC-Seq • 1.4k views
ADD COMMENT
0
Entering edit mode

I don't think this has anything to do with ATAC-Seq. Show us the content of your my_script.R file. In all probability, the error lies there.

ADD REPLY
0
Entering edit mode

The content of my_script.R file is the second code block already stated.

library(Rsubread)
read1 <- "/labs/markay/Aranyak/atacseq-rockfeller/ATACSample_r1.fastq"
read2 <- "/labs/markay/Aranyak/atacseq-rockfeller/ATACSample_r2.fastq"
outBAM <- "/labs/markay/Aranyak/atacseq-rockfeller/Sorted_ATAC_50K_2.bam"
align(indexForSubread, readfile1 = read1, readfile2 = read2, output_file = outBAM, 
    nthreads = 2, type = 1, unique = TRUE, maxFragLength = 2000)
ADD REPLY
1
Entering edit mode
2.5 years ago
Ram 44k

You have not assigned a value to indexForSubread. Add this lines based on your previous code to the script:

indexForSubread <- "ATAC_Data/ATAC_Reference/hg19_Genome"
ADD COMMENT
0
Entering edit mode

Thanks, the suggestion has solved the problem for now.

ADD REPLY
0
Entering edit mode

upvote_bookmark_accept

ADD REPLY

Login before adding your answer.

Traffic: 1678 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6