Question

File formats for ATACseq data

0

Entering edit mode

2.9 years ago

c.eskiw ▴ 10

Good day all,

I am trying to generate bedgraphs for some ATACseq data I have generated. I am using the Ubuntu 20.02 environment.

I have used trim_galore, Hisat2 and Masc2 (and MACS3) to map the data and peak call. All good here.

However, I am also trying to generate bedgraphs for visualization in the UCSC genome browser. BEDtools will not merge my replicates. Apparently I have issues with Hisat not generating files with e the chromosome in the first column position. I have tried bedtools and samtools to convert to 1) convert the files back to .bam files followed by indexing (generating .bai files that I have failed to utilize) and 2) realigning the data using STAR. I have run into nothing but issues. As follows are some of the error messages I have received:

(base) ceskiw@DESKTOP-4267CK1:~$ bamCoverage -p 7 -b /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam --skipNAs --normalizeUsing RPKM -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bedgraph -of bedgraph
[E::idx_find_and_load] Could not retrieve index file for '/mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam'
'/mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam' does not appear to have an index. You MUST index the file first!

samtools sort -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam

This did generate index .bai files.

bamCoverage -b /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai --skipNAs --normalizeUsing RPKM -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bedgraph --outFileFormat bedgraph
[E::hts_hopen] Failed to open file /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai
[E::hts_open_format] Failed to open file "/mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai" : Exec format error
The file '/mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai' does not exist

Any help is of course greatly appreciated.

fasta .fq fastq • 1.3k views

ADD COMMENT • link updated 2.9 years ago by GenoMax 147k • written 2.9 years ago by c.eskiw ▴ 10

GenoMax · Accepted Answer · 2022-01-13

1

Entering edit mode

2.9 years ago

ATpoint 85k

The -b option takes a BAM files, not the BAI file. If you provide the BAM the tool will automatically fetch the corresponding BAI file if it exists. Are you sure the BAI index exists? Usually that is created using samtools index using the sorted BAM file. Maybe the recent SAMtools versions do that automatically when using sort, I did not check that recently.

ADD COMMENT • link 2.9 years ago by ATpoint 85k

0

Entering edit mode

Maybe the recent SAMtools versions do that automatically when using sort, I did not check that recently.

Not unless you include --write-index in sort command AFAIK.

ADD REPLY • link 2.9 years ago by GenoMax 147k

0

Entering edit mode

Nothing seemed to work on the first attempt here. I have stepped back to realign the reads with STAR. This is also failing.

ceskiw@DESKTOP-4267CK1:~$ STAR --runThreadN 7 --runMode genomeGenerate --genomeDir /mnt/f/ReferenceGenomes_GTFs/mouse2022.fa --genomeFastaFiles /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R2_trimmed.fasta
        STAR --runThreadN 7 --runMode genomeGenerate --genomeDir /mnt/f/ReferenceGenomes_GTFs/mouse2022.fa --genomeFastaFiles /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R2_trimmed.fasta
        STAR version: 2.7.9a   compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jan 14 09:28:43 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /mnt/f/ReferenceGenomes_GTFs/mouse2022.fa/Log.out. Will keep ./Log.out

Jan 14 09:28:43 ... starting to generate Genome files

EXITING because of INPUT ERROR: the file format of the genomeFastaFile: /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta is not fasta: the first character is '@' (64), not '>'.
 Solution: check formatting of the fasta file. Make sure the file is uncompressed (unzipped).

Jan 14 09:28:43 ...... FATAL ERROR, exiting

I am starting to think that using Trim_galore is the issue with the organization of the output files? Any insight into if this is a possibility? I will switch to Trimmomatic to see if this makes a difference. I will also see if there is other output options for Trim_galore that may correct this.

Thanks in advance.

ADD REPLY • link updated 2.9 years ago by GenoMax 147k • written 2.9 years ago by c.eskiw ▴ 10

1

Entering edit mode

Probably your entire script has flaws, such as naming fastq files as fasta, e.g. /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta. Consider using an end-to-end pipeline such as nf-core/atacseq.

ADD REPLY • link 2.9 years ago by ATpoint 85k