Good day all,
I am trying to generate bedgraphs for some ATACseq data I have generated. I am using the Ubuntu 20.02 environment.
I have used trim_galore, Hisat2 and Masc2 (and MACS3) to map the data and peak call. All good here.
However, I am also trying to generate bedgraphs for visualization in the UCSC genome browser. BEDtools will not merge my replicates. Apparently I have issues with Hisat not generating files with e the chromosome in the first column position. I have tried bedtools and samtools to convert to 1) convert the files back to .bam files followed by indexing (generating .bai files that I have failed to utilize) and 2) realigning the data using STAR. I have run into nothing but issues. As follows are some of the error messages I have received:
(base) ceskiw@DESKTOP-4267CK1:~$ bamCoverage -p 7 -b /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam --skipNAs --normalizeUsing RPKM -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bedgraph -of bedgraph
[E::idx_find_and_load] Could not retrieve index file for '/mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam'
'/mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam' does not appear to have an index. You MUST index the file first!
samtools sort -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam
This did generate index .bai files.
bamCoverage -b /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai --skipNAs --normalizeUsing RPKM -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bedgraph --outFileFormat bedgraph
[E::hts_hopen] Failed to open file /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai
[E::hts_open_format] Failed to open file "/mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai" : Exec format error
The file '/mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai' does not exist
Any help is of course greatly appreciated.
Not unless you include
--write-index
insort
command AFAIK.Nothing seemed to work on the first attempt here. I have stepped back to realign the reads with STAR. This is also failing.
I am starting to think that using Trim_galore is the issue with the organization of the output files? Any insight into if this is a possibility? I will switch to Trimmomatic to see if this makes a difference. I will also see if there is other output options for Trim_galore that may correct this.
Thanks in advance.
Probably your entire script has flaws, such as naming fastq files as fasta, e.g.
/mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta
. Consider using an end-to-end pipeline such as nf-core/atacseq.