Question

Homer makeTagDirectory does not generate the tagInfo.txt file

2.7 years ago

Diana G. ▴ 30

Hi!

I am trying to use Homer to analyze my HiChIP data, but I am running into trouble at the very beginning, while generating the Tag Directories. I made my Tag Directory following the guidelines at http://homer.ucsd.edu/homer/interactions/HiCtagDirectory.html

I used the following command:

$PH/makeTagDirectory $outdir/Rubio_1_out /media/drogel/Elements/hicpro/Output1to4/bowtie_results/bwt2/Rubio_HiC_2/Rubio_HiC_2_R1_merged_hg38.bwt2merged.bam,/media/drogel/Elements/hicpro/Output1to4/bowtie_results/bwt2/Rubio_HiC_2/Rubio_HiC_2_R2_merged_hg38.bwt2merged.bam -tbp 1 -genome $genomeID -checkGC -restrictionSite GATC

where PH=path/to/homer/ and outdir=path/to/output/directory

But when the directory is generated, there are only tsv files, for example:

chrY.tags.tsv chrY.tags.tsv.R1 chrY.tags.tsv.R2

I am not sure what I am doing wrong. The next analysis step cannot proceed; the first error that appears is:

    Using existing Tag Directory (../Rubio_1_out/)
    !!! Could not open tag information file: ../Rubio_1_out//tagInfo.txt
    Probably not a valid tag directory. Quitting...

Do you have any idea how I can solve this?

tagInfo.txt Homer HiC Paired data • 2.8k views

ADD COMMENT • link updated 2.6 years ago by GenoMax 147k • written 2.7 years ago by Diana G. ▴ 30

I have used Hisat2 to align my reads for HiC as single ends. I have been able to get tag directories but the second run using:

makeTagDirectory /mnt/f/SlipC_April26_22/fastq/TagDirectoryPro1/ HiC-noSelfLigation/ -update -genome hg19 -removePEbg -restrictionSite GATC -both -removeSelfLigation -removeSpikes 10000 5

is giving me the following errors:

Couldn't open a sequence file for "1" (/home/ceskiw/.//data/genomes/hg19//genome.fa.masked)

Is this the issue referenced above with aligning using Hisat2 vs Bowtie2?

All the best, Chris

ADD REPLY • link 2.6 years ago by c.eskiw ▴ 10

score 2 · Accepted Answer · 2022-03-23

2.7 years ago

Trivas ★ 1.8k

You need to use .sam files to create your tag directories. Under options:

-format <X> where X can be: (with column specifications underneath)

sam - SAM formatted files (use samTools to covert BAMs into SAM if you have BAM)

I'd also be careful about adding a trailing / when referring to your tag directory in later commands. I seem to remember Homer being finicky about that.
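Both suggestions can be sketched in a few lines of shell. This is only an illustration, assuming samtools is on the PATH; the helper names `bam_to_sam` and `strip_trailing_slash` and the file names in the usage comments are placeholders, not anything Homer provides.

```shell
# Sketch only: bam_to_sam and strip_trailing_slash are hypothetical helper names.

# Convert a BAM file to SAM next to it, keeping the header (-h) so the
# @SQ lines survive. Prints the name of the SAM file on success.
bam_to_sam() {
    local bam="$1"
    local sam="${bam%.bam}.sam"
    samtools view -h -o "$sam" "$bam" && printf '%s\n' "$sam"
}

# Drop one trailing "/" from a directory name, so later commands do not
# end up building paths such as "dir//tagInfo.txt".
strip_trailing_slash() {
    local dir="$1"
    printf '%s\n' "${dir%/}"
}

# Example usage (placeholder paths):
#   r1_sam=$(bam_to_sam reads_R1.bam)
#   tagdir=$(strip_trailing_slash ../Rubio_1_out/)
```

Whether the trailing slash actually matters may depend on the Homer version, so treat the second helper as a precaution rather than a known fix.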

ADD COMMENT • link 2.7 years ago by Trivas ★ 1.8k

Hi! Thanks for your reply. I have now run it with SAM files and I keep getting the same result.

This time I got a message pointing to a possible source of the error:

    makeTagDirectory ./Rubio_1_out path/to/file/Rubio_HiC_1_R1_merged_hg38.bwt2merged.sam,path/to/file/Rubio_1/Rubio_HiC_1_R2_merged_hg38.bwt2merged.sam -tbp 1 -genome hg38 -checkGC -restrictionSite GATC
        Making paired end tag directory
        Will parse file: path/to/file/Rubio_HiC_1_R1_merged_hg38.bwt2merged.sam,path/to/file/Rubio_HiC_1_R2_merged_hg38.bwt2merged.sam
        Restriction site set to GATC

        Creating directory: ./Rubio_1_out and removing existing *.tags.tsv

        Reading paired end alignment files path/to/file/Rubio_HiC_1_R1_merged_hg38.bwt2merged.sam,path/to/file/Rubio_HiC_1_R2_merged_hg38.bwt2merged.sam
        Guessing that your alignment file is SAM format
    !!!!! Could not open file ./Rubio_1_out/chr5_KI270793v1_alt.tags.tsv for printing tags!!!!!
    !!!!! Is this a valid file name?  May need to:
        Try a different name for the tag directory (is the same name as an existing file?)
        -or- you may need to rename your chromosomes if they have weird characters!
        -or- try using the "-single" flag.

I ran "head" on my SAM file and found this:

@HD VN:1.0  SO:queryname
@SQ SN:chr1 LN:248956422
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr11_KI270721v1_random  LN:100316
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr14_GL000009v2_random  LN:201709
@SQ SN:chr14_GL000225v1_random  LN:211173

So the problem seems to be the extra characters after the chromosome number (the _random and _alt contigs). Does anyone know how to fix this?

Thanks.

ADD REPLY • link 2.7 years ago by Diana G. ▴ 30

I think there are two things happening here. The first is that Homer explicitly says to not align your reads to the genome in paired-end mode for HiC.

Unlike paired end sequencing for other techniques like genomic resequencing or RNA-Seq, where you might expect the 2nd read to be located within the general vicinity of the first read, read-pairs from Hi-C should be processed independently (e.g. do not use Tophat, bowtie, bwa or any short read alignment software in "paired-end" mode - each read should be mapped independently!).
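As a rough sketch of what "mapped independently" could look like with bowtie2: each FASTQ file is aligned on its own with `-U` (unpaired input), and the two SAM files are then handed to makeTagDirectory as a comma-separated pair, as in the original command. The helper name `align_mate`, the index prefix, the file names and the thread count are all placeholders.

```shell
# Sketch only: align_mate is a hypothetical helper; the index prefix,
# FASTQ/SAM names and thread count below are placeholders.
align_mate() {
    local index="$1" fastq="$2" sam="$3"
    # -U treats the FASTQ as unpaired reads, so this mate is mapped
    # without any reference to its partner.
    bowtie2 -p 4 -x "$index" -U "$fastq" -S "$sam"
}

# Example usage:
#   align_mate hg38_canonical sample_R1.fastq sample_R1.sam
#   align_mate hg38_canonical sample_R2.fastq sample_R2.sam
#   makeTagDirectory sample_tagdir sample_R1.sam,sample_R2.sam -tbp 1
```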

The second is the cause of your current error, and it is explained here: http://homer.ucsd.edu/homer/interactions2/HiCtagDirectory.html

After trimming the FASTQ files, the next step is to align them to the reference genome. It is highly recommended to align to a reference genome composed of only canonical chromosomes - do not include *_random.fa scaffolds or other chromosomal fragments since they have not been properly assembled and will only introduce noise into the Hi-C analysis.
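One possible way to build such a canonical-only reference is to filter the genome FASTA before indexing it. The sketch below assumes UCSC-style human names (chr1 to chr22, chrX, chrY); the helper name `keep_canonical_fasta` and the file names are placeholders, and chrM is left out here (add `|M` to the pattern to keep it).

```shell
# Sketch only: keep_canonical_fasta is a hypothetical helper name.
# Reads a FASTA on stdin and writes only the canonical chromosomes to stdout.
keep_canonical_fasta() {
    awk '
        # On each header line, decide whether the following sequence is kept.
        /^>/ {
            name = substr($1, 2)
            keep = (name ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y)$/)
        }
        # Print header and sequence lines only while keep is true.
        keep
    '
}

# Example usage (placeholder file names):
#   keep_canonical_fasta < hg38.fa > hg38.canonical.fa
#   bowtie2-build hg38.canonical.fa hg38_canonical
```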

Personally, I would realign to a different reference genome that only includes the standard chromosomes, so that you have that file for future use. But there is a quick way to edit your SAM file to do what you want, described in the post "Remove mitochondrial reads from BAM files".
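A minimal sketch of that kind of filtering, written with awk so it works directly on SAM text. It assumes UCSC-style human chromosome names and reads that were mapped independently (so the mate columns do not point at removed contigs); `keep_canonical_sam` is a made-up helper name. Unmapped reads, whose reference name is `*`, are dropped as well.

```shell
# Sketch only: keep_canonical_sam is a hypothetical helper name.
# Reads SAM on stdin, writes SAM restricted to chr1-22, chrX, chrY on stdout.
keep_canonical_sam() {
    awk -F '\t' '
        BEGIN { canon = "^chr([1-9]|1[0-9]|2[0-2]|X|Y)$" }
        # Header: keep @SQ lines only for canonical names,
        # pass every other header line through unchanged.
        /^@/ {
            if ($1 == "@SQ") {
                name = ""
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) name = substr($i, 4)
                if (name ~ canon) print
            } else {
                print
            }
            next
        }
        # Alignment records: column 3 is the reference name (RNAME).
        $3 ~ canon
    '
}

# Example usage (placeholder file names):
#   keep_canonical_sam < sample_R1.sam > sample_R1.canonical.sam
```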

ADD REPLY • link 2.7 years ago by Trivas ★ 1.8k

You were right. The best solution is to align again.

I also tried using the files from the first alignment after removing all non-canonical chromosomes. The result was not very different, but the new alignment definitely looks better (it takes time, though).

Thanks!

ADD REPLY • link 2.7 years ago by Diana G. ▴ 30