Hi, I'm trying to run an ATAC-seq pipeline and I'm getting the following error with the code below. Does anyone know whether older versions of faidx had the -x flag? Thanks.
Original code:
```shell
## extract fasta per chromosome
cd ${DATA_DIR}/$GENOME
mkdir -p seq
cd seq
rm -f ${REF_FA_PREFIX}
ln -s ../${REF_FA_PREFIX} ${REF_FA_PREFIX}
samtools faidx -x ${REF_FA_PREFIX}
cp --remove-destination *.fai ../
```
Error output:
```
2017-08-03 11:52:54 (167 MB/s) - “mm10_dnase_avg_fseq_signal_metadata.txt” saved [1251/1251]
## extract fasta per chromosome
cd ${DATA_DIR}/$GENOME
mkdir -p seq
cd seq
rm -f ${REF_FA_PREFIX}
ln -s ../${REF_FA_PREFIX} ${REF_FA_PREFIX}
faidx -x ${REF_FA_PREFIX}
cp --remove-destination *.fai ../
```
The faidx command is included in the pyfaidx Python module, and it does have a -x flag, which splits the input FASTA into individual FASTA files, one per sequence. That's what "extract fasta per chromosome" means in the comment.
You can install pyfaidx using pip: pip install pyfaidx
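If pyfaidx can't be installed, the per-chromosome split can be approximated with awk. The sketch below is a hypothetical stand-in, not pyfaidx's code: the function name `split_fasta` and the `<name>.fa` output naming are assumptions, and it is only meant to show what the split step produces.

```shell
#!/bin/sh
# split_fasta: hypothetical helper (not from pyfaidx or the pipeline) that writes
# each record of a multi-FASTA file to its own file named <sequence name>.fa,
# roughly what the "extract fasta per chromosome" step is after.
# Usage: split_fasta input.fa [output_dir]
split_fasta() {
  fasta=$1
  outdir=${2:-.}
  awk -v outdir="$outdir" '
    /^>/ {
      # close the previous output file so we never hit the open-file limit
      if (out != "") close(out)
      # sequence name = header text after ">" up to the first whitespace
      name = substr($1, 2)
      out = outdir "/" name ".fa"
    }
    # copy the header and sequence lines; lines before the first header are skipped
    { if (out != "") print > out }
  ' "$fasta"
}
```

For example, `split_fasta mm10.fa .` run inside the seq directory would leave one file per chromosome there. Unlike pyfaidx's faidx, which should create a .fai index as a side effect of indexing the file, this sketch writes no index, so the script's next line (`cp --remove-destination *.fai ../`) would still need one from elsewhere.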
After removing -x, install_genome_data.sh now gets past the step that was failing. It was just confusing because the first time I ran it, with a different version a few weeks back, the command had -x and didn't stall.
Instead of using samtools to create the index file, you can use IGV: under Tools -> Run igvtools..., select Index and choose your genome FASTA file or BAM file as the input. This will give you the .fai or .bai index file.
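For reference, whichever tool produces it, a .fai is a plain tab-separated file with one line per sequence: name, length, byte offset of the first base, bases per line, and bytes per line. The awk sketch below prints that format to make it concrete; `make_fai` is a hypothetical name, and it assumes ASCII text, LF line endings and equal-length sequence lines within each record. It is an illustration, not a replacement for IGV, samtools or pyfaidx.

```shell
#!/bin/sh
# make_fai: hypothetical illustration (not code from IGV, samtools or pyfaidx) that
# prints a .fai-style index for a FASTA file to stdout. Assumes ASCII text, LF line
# endings and equal-length sequence lines within each record; nothing is validated.
make_fai() {
  LC_ALL=C awk '
    /^>/ {
      # finish the previous record before starting a new one
      if (name != "") print name "\t" len "\t" off "\t" lb "\t" lw
      name = substr($1, 2)      # sequence name: header up to the first whitespace
      len = 0; lb = 0; lw = 0
      pos += length($0) + 1     # skip the header line and its newline
      off = pos                 # byte offset of the first base
      next
    }
    {
      # bases per line and bytes per line (including the newline)
      if (lb == 0) { lb = length($0); lw = lb + 1 }
      len += length($0)
      pos += length($0) + 1
    }
    END { if (name != "") print name "\t" len "\t" off "\t" lb "\t" lw }
  ' "$1"
}
```

Running `make_fai mm10.fa` would print the same five columns you'd expect to find in mm10.fa.fai, which is also a quick way to list chromosome names (`cut -f1`).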
Right, I just wasn't sure why the original code had the -x; I was trying to figure that out. Thanks.
Where is the original from?
https://github.com/kundajelab/atac_dnase_pipelines/blob/master/install_genome_data.sh
Line 185.
I also had to add samtools in front of faidx, but that's just our samtools setup, I guess.
No, it's a separate script called faidx from the pyfaidx package.
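Because the split step calls the standalone faidx script from pyfaidx rather than samtools faidx, a small pre-flight check can fail early with a clearer message. This is a hypothetical addition, not something install_genome_data.sh contains, and the function name `require_faidx` is made up.

```shell
#!/bin/sh
# require_faidx: hypothetical pre-flight check (not part of install_genome_data.sh).
# The split step needs the standalone `faidx` script installed by pyfaidx,
# not `samtools faidx`, so report where it is or fail with an install hint.
require_faidx() {
  if command -v faidx >/dev/null 2>&1; then
    echo "using faidx at $(command -v faidx)"
  else
    echo "faidx not found on PATH; try: pip install pyfaidx" >&2
    return 1
  fi
}
```

Calling `require_faidx || exit 1` just before the `faidx -x ${REF_FA_PREFIX}` line would stop the script with the hint instead of a less obvious error. Note that `command -v` only confirms that some executable named faidx is on PATH, not that it came from pyfaidx.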