Hello All,
I have multiple fasta files and want to make one output file per chromosome, combining all the fasta files. For example, the output file all_ch1.fasta will contain the ch1 sequences from all the fasta files, and so on. I tried:
samtools faidx *fasta.gz ch1 > all_ch1.fasta
But I am getting this error:
[W::fai_get_val] Reference sample2.fasta.gz not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in sample2.fasta.gz
I checked sample2.fasta.gz and it is not empty. Thank you for any help!
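The error comes from the glob. The shell expands `*fasta.gz` to `sample1.fasta.gz sample2.fasta.gz ...`, and `samtools faidx` takes a single FASTA file followed by region names, so `sample2.fasta.gz` is looked up as a sequence name inside `sample1.fasta.gz`. Also, `samtools faidx` only reads compressed files made with `bgzip`, not plain `gzip`. A minimal sketch that avoids samtools, assuming plain gzip input and that the first word of each chromosome header ends with the chromosome label (e.g. `>sample1_rhg1.0ch1`); the function name `extract_chrom` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every record whose header's first word ends
# with the given chromosome label, from every gzipped FASTA file given.
# Works for both single-line and multi-line (wrapped) sequences.
extract_chrom() {
    local chrom=$1
    shift
    local f
    for f in "$@"; do
        gzip -dc "$f" | awk -v chrom="$chrom" '
            /^>/ {
                # Header line: keep this record only if the first word
                # ends with the label, e.g. ">sample1_rhg1.0ch1" for ch1.
                keep = (substr($1, length($1) - length(chrom) + 1) == chrom)
            }
            keep { print }
        '
    done
}

# Usage: collect ch1 from all files into one output file.
# extract_chrom ch1 *.fasta.gz > all_ch1.fasta
```

The suffix comparison is exact, so `ch1` does not pick up `ch11` or `ch12`.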
If this format is correct, i.e. ch1 is sample1_rhg1.0ch1 in sample1.fasta.gz and sample2_rhg1.0ch1 in sample2.fasta.gz, try this with seqkit (dry run):
New files would be in the stdin.split directory. File names would be stdin.id_ch1.fasta, stdin.id_ch2.fasta, and so on, one for each ch, and each sequence name will be exactly as it is in the input fasta files, e.g. >sample1_rhg1.0ch1 and >sample2_rhg1.0ch1 for ch1.
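To check that the headers really follow this format, and to preview which sequences would be grouped together before splitting, a rough sketch is below. It assumes the chromosome label is the trailing `ch` plus digits of each header's first word; `preview_groups` is a hypothetical name, and its output is not seqkit's own dry-run report:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list, for each chromosome label, the headers that
# would end up in the same output file. Reads gzipped FASTA files.
preview_groups() {
    gzip -dc "$@" | awk '
        /^>/ {
            name = substr($1, 2)            # drop the leading ">"
            if (match(name, /ch[0-9]+$/)) {
                print substr(name, RSTART) "\t" name
            } else {
                print "NO_LABEL\t" name     # header does not match the pattern
            }
        }
    ' | LC_ALL=C sort
}

# Usage: preview_groups *.fasta.gz
```

Any `NO_LABEL` lines point to headers that would not be grouped as expected.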
Download seqkit from https://bioinf.shenwei.me/seqkit/download/. Remove -d from -di once you are okay with the dry-run output.
Without seqkit (assuming that sequences are flattened, i.e. each sequence is on a single line, and all files have an equal number of ch entries), files would be named ch1.fasta, ch2.fasta, etc.
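A related sketch drops both assumptions: it reads multi-line sequences and does not need an equal number of entries per file. Each record is written to `<label>.fasta`, where the label is the trailing `ch` plus digits of the header's first word. The function name `split_by_chrom` and its output-directory argument are hypothetical, and this is not the answer's original command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split records from all gzipped FASTA inputs into
# one file per chromosome label (ch1.fasta, ch2.fasta, ...) in $outdir.
# Records whose header has no trailing "ch<digits>" label are skipped.
split_by_chrom() {
    local outdir=$1
    shift
    mkdir -p "$outdir"
    gzip -dc "$@" | awk -v outdir="$outdir" '
        /^>/ {
            out = ""
            name = substr($1, 2)            # drop the leading ">"
            if (match(name, /ch[0-9]+$/))
                out = outdir "/" substr(name, RSTART) ".fasta"
        }
        # awk truncates each file once on first open, then keeps it open,
        # so records from all inputs accumulate in the same file.
        out != "" { print > out }
    '
}

# Usage: split_by_chrom by_chrom *.fasta.gz
```

Keeping every output file open (no `close()`) is fine for a few dozen chromosomes and means a rerun overwrites old results instead of appending to them.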