Hi,
I have 10 FASTA files, one per sample, each containing 20 gene sequences. I would like to create 20 files, one per gene, each holding that gene's sequence from all 10 samples. For a single file, I extracted a gene and appended the sample name to its header as follows:
pyfasta extract --header --fasta test.fasta gene_name1 | awk '/^>/ {$0=$0 "_sample1"}1' > gene_name1.fasta
Output:
>gene_name1_sample1
ATGC
I can also create the per-gene FASTA files across all samples (this is part of the loop):
pyfasta extract --header --fasta $sample.fasta gene_name1 >> gene_name1.fasta
pyfasta extract --header --fasta $sample.fasta gene_name2 >> gene_name2.fasta
However, I am unable to add the file name to the headers inside the loop, although it works for a single file as shown at the beginning.
Kindly guide.
Thanks.
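The loop itself isn't shown, so this is a guess: one likely cause is that `$sample` sits inside the single-quoted awk program, where the shell does not expand it. A minimal sketch of passing the name in with `awk -v` instead; `tag_headers` is a hypothetical helper name (not part of pyfasta), and the sample names in the commented loop are assumptions:

```shell
# tag_headers is a hypothetical helper (not part of pyfasta).
# It reads FASTA on stdin and appends "_<sample>" to every header line;
# sequence lines pass through unchanged.
tag_headers() {
    awk -v sample="$1" '/^>/ { $0 = $0 "_" sample } 1'
}

# Intended use inside the loop, reusing the pyfasta call from the question
# (sample names here are placeholders):
#   for sample in sample1 sample2; do
#       pyfasta extract --header --fasta "$sample.fasta" gene_name1 \
#           | tag_headers "$sample" >> gene_name1.fasta
#   done
```

The only change from the single-file command is that the suffix comes from an awk variable rather than a literal string, so the same program works for every sample.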
Can you post an example of some of the headers from one of the sample FASTA files? Is each sample FASTA multi-line or single-line? Is each gene's header identical across the sample FASTA files?
Header from sample1.fasta file
It continues like this for up to 20 gene names and their sequences. Each sample FASTA file is multi-line. The gene names are the same across the sample FASTA files.
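Given those answers (multi-line records, identical gene names in every sample), the whole job could also be sketched with awk alone, without pyfasta. This is an alternative under assumptions, not the method used above: `split_by_gene` is a hypothetical function name, the sample name is taken from the file name minus `.fasta`, and the gene name is assumed to be the first word of each header.

```shell
# split_by_gene is a hypothetical helper: split each sample FASTA into
# per-gene files and tag every header with the sample name.
# Usage: split_by_gene OUTDIR sample1.fasta sample2.fasta ...
# Assumes the gene name is the first word after ">" in each header.
split_by_gene() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    for fasta in "$@"; do
        # Sample name = file name without directory and ".fasta" suffix.
        sample=$(basename "$fasta" .fasta)
        awk -v sample="$sample" -v outdir="$outdir" '
            /^>/ {
                # New record: switch to the output file for this gene.
                if (out != "") close(out)
                gene = substr($1, 2)
                out = outdir "/" gene ".fasta"
                header = ">" gene "_" sample
                print header >> out
                next
            }
            # Sequence lines (multi-line records) follow their header.
            out != "" { print >> out }
        ' "$fasta"
    done
}
```

Calling `split_by_gene genes sample1.fasta sample2.fasta` would write one file per gene into `genes/`, for example `genes/gene_name1.fasta`, with headers such as `>gene_name1_sample1`. Because the function appends, the output directory should be emptied before rerunning it.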