Hi,
I would like to create a new FASTA file from the original genome FASTA and a VCF file. The new FASTA should contain only the full gene sequences.
I can do this with GATK's FastaAlternateReferenceMaker:
java -Xmx16g -jar ~/bin/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R ref_genome.fasta -o sample_SNV.fasta -V sample_SNV_selected.vcf -L ref_gene.bed
However, I would like each FASTA header to be the gene name. For instance, the current GATK output looks like this:
>1 chr01:2350
AGAAAGGACAGAAAAAAAGATGGTGAAGTAGAAAGAGGGCGAAATGAAAAAAGGGAAAGC
AAAAGAGATGATGAAAGTCATAGAGAGAGAGATGAAAAAAGGGAAAGCAAAAGAGATGAT
I would like the headers to 1) not carry the sequential number and 2) contain the gene name from column 4 of the BED file.
Is there a way to either 1) modify the input BED file, or 2) rewrite the output FASTA headers by giving some tool both the FASTA and the BED file?
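Option 2 can be sketched as a small Python script that looks each header up in the BED file. This is a minimal sketch, not a GATK feature, and it rests on assumptions worth checking against your own output: that headers look like `>1 chr01:2350` (optionally `>1 chr01:2350-2999`), and that the number after the colon is the 1-based interval start, i.e. BED column 2 plus one (BED is 0-based). Headers that match are rewritten to `>GENE chr01:2350`; anything unmatched passes through unchanged. The script and file names are hypothetical.

```python
"""Replace GATK FastaAlternateReferenceMaker headers with gene names from a BED file.

Assumptions (check against your own output):
  * headers look like ">1 chr01:2350" or ">1 chr01:2350-2999";
  * the number after ':' is the 1-based start, i.e. BED start + 1;
  * column 4 of the BED file holds the gene name.
"""
import sys


def load_bed_names(bed_lines):
    """Map (chrom, 1-based start) -> gene name taken from BED column 4."""
    names = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # no name column on this line
        chrom, start, name = fields[0], int(fields[1]), fields[3]
        names[(chrom, start + 1)] = name  # BED starts are 0-based
    return names


def rename_headers(fasta_lines, names):
    """Yield FASTA lines, swapping the leading record number for the gene name."""
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if len(parts) >= 2 and ":" in parts[1]:
                chrom, pos = parts[1].rsplit(":", 1)
                start_text = pos.split("-")[0]
                if start_text.isdigit():
                    name = names.get((chrom, int(start_text)))
                    if name is not None:
                        # keep the coordinate as a description after the name
                        yield ">" + name + " " + parts[1] + "\n"
                        continue
        yield line  # sequence lines and unmatched headers are unchanged


# Run only when called as: python rename_by_bed.py ref_gene.bed sample_SNV.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as bed:
        bed_names = load_bed_names(bed)
    with open(sys.argv[2]) as fasta:
        sys.stdout.writelines(rename_headers(fasta, bed_names))
```

Matching on coordinates rather than on the record number avoids depending on the order in which GATK emits intervals. If GATK merged overlapping genes into one interval, only the gene whose start matches the merged start will be found; the others fall through unchanged.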
Thanks!
There are many threads on Biostars about renaming FASTA headers. Here are a couple, but search for others:
renaming all fasta headers in a file
replace fasta headers with another name in a text file
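The second thread boils down to a lookup-table rename. A minimal Python version, assuming a hypothetical tab-separated mapping file whose first column is the current header ID (the `1` in `>1 chr01:2350`) and whose second column is the new name. Building that file from the BED rows by position only works if GATK numbers the records in BED order without merging or reordering intervals, which is an assumption to verify first.

```python
"""Rename FASTA headers from a two-column, tab-separated mapping file.

Hypothetical mapping format, one pair per line:  old_id<TAB>new_name
old_id is the first whitespace-separated token of the header (e.g. "1").
"""


def load_mapping(lines):
    """Read old_id -> new_name pairs, ignoring lines with fewer than two columns."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def rename_with_mapping(fasta_lines, mapping):
    """Yield FASTA lines with the header ID replaced; the rest of the header is kept."""
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].rstrip("\n").split(maxsplit=1)
            if tokens and tokens[0] in mapping:
                new_header = ">" + mapping[tokens[0]]
                if len(tokens) > 1:
                    new_header += " " + tokens[1]  # e.g. keep "chr01:2350"
                yield new_header + "\n"
                continue
        yield line  # sequence lines and unmapped headers are unchanged
```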