I'm a newbie to NGS and Linux. Can anyone kindly let me know the easiest way to split a FASTA file containing chromosomes and scaffolds into individual files, i.e. one chromosome per file? Thanks.
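A modification of Pierre's answer from here: Splitting A Fasta File. This version closes each output file before opening the next one, which avoids "too many open files" errors in some awk implementations when the assembly has thousands of scaffolds. It also names each file after the first word of the header (e.g. ">chr1 some description" becomes chr1.fasta) instead of the whole header line:
awk '/^>/ { if (F) close(F); F = substr($1, 2) ".fasta" } { print > F }' ref_genome.fasta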
Just replace ref_genome.fasta with the name of your file.
I think you could have found this answer on Biostars, for example in this thread:
How to split fasta into separate files by chromosome (in the header)
If I understand your goal correctly:
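from Bio import SeqIO

# Write each record (chromosome or scaffold) to its own file named after its ID.
# "w" mode overwrites old output, so re-running the script does not duplicate records.
for record in SeqIO.parse('ref_genome.fasta', 'fasta'):
    with open(record.id + ".fasta", "w") as output_handle:
        SeqIO.write(record, output_handle, 'fasta')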
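If Biopython isn't installed, the same per-record split can be done with the Python standard library alone. This is a minimal sketch, not a drop-in replacement: the function name split_fasta and the out_dir parameter are hypothetical, the file name is taken from the first word of the header (the same part Biopython uses as record.id), and records with duplicate IDs overwrite each other. It streams the input line by line, so only one output file is open at a time.

```python
import os


def split_fasta(fasta_path, out_dir="."):
    """Write each FASTA record to <out_dir>/<id>.fasta (hypothetical helper).

    <id> is the first whitespace-separated word of the header line.
    Returns the list of files written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # New record: close the previous output file first
                    if out is not None:
                        out.close()
                    # First word of the header, without the '>', names the file
                    parts = line[1:].split()
                    record_id = parts[0] if parts else "unnamed"
                    path = os.path.join(out_dir, record_id + ".fasta")
                    out = open(path, "w")
                    written.append(path)
                # Header and sequence lines go to the current file;
                # anything before the first '>' is skipped
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Usage: split_fasta("ref_genome.fasta", "chromosomes") writes chromosomes/chr1.fasta, chromosomes/chr2.fasta and so on, one file per sequence.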
How do you know which scaffold belongs to which chromosome? If you don't, you'll have to align all your scaffolds to a reference genome. If that information is in your scaffold headers, you can split your file with, for example, a Python script.
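As a sketch of that Python approach, the script below assumes a hypothetical header convention in which each scaffold header carries a tag such as chr=chr3 (for example ">scaffold_12 chr=chr3"). The CHROM_TAG pattern and the function names are illustrative and must be adapted to whatever your real headers contain. Scaffolds sharing a tag are written to the same <chrom>.fasta file, and scaffolds without a tag go to unassigned.fasta. It keeps one output file open per chromosome, which is fine for a few dozen chromosomes.

```python
import os
import re

# Hypothetical header convention, e.g. ">scaffold_12 chr=chr3 len=5000".
# Adjust this pattern to match your actual headers.
CHROM_TAG = re.compile(r"\bchr=(\S+)")


def read_fasta(path):
    """Yield (header, sequence_lines) pairs; header excludes the leading '>'."""
    header, lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line[1:], []
            elif header is not None and line:
                lines.append(line)
    if header is not None:
        yield header, lines


def split_by_chromosome(fasta_path, out_dir="."):
    """Write each scaffold to <chrom>.fasta based on its header tag.

    Returns a dict mapping chromosome name to number of scaffolds written.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    handles = {}
    try:
        for header, seq_lines in read_fasta(fasta_path):
            match = CHROM_TAG.search(header)
            chrom = match.group(1) if match else "unassigned"
            # Open each chromosome's output file once, on first use
            if chrom not in handles:
                handles[chrom] = open(os.path.join(out_dir, chrom + ".fasta"), "w")
            out = handles[chrom]
            out.write(">" + header + "\n")
            for seq in seq_lines:
                out.write(seq + "\n")
            counts[chrom] = counts.get(chrom, 0) + 1
    finally:
        for h in handles.values():
            h.close()
    return counts
```

If your headers use a different convention (or the assignment lives in a separate table), only the part that computes chrom needs to change.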
Sorry, it was about a FASTA file containing the genome sequence, not FASTQ. I need each chromosome in its own separate file rather than the whole genome in one FASTA file. Any help, please?
Ah, I get it: you have a reference genome in FASTA format and you want to write each sequence to a separate file, right?
Please follow up on your previous threads and mark answers as accepted when appropriate.
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.