fastq file split in chromosome
6.6 years ago

I'm new to NGS and Linux. Could anyone tell me the easiest way to split a FASTA file containing chromosomes and scaffolds into individual files, i.e. one chromosome per file? Thanks.


How do you know which scaffold belongs to which chromosome? If you don't, you'll have to align all your scaffolds to a reference genome. If the information is in your scaffold headers, you can split your file with a Python script, for example.
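To illustrate the header-based case, here is a minimal sketch using only the Python standard library. It assumes, purely for illustration, that each scaffold header carries a tag such as `chromosome=3`; the default `pattern`, the function names `group_by_chromosome` and `write_groups`, and the `chr<label>.fasta` file naming are all hypothetical and would need adapting to your real headers. It also holds the whole genome in memory, which may not suit very large assemblies.

```python
import os
import re
from collections import defaultdict


def group_by_chromosome(fasta_path, pattern=r"chromosome[=_ ]?(\w+)"):
    """Return {chromosome label: FASTA text of all scaffolds with that label}.

    The default pattern is an assumption about the header format, not a standard.
    Scaffolds whose header does not match are collected under "unplaced".
    """
    regex = re.compile(pattern, re.IGNORECASE)
    groups = defaultdict(list)
    current = None  # lines of the record currently being read
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                match = regex.search(line)
                label = match.group(1) if match else "unplaced"
                current = [line]
                groups[label].append(current)
            elif current is not None:
                current.append(line)
    return {
        label: "".join("".join(record) for record in records)
        for label, records in groups.items()
    }


def write_groups(groups, out_dir="."):
    """Write one file per chromosome label and return the paths written."""
    paths = []
    for label, text in groups.items():
        path = os.path.join(out_dir, "chr" + label + ".fasta")
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    return paths
```

Calling `write_groups(group_by_chromosome("ref_genome.fasta"))` would then produce one file per chromosome label found in the headers, plus `chrunplaced.fasta` for anything that did not match.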


Sorry, my question was about a FASTA file containing a genome sequence, not FASTQ. I need each chromosome in its own separate file rather than the whole genome in one FASTA file. Any help, please?


Ah, I get it: you have a reference genome in FASTA format and you want to write each sequence to a separate file, right?


Please follow up on your previous threads and mark answers as accepted when appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

6.6 years ago

A modification of Pierre's answer from here: Splitting A Fasta File

awk '/^>/ {F=substr($0, 2, length($0))".fasta"; print >F;next;} {print >> F;}' < ref_genome.fasta

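Just replace ref_genome.fasta with the name of your file. Each sequence is written to a file named after its header line (everything after the ">"), with ".fasta" appended.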
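If the awk one-liner is hard to read, the same idea can be sketched in plain Python with no extra packages. This is only an illustration, not Pierre's code: the function name `split_fasta` and the `out_dir` parameter are made up here, and unlike the awk command it names each file after the first word of the header only and replaces unusual characters with underscores, which is my own assumption about what makes a safe file name.

```python
import os
import re


def split_fasta(fasta_path, out_dir="."):
    """Write each sequence in fasta_path to its own file in out_dir.

    Returns the output paths in order of first appearance.
    """
    written = []
    out = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A new header starts a new record: close the previous file.
                    if out is not None:
                        out.close()
                    # Assumption: use the first word of the header as the file name
                    # and replace characters that are awkward in file names.
                    words = line[1:].split()
                    name = re.sub(r"[^A-Za-z0-9._-]", "_", words[0]) if words else "unnamed"
                    path = os.path.join(out_dir, name + ".fasta")
                    # Overwrite on first sight; append if the same name shows up again.
                    if path in written:
                        out = open(path, "a")
                    else:
                        written.append(path)
                        out = open(path, "w")
                # Lines before the first header are skipped because no file is open yet.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

For example, `split_fasta("ref_genome.fasta")` would write one file per sequence into the current directory. Only one output file is open at a time, so a genome with thousands of scaffolds should not run into the open-file limit.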

6.6 years ago

I think you could have found this answer elsewhere on Biostars, for example here:

How to split fasta into seperate files by chromosome (in the header)

If I understand your goal correctly:

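from Bio import SeqIO

# Write each record of the input FASTA to its own file, named after the record ID.
for record in SeqIO.parse("ref_genome.fasta", "fasta"):
    # Append mode: rerunning the script adds to existing files instead of overwriting them.
    with open(record.id + ".fasta", "a") as output_handle:
        SeqIO.write(record, output_handle, "fasta")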

Yes, you've guessed the requirement correctly. Thanks for the answer. But is there a one-liner command using csplit, awk, etc.? I am new to Linux and could not understand the above command.


It's not a Unix command, it's Python code.
