fastq file split in chromosome

1

7.3 years ago

blooming.daisy333 ▴ 110

I'm a newbie to NGS and Linux. Can anyone kindly tell me the easiest way to split a FASTA file containing chromosomes and scaffolds into individual files, i.e. one chromosome per file? Thanks.

next-gen • 3.6k views

ADD COMMENT • link updated 7.3 years ago by Bastien Hervé 6.4k • written 7.3 years ago by blooming.daisy333 ▴ 110

1

How do you know which scaffold belongs to which chromosome? If you don't, you'll have to align all your scaffolds to a reference genome. If the information is in your scaffold headers, you can split your file with a Python script, for example.
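As a rough illustration of the header-based route, here is a minimal sketch in plain Python (standard library only). It assumes a purely hypothetical header convention such as `>scaffold_7 chromosome=2`; the function name `split_by_chromosome`, the default `pattern`, and the `unplaced` catch-all file are all made up for this example, so adapt them to whatever your headers really contain.

```python
import re
from pathlib import Path


def split_by_chromosome(fasta_path, out_dir, pattern=r"chromosome=(\S+)"):
    """Write every record to a file named after the chromosome label in its header.

    `pattern` is a hypothetical convention, not a standard: it must contain one
    capture group that extracts the chromosome label from the header line.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}    # chromosome label -> open output file
    current = None  # file receiving the lines of the current record
    try:
        with open(fasta_path) as source:
            for line in source:
                if line.startswith(">"):
                    match = re.search(pattern, line)
                    # Records without a recognisable label go to a catch-all file
                    label = match.group(1) if match else "unplaced"
                    if label not in handles:
                        handles[label] = open(out_dir / f"{label}.fasta", "w")
                    current = handles[label]
                if current is not None:
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Calling `split_by_chromosome("scaffolds.fasta", "by_chromosome")` would then create one file per label found. The sketch keeps one output file open per chromosome, which is fine for a few dozen chromosomes but not for thousands of distinct labels.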

ADD REPLY • link 7.3 years ago by Bastien Hervé 6.4k

0

Sorry, it was about a FASTA file containing the genome sequence, not FASTQ. I need each chromosome in its own separate file rather than the whole genome in one FASTA file. Any help, please?

ADD REPLY • link 7.3 years ago by blooming.daisy333 ▴ 110

0

Ah, I get it: you have a reference genome in FASTA and you want to split each sequence into a separate file, right?

ADD REPLY • link 7.3 years ago by Bastien Hervé 6.4k

0

Please follow up on your previous threads and mark answers as accepted when appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLY • link 7.3 years ago by WouterDeCoster 48k

1

7.3 years ago

Bastien Hervé 6.4k

I think you could have found this answer somewhere on Biostars, for example in this thread:

How to split fasta into seperate files by chromosome (in the header)

If I understand your goal correctly:

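from Bio import SeqIO  # requires Biopython

# Write each sequence of the multi-FASTA to its own file, named after the record ID
for record in SeqIO.parse("ref_genome.fasta", "fasta"):
    # Open with "w" rather than "a" so rerunning the script does not append duplicates
    with open(record.id + ".fasta", "w") as output_handle:
        SeqIO.write(record, output_handle, "fasta")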

ADD COMMENT • link 7.3 years ago by Bastien Hervé 6.4k

0

Yes, you've guessed the requirement correctly. Thanks for the answer. But is there a one-liner command, using csplit, awk, etc.? I am a newbie to Linux and could not understand the above command.

ADD REPLY • link 7.3 years ago by blooming.daisy333 ▴ 110

1

It's not a Unix command, it's Python code.

ADD REPLY • link 7.3 years ago by WouterDeCoster 48k

score 6 · Accepted Answer · 2018-04-16

6

7.3 years ago

Bastien Hervé 6.4k

A modification of Pierre's answer from this thread: Splitting A Fasta File

awk '/^>/ {F=substr($0, 2, length($0))".fasta"; print >F;next;} {print >> F;}' < ref_genome.fasta

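Just replace ref_genome.fasta with your file. Each sequence is written to a file named after its header line (without the leading >) plus .fasta.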
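If the one-liner is hard to read, the same line-by-line logic can be sketched in plain Python (standard library only, no Biopython): a line starting with `>` opens a new output file, and every following line goes into that file until the next header. This is only a sketch under a few assumptions of mine, not part of the original answer: the function name `split_fasta` is made up, and unlike the awk command it names each file after the first word of the header only, replaces `/` in that name, and closes each file before opening the next one.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to its own <name>.fasta file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []  # names of the files created, in input order
    out = None    # output file of the record currently being copied
    with open(fasta_path) as source:
        for line in source:
            if line.startswith(">"):
                # A header starts a new record: close the previous file first
                if out is not None:
                    out.close()
                # Assumption: use the first word of the header as the file name,
                # replacing "/" so it cannot be read as a directory separator
                fields = line[1:].split()
                name = fields[0].replace("/", "_") if fields else "unnamed"
                path = out_dir / f"{name}.fasta"
                out = open(path, "w")
                written.append(path.name)
            # Lines before the first header are skipped
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written
```

For example, `split_fasta("ref_genome.fasta", "chromosomes")` would put one file per sequence into a `chromosomes` directory. Closing each file before opening the next keeps the number of open files at one, which may matter for assemblies with thousands of scaffolds.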

ADD COMMENT • link 7.3 years ago by Bastien Hervé 6.4k