How can I get one chromosome per sequence file when I only have the whole-genome file?
9.3 years ago

Thank you guys. Lately I have been lost on how to use the MapSplice tools.

Here is what MapSplice requires:

The directory containing the sequence files of reference genome. All sequence files are required to:

  • In "FASTA" format, with '.fa' extension.
  • One chromosome per sequence file.
  • Chromosome name in the header line ('>' not included) is the same as the sequence file base name, and does not contain any blank space.
  • E.g. If the header line is '>chr1', then the sequence file name should be 'chr1.fa'.
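To make the requirements above concrete, here is a minimal Python sketch that checks a directory against them. The function name `check_reference_dir` is made up for illustration and is not part of MapSplice; it only tests the four points listed above, not whether the sequences themselves are valid.

```python
from pathlib import Path


def check_reference_dir(ref_dir):
    """Return a list of problems found in ref_dir (empty list: none found).

    Hypothetical helper, not part of MapSplice.
    """
    problems = []
    fasta_files = sorted(Path(ref_dir).glob("*.fa"))
    if not fasta_files:
        problems.append(f"no '.fa' files found in {ref_dir}")
    for path in fasta_files:
        with open(path) as handle:
            # Header lines with the leading '>' and the line ending removed.
            headers = [line[1:].rstrip("\r\n") for line in handle
                       if line.startswith(">")]
        # One chromosome per file means exactly one header line.
        if len(headers) != 1:
            problems.append(
                f"{path.name}: expected 1 header line, found {len(headers)}")
            continue
        name = headers[0]
        if any(ch.isspace() for ch in name):
            problems.append(
                f"{path.name}: header '{name}' contains blank space")
        # The header must equal the file base name, e.g. 'chr1' for 'chr1.fa'.
        if name != path.stem:
            problems.append(
                f"{path.name}: header '{name}' does not match "
                f"file base name '{path.stem}'")
    return problems
```

Calling `check_reference_dir("reference_dir")` (the directory name is a placeholder) and printing the returned list shows which files would need fixing.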

I only have the whole-genome file, human.fa. How can I split it into separate files, one per chromosome? Thank you!

9.3 years ago

You can do this with pyfaidx:

faidx --split-files multifasta.fa

The defaults will create one file per sequence, with the file names derived from the sequence names. If there are no spaces in the sequence names there will be none in the file names. Special characters are replaced with ".", but you could modify the sequence IDs/file names with the --regex / --delimiter flags.
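If installing pyfaidx is not an option, the same kind of split can be sketched with the Python standard library alone. This is not how pyfaidx works internally; `split_fasta` is a hypothetical helper. It assumes the text up to the first blank in each header is a safe file name, and it rewrites the header to that ID only, so the header and the file base name match.

```python
import os


def split_fasta(fasta_path, out_dir="."):
    """Write each record of fasta_path to <out_dir>/<ID>.fa; return the IDs.

    Hypothetical helper; assumes every ID is usable as a file name.
    """
    os.makedirs(out_dir, exist_ok=True)
    names = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # The ID is the header text up to the first blank.
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError("header line without a sequence name")
                    if out is not None:
                        out.close()
                    name = fields[0]
                    out = open(os.path.join(out_dir, name + ".fa"), "w")
                    # Keep only the ID so the header equals the file base name.
                    out.write(">" + name + "\n")
                    names.append(name)
                elif out is not None:
                    # Sequence line: copy it unchanged to the current file.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return names
```

For example, `split_fasta("human.fa", "reference_dir")` would write `reference_dir/chr1.fa` for a record whose header starts with `>chr1`. Only one output file is open at a time, so the whole genome never has to fit in memory.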

9.3 years ago

You can get the fasta sequence for each chromosome (GRCh37) here.


I did it using Perl, but thank you anyway! Here are the details:

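    #!/usr/bin/perl
    use strict;
    use warnings;

    # Split a multi-FASTA file into one file per sequence, named <ID>.fa.
    my $f = shift @ARGV;    # input file name
    die "Usage: $0 genome.fa\n" unless defined $f;

    open(my $in, '<', $f) or die "Can't open $f: $!";

    my $out;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {    # header line; the ID is the text up to the first blank
            close $out if $out;
            my $new_file = "$1.fa";
            open($out, '>', $new_file) or die "Can't open $new_file: $!";
            $line = ">$1";           # drop any description so the header matches the file name
        }
        print {$out} "$line\n" if $out;
    }
    close $out if $out;
    close $in;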