Question

how to use list of gene id to get cds sequence(cds fasta file have many annotation, only gene id: is same to query id)

0

Entering edit mode

5.2 years ago

LZH289 • 0

hello, i have a question when i want to extract cds sequence using gene id. but cds file is not just start with >gene is, it has many other annotation. the only same is star with gene:

cds fasta:

>Zm002 cds gene:Zm1d035916 gene_biotype:protein_coding 
ATCGGCAT
>Zm001 cds RefGen_v4:9:153880862:153883850:-1 gene:Zm1d048 gene_biotype:protein_coding
ATGCGGCA

gene_list

Zm1d035916
Zm1d048

how to get result like

>Zm1d035916
ATCGGCAT
>Zm1d048 
ATGCGGCA

sequence gene • 2.0k views

ADD COMMENT • link 5.2 years ago by LZH289 • 0

0

Entering edit mode

I think biopython might help : http://biopython.org/DIST/docs/tutorial/Tutorial.html Refer to section 2.4.1 :

from Bio import SeqIO

for seq_record in SeqIO.parse("file_name.fasta", "fasta"):
    printseq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))

You should get something like this on your screen:

gi|2765658|emb|Z78533.1|CIZ78533
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', SingleLetterAlphabet())
740

I think it is very straightforward, once you get the seq_record.id then you can slice the specific substring from the seq_record.id by using str.find (for example) in python

ADD REPLY • link 5.2 years ago by manaswwm ▴ 570

score 2 · Accepted Answer · 2020-04-22

2

Entering edit mode

5.2 years ago

JC 13k

Nothing that a Perl script can do:

#!/usr/bin/perl

use strict;
use warnings;

$ARGV[2] or die "use getSeqs.pl <File with IDs> <Input Fasta> <Output Fasta>\n";
my $list_file = shift @ARGV;
my $fasta_in = shift @ARGV;
my $fasta_out = shift @ARGV;

my %sel;
open (my $lh, "<", $list_file) or die;
while (<$lh>) {
    chomp;
    $sel{$_}++;
}
close $lh;

$/ = "\n>";
open (my $ih, "<", $fasta_in) or die;
open (my $oh, ">", $fasta_out) or die;
while (<$ih>) {
    s/>//g;
    my ($id_line, @seq) = split (/\n/, $_);
    if ($id_line =~ /gene:(\w+)/) {
        my $id = $1;
        if (defined $sel{$id}) {
            print $oh ">$id\n";
            print $oh join "\n", @seq;
            print $oh "\n";
        }
    }
}
close $ih;
close $oh;

ADD COMMENT • link 5.2 years ago by JC 13k

0

Entering edit mode

thank you for your reply. when I use perl in terminal, and run perl cds_sequence.pl gene_list.txt cds.fa cds_extract.fa it shows error like this: Can't use global $/ in "my" at cds_sequence.pl line 19, near "my $/ " Execution of cds_sequence.pl aborted due to compilation errors. how to solve this problem?

ADD REPLY • link 5.2 years ago by LZH289 • 0

0

Entering edit mode

removing the "my" before the "$/", I edited the code

ADD REPLY • link 5.2 years ago by JC 13k

0

Entering edit mode

it's work. Thank you!

ADD REPLY • link 5.2 years ago by LZH289 • 0

0

Entering edit mode

can u please tell me how to run this code, means where I should mention my files in this code, I am new in programming.

ADD REPLY • link 5.2 years ago by p • 0