Question

genome features and sequence parsing

0

7.7 years ago

lessismore ★ 1.4k

Hello people, I need to extract the CDS portion of a genome from a GFF3 annotation file and a genome FASTA file. Do you know of a good parser for that? At the moment I am using:

awk -v FS="\t" -v OFS="\t" '$3 == "CDS" {print $1, $4-1, $5, $1":"$4"-"$5":"$9}' mygenome.gff3 | bedtools getfasta -name -fi mygenome.fasta -bed - -fo cds.fa

Please note: in this annotation, exons are identified as cds1, cds2, cds3, etc.

At this point I have every CDS segment of each transcript as a separate sequence. My aim, however, is to obtain the complete CDS sequence of each transcript by joining cds1, cds2, cds3, etc., taking strand orientation into account.

In summary, I want a table like this: chrom | CDS coordinates | CDS sequence

Do you have any suggestions? Thanks.

genome sequence cds parsing • 1.7k views

updated 7.7 years ago by Matteo Schiavinato ★ 3.6k • written 7.7 years ago by lessismore ★ 1.4k

1

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar; see the image below:

[Image: 101010 button]

7.7 years ago by WouterDeCoster 47k

0

Thanks. Can you also tell me how you embedded the picture that you just posted?

7.7 years ago by lessismore ★ 1.4k

1

Directly next to the button for code markup is a button for inserting images. You need to host the picture online somewhere; I use tinypic, but there are many alternatives.

7.7 years ago by WouterDeCoster 47k

score 0 · Answer 1 · 2017-04-24

0

7.7 years ago

Matteo Schiavinato ★ 3.6k

You're looking for getAnnoFasta.pl from the Augustus programs:

http://bioinf.uni-greifswald.de/augustus/binaries/scripts/getAnnoFasta.pl
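
If you would rather script the joining step yourself, here is a minimal Python sketch of the idea, not a replacement for getAnnoFasta.pl. It assumes that every CDS row in the GFF3 carries a single `Parent=` attribute naming its transcript, that all CDS segments of a transcript lie on the same sequence and strand, and that the genome fits in memory. The function names (`read_fasta`, `cds_by_transcript`, `joined_cds_table`) and the toy data are made up for illustration. Segments are sorted by start coordinate and concatenated, and the joined sequence is reverse-complemented for transcripts on the minus strand.

```python
import io
from collections import defaultdict

# Translation table for complementing a DNA string (other symbols pass through).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Return {sequence_id: sequence} from an open FASTA handle."""
    seqs, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the text up to the first whitespace
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def cds_by_transcript(handle):
    """Group CDS rows by their Parent attribute (assumed to be the transcript ID)."""
    groups = defaultdict(list)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")
        if parent is None:
            continue  # rows without a Parent cannot be assigned to a transcript
        # Store (chrom, start, end, strand); GFF3 coordinates are 1-based, inclusive.
        groups[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return groups


def joined_cds_table(genome, groups):
    """Yield (transcript, chrom, coords, sequence) with CDS segments joined."""
    for transcript, parts in groups.items():
        parts = sorted(parts, key=lambda p: p[1])  # ascending genomic order
        chrom, strand = parts[0][0], parts[0][3]
        seq = "".join(genome[chrom][s - 1:e] for _, s, e, _ in parts)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        coords = ",".join(f"{s}-{e}" for _, s, e, _ in parts)
        yield transcript, chrom, coords, seq


if __name__ == "__main__":
    # Toy data; in practice pass open("mygenome.fasta") and open("mygenome.gff3").
    fasta = io.StringIO(">chr1 toy\nATGAAACCC\nGGGTTTTAG\n")
    gff3 = io.StringIO(
        "##gff-version 3\n"
        "chr1\t.\tCDS\t13\t18\t.\t+\t0\tID=cds2;Parent=t1\n"
        "chr1\t.\tCDS\t1\t6\t.\t+\t0\tID=cds1;Parent=t1\n"
        "chr1\t.\tCDS\t4\t6\t.\t-\t0\tID=cds2;Parent=t2\n"
        "chr1\t.\tCDS\t10\t12\t.\t-\t0\tID=cds1;Parent=t2\n"
    )
    genome = read_fasta(fasta)
    for row in joined_cds_table(genome, cds_by_transcript(gff3)):
        print("\t".join(row))
```

Each output line is tab-separated: transcript, chrom, the CDS coordinates in ascending genomic order, and the joined sequence. If your annotation links the cds1/cds2/cds3 features to their transcript through something other than `Parent`, the grouping key in `cds_by_transcript` is the part to adapt.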

7.7 years ago by Matteo Schiavinato ★ 3.6k