Change accession number to chromosome number in reference genome
23 months ago
Tsigabu • 0

I have a reference genome whose sequences are labelled with accession numbers instead of chromosome numbers. Any clue how to convert the accession numbers into chromosome numbers in the Linux terminal? Any suggestion is appreciated.

fasta chromosome • 1.1k views

Could you provide an example for your question, including the input and desired output? This will make it easier to assist you.


Thanks for the reply. As shown below, the headers in the FASTA reference file start with >NC_, but I need to replace each accession number with the corresponding chromosome number. For example, the accession number NC_056070.1 stands for chromosome 17, so instead of a header starting with NC_056070.1 I need one starting with Chromosome 17. The FASTA file has 26 chromosomes, the X chromosome, and the mitochondrial sequence as well.

kindly.

>NC_056070.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 17, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056071.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 18, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056072.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 19, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056073.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 20, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056074.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 21, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056075.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 22, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056076.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 23, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056077.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 24, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056078.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 25, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056079.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 26, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
>NC_056080.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome X, ARS-UI_Ramb_v2.0, whole genome shotgun sequence

If the above is a consistent pattern, edit each header so that it keeps only the chromosome label (the text matched by [^,]+ after "chromosome "), written as chromosome_ followed by that label, like so:

sed -E '/^>/ s/^>.*chromosome ([^,]+),.*/>chromosome_\1/' in.fasta > renamed.fasta

Test before running on the whole file and double check headers once done.
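One way to do that test is sketched below, assuming a POSIX shell with sed, grep, sort and uniq available. It builds a two-record sample from the headers posted above (the sequence lines and the file names sample.fasta and renamed.fasta are made up for illustration), applies a variant of the substitution that uses an explicit /^>/ address and keeps the leading > in the replacement, and then inspects the result. Headers that do not contain "chromosome <label>," are left unchanged by this pattern, which may be the case for the mitochondrial record, so the second grep lists anything that was not renamed.

```shell
# Build a tiny test FASTA in a temporary directory (hypothetical file names;
# the sequence lines are placeholders, not real data).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.fasta" <<'EOF'
>NC_056070.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome 17, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
ACGTACGT
>NC_056080.1 Ovis aries strain OAR_USU_Benz2616 breed Rambouillet chromosome X, ARS-UI_Ramb_v2.0, whole genome shotgun sequence
TTGGCCAA
EOF

# Rewrite header lines only; sequence lines pass through untouched.
sed -E '/^>/ s/^>.*chromosome ([^,]+),.*/>chromosome_\1/' \
    "$tmpdir/sample.fasta" > "$tmpdir/renamed.fasta"

# Show the new headers.
grep '^>' "$tmpdir/renamed.fasta"

# List headers the pattern did not rename; print a note if there are none.
grep '^>' "$tmpdir/renamed.fasta" | grep -v '^>chromosome_' || echo "all headers renamed"

# Duplicate sequence names would confuse downstream indexing;
# this prints nothing when every header is unique.
grep '^>' "$tmpdir/renamed.fasta" | sort | uniq -d
```

The same three checks can be run on the full renamed.fasta afterwards. Any header reported by the second grep would need its own rule or a manual edit.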
