How to construct a CDS properly using annotation-file coordinates?
7.2 years ago

Assalamu alaikum (peace be upon you), everyone,

I have a BAM file of a dog genome and I generated a consensus FASTA from it. The BAM is aligned against CanFam3.1, so I used the CanFam3.1 annotation file (GFF3) from NCBI to extract CDS from the consensus FASTA. First, I fetched the coordinates of my gene.

GFF3 coordinates of a single CDS:

NC_006611.3 Gnomon  CDS 28363101    28363137    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1
NC_006611.3 Gnomon  CDS 28491275    28491447    .   +   2   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1
NC_006611.3 Gnomon  CDS 28491806    28491907    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1
NC_006611.3 Gnomon  CDS 28492441    28492494    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1
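Before touching the FASTA, it can help to check that the annotation itself is consistent by reading these four CDS lines with GFF3 semantics (1-based, both ends inclusive) and comparing segment lengths with the recorded phases. This is a minimal Python sketch, assuming all segments belong to one transcript; the column-9 attributes are shortened for brevity.

```python
# Sketch: sanity-check one transcript's CDS segments using GFF3 semantics
# (1-based coordinates, both ends inclusive). Attributes are shortened.

def parse_cds(lines):
    """Return (start, end, strand, phase) tuples for the CDS lines given."""
    segments = []
    for line in lines:
        # GFF3 is tab-separated; fall back to whitespace for pasted text,
        # keeping column 9 (which may contain spaces) in one piece.
        cols = line.split("\t") if "\t" in line else line.split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        segments.append((int(cols[3]), int(cols[4]), cols[6], int(cols[7])))
    return segments

def check_phases(segments):
    """Walk segments in transcription order; return (total_length, phases_ok)."""
    minus = segments[0][2] == "-"
    total, ok = 0, True
    for start, end, strand, phase in sorted(segments, reverse=minus):
        length = end - start + 1          # inclusive on both ends
        expected = (3 - total % 3) % 3    # bases to skip to reach a codon start
        print(f"{start}-{end}: {length} bp, phase {phase} (expected {expected})")
        ok = ok and phase == expected
        total += length
    print(f"total: {total} bp, multiple of 3: {total % 3 == 0}")
    return total, ok

gff_lines = [
    "NC_006611.3\tGnomon\tCDS\t28363101\t28363137\t.\t+\t0\tID=cds39781",
    "NC_006611.3\tGnomon\tCDS\t28491275\t28491447\t.\t+\t2\tID=cds39781",
    "NC_006611.3\tGnomon\tCDS\t28491806\t28491907\t.\t+\t0\tID=cds39781",
    "NC_006611.3\tGnomon\tCDS\t28492441\t28492494\t.\t+\t0\tID=cds39781",
]
total, phases_ok = check_phases(parse_cds(gff_lines))
```

For these lines the segments are 37, 173, 102 and 54 bp, every recorded phase matches the expected one, and the total (366 bp) is a multiple of 3. So the annotation looks self-consistent, which points at the extraction step rather than the GFF3.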

I used the coordinates above to fetch the corresponding sequences from the consensus FASTA.

Sequences fetched for the same CDS:

>chr29:28363101-28363137
TGACTGTGTCAGTCCAGGTTCTCTGGGGGACTGAGG
>chr29:28491275-28491447
AGTGGGAATGGCTCTGCGAAAGGTGGGTGCAATGGCCAAACCAGATTGTATCATCTCTTCTGACGGCAAAAACCTCACCATAAAAACTGAGAGCACTTTGAAAACAACACAGTTTTCGTGTAATCTGGGAGAGAAGTTTGAAGAAACTACAGCTGATGGCAGAAAAACTCAG
>chr29:28491806-28491907
CTGTCTGCAACTTCACAGACGGCGCATTGGTTCAACATCAGGAATGGGATGGGAAGGAAAGCACAATAACAAGAAAGTTGGAAGATGGGAAATTGGTGGTG
>chr29:28492441-28492494
AATGCGTCATGAACAATGTCACCTGTACGCGGATCTATGAAAAAGTAGAGTAA
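One way to see where the base goes is to compare each fetched sequence with the length implied by its GFF3 coordinates (end - start + 1). A minimal Python sketch, using the sequences exactly as pasted above:

```python
# Sketch: compare fetched lengths with GFF3 lengths (both ends inclusive).
# Sequences are copied from the FASTA output above, keyed by GFF3 coordinates.
fetched = {
    (28363101, 28363137): "TGACTGTGTCAGTCCAGGTTCTCTGGGGGACTGAGG",
    (28491275, 28491447): "AGTGGGAATGGCTCTGCGAAAGGTGGGTGCAATGGCCAAACCAGATTGTATCATCTCTTCTGACGGCAAAAACCTCACCATAAAAACTGAGAGCACTTTGAAAACAACACAGTTTTCGTGTAATCTGGGAGAGAAGTTTGAAGAAACTACAGCTGATGGCAGAAAAACTCAG",
    (28491806, 28491907): "CTGTCTGCAACTTCACAGACGGCGCATTGGTTCAACATCAGGAATGGGATGGGAAGGAAAGCACAATAACAAGAAAGTTGGAAGATGGGAAATTGGTGGTG",
    (28492441, 28492494): "AATGCGTCATGAACAATGTCACCTGTACGCGGATCTATGAAAAAGTAGAGTAA",
}

shortfalls = []
for (start, end), seq in fetched.items():
    expected = end - start + 1   # GFF3 length
    shortfalls.append(expected - len(seq))
    print(f"{start}-{end}: expected {expected} bp, got {len(seq)} bp")
```

Every segment comes back exactly one base short (36 instead of 37, 172 instead of 173, and so on), which is the pattern expected when a 1-based start is read as a 0-based start.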

I will concatenate these parts of the CDS later, but as you can see in the example above, the A of the start codon (ATG) is missing. How can I fix this?
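Once the segments are fetched with the right coordinates, concatenation is a matter of ordering them and handling the strand. A minimal Python sketch, assuming every piece was extracted in + strand orientation (that is, without bedtools' `-s` option); `assemble_cds` and `looks_complete` are hypothetical helper names, and the demo pieces are toy sequences, not the FABP5 data:

```python
# Sketch: join CDS pieces in transcription order and check start/stop codons.
# Assumes all pieces are in + strand orientation (extracted without -s).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def assemble_cds(pieces, strand):
    """pieces: list of (genomic_start, sequence); strand: '+' or '-'."""
    joined = "".join(seq for _, seq in sorted(pieces))  # ascending genomic order
    if strand == "-":
        joined = joined.translate(COMPLEMENT)[::-1]     # reverse complement
    return joined.upper()

def looks_complete(cds):
    """True if the CDS has a whole number of codons, starts with ATG and ends in a stop."""
    return (len(cds) % 3 == 0 and cds.startswith("ATG")
            and cds[-3:] in {"TAA", "TAG", "TGA"})

plus = assemble_cds([(200, "TTAA"), (100, "ATGGC")], "+")
minus = assemble_cds([(100, "TTA"), (200, "AGCCAT")], "-")
print(plus, looks_complete(plus))    # ATGGCTTAA True
print(minus, looks_complete(minus))  # ATGGCTTAA True
```

Joining in ascending genomic order and then reverse-complementing gives the same result as reverse-complementing each piece and joining them in descending order; the first form just keeps the sort simple.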

I have several questions (I can't figure out where the problem actually is):

Is this happening because of the 0-based/1-based coordinate system?

Should I add one base (off-by-one) before each start coordinate? (I actually checked this only for the first segment: when I reduce its start coordinate by 1, I always get the A.)

Should I reduce the start coordinate by one for every part of the CDS?
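The two conventions are easiest to compare on a toy string. In this hypothetical example the CDS occupies bases 3..8 in 1-based inclusive coordinates (GFF3 style), while Python slicing is 0-based and half-open, like BED:

```python
# Sketch: the same interval under the two coordinate conventions.

def fetch_1based(seq, start, end):
    """Bases start..end, 1-based and inclusive (GFF3, SAM, samtools faidx)."""
    return seq[start - 1:end]

def fetch_0based(seq, start, end):
    """Bases [start, end), 0-based half-open (BED, bedtools, Python slices)."""
    return seq[start:end]

toy = "ccATGAAAcc"  # hypothetical toy sequence; CDS = bases 3..8 (1-based)
print(fetch_1based(toy, 3, 8))      # ATGAAA
print(fetch_0based(toy, 3, 8))      # TGAAA  (GFF3 start used as a BED start)
print(fetch_0based(toy, 3 - 1, 8))  # ATGAAA (start reduced by one)
```

Under this reading the fix is the same for every segment: subtract one from the start, leave the end unchanged.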

How can I check whether my BAM file is 0-based or 1-based?

Tags: CDS, coordinate-system, BAM

"I have used above coordinates and fetched corresponding sequence from consensus FASTA."

How? How did you get the FASTA sequences?

"bam file is 0-based or 1-based ???"

A BAM file is internally 0-based.

A SAM file is always 1-based.
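In practice, when you inspect a BAM with `samtools view` you see SAM text, so the POS column you read is 1-based; the 0-based value is what the binary file stores. A minimal Python sketch that turns one SAM line into a 0-based half-open interval (BED style), counting only reference-consuming CIGAR operations (M, D, N, =, X). The read and its CIGAR are hypothetical:

```python
import re

# Sketch: SAM POS (1-based) -> 0-based half-open reference interval.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")

def sam_interval(sam_line):
    """Return (rname, start0, end0) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    start0 = pos - 1  # the value BAM stores internally
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return rname, start0, start0 + ref_len

# Hypothetical read: 10M2D5M3S spans 10 + 2 + 5 = 17 reference bases.
line = "read1\t0\tchr29\t28363101\t60\t10M2D5M3S\t*\t0\t0\t*\t*"
print(sam_interval(line))  # ('chr29', 28363100, 28363117)
```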


Sorry for the late reply!

I took columns 4 and 5 from the GFF3 (annotation) file, made a BED6 file from them, and then used bedtools getfasta to get the FASTA sequences.
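BED is 0-based with an exclusive end, while GFF3 is 1-based and inclusive, so copying columns 4 and 5 straight into a BED file shifts every start one base to the right and drops the first base of each segment. A minimal conversion sketch in Python; the `rename` mapping (NC_006611.3 to chr29, as implied by the FASTA headers above) and the choice of name column are illustrative assumptions, not something bedtools requires:

```python
# Sketch: GFF3 CDS lines (1-based, inclusive) -> BED6 rows (0-based start,
# exclusive end). Only the start changes; the end is kept as-is.

def gff3_cds_to_bed6(gff_lines, rename=None):
    rows = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        # Hypothetical rename so seqids match the FASTA headers.
        seqid = rename.get(cols[0], cols[0]) if rename else cols[0]
        start, end, strand = int(cols[3]), int(cols[4]), cols[6]
        name = f"{seqid}:{start}-{end}"  # keep the 1-based GFF3 span as the name
        rows.append(f"{seqid}\t{start - 1}\t{end}\t{name}\t0\t{strand}")
    return rows

gff = [
    "##gff-version 3",
    "NC_006611.3\tGnomon\tCDS\t28363101\t28363137\t.\t+\t0\tID=cds39781",
    "NC_006611.3\tGnomon\tCDS\t28491275\t28491447\t.\t+\t2\tID=cds39781",
]
for row in gff3_cds_to_bed6(gff, rename={"NC_006611.3": "chr29"}):
    print(row)
```

The rows can be written to a file (for example a hypothetical `cds.bed`) and passed to `bedtools getfasta -fi consensus.fa -bed cds.bed`; adding `-s` makes bedtools reverse-complement minus-strand segments.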

I downloaded the BAM file and then generated the consensus FASTA from it using samtools. Which coordinate system does my FASTA file use now?
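A FASTA file is only sequence and has no coordinate system of its own; the convention comes from the tool reading it. `samtools faidx` takes regions like `chr29:28363101-28363137` as 1-based inclusive, while bedtools getfasta reads BED intervals and labels its output `chrom:start-end` with the BED (0-based) start. A minimal Python sketch converting between the two; the helper names are hypothetical:

```python
# Sketch: convert between a 1-based inclusive region string and a BED interval.

def region_to_bed(region):
    """'chr:beg-end' (1-based, inclusive) -> (chrom, 0-based start, exclusive end)."""
    chrom, span = region.rsplit(":", 1)
    beg, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, beg - 1, end

def bed_to_region(chrom, start, end):
    """BED interval -> 1-based inclusive region string."""
    return f"{chrom}:{start + 1}-{end}"

print(region_to_bed("chr29:28363101-28363137"))    # ('chr29', 28363100, 28363137)
print(bed_to_region("chr29", 28363101, 28363137))  # chr29:28363102-28363137
```

Read this way, a bedtools header such as `>chr29:28363101-28363137` built from GFF3 columns copied unchanged covers 1-based bases 28363102 to 28363137, one base short of the annotated segment.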

