FASTA FILE to TABLE R/python
4.8 years ago
fagambaro3 ▴ 30

Hi all,

Does anyone know how to convert a DNA sequence from a FASTA file into a one-column table? It can be in R or Python!

I tried this FASTA-to-table converter, but it is not working for me: https://rstudio-pubs-static.s3.amazonaws.com/518943_a6bb21f87f594e6fb2aaa9ca2ef79cc0.html

Then I also tried converting my FASTA file into a CSV (using https://birdlet.github.io/2017/12/13/fasta2csv/ ), but that is not working either, because I end up with multiple columns instead of the single column I need:

1 >DENV4_(consensus)
2 A G T T G T T A G T C T G T G T G G A C C G A C A A G G A C A G T T C C A A A
3 T T C T A A C A G T T T G T T T A G A T A G A G A G C A G A T C T C T G G A A

Can anyone help me?

Thanks a lot!

Fabiana

R fasta • 3.6k views

If you linearize the fasta file then it should become what you are looking for. Try this code from @Pierre.
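For anyone who prefers to stay in Python, "linearizing" just means joining the wrapped sequence lines of each record into one line after its header. Here is a minimal sketch of that idea; the function name `linearize_fasta` and the example lines are hypothetical, and this is not the code referred to above, only a rough equivalent.

```python
def linearize_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    # Emit the last record
    if header is not None:
        yield header, "".join(chunks)


# Made-up example input: one record wrapped over two lines
example = [">DENV4_(consensus)", "AGTTGTTAG", "TCTGTGTGG"]
for header, seq in linearize_fasta(example):
    print(f"{header}\t{seq}")
```

To run it on a real file, pass an open file handle instead of the example list, since the function only needs an iterable of lines.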


Hey! Thanks for the help!!

So, I first linearized my fasta as you suggested:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < ipc214_S8_DENV4_consensus.fa > ipc214_S8_DENV4_linearized.fasta

Then I converted my FASTA into a CSV:

fasta2csv.py ipc214_S8_DENV4_linearized.fasta ipc214_S8_linearized.csv

And then in R I tried to open my CSV file:

read.csv(file = 'ipc214_S8_linearized.csv', header = FALSE, sep = ",", quote = "\"",
     dec = ".", fill = TRUE)

And I get the following:

1 >DENV4_(consensus) AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATCGGAAGCTTGCTTAACACAGTTCTAACAGTTTGTTTAGATAGAGAGCAGATCTCTGGAAAAATGAACCAACGAAAGAAGGTGGCTAGACCACCTTTCAATATGCTGAAACGCGAGAGAAACCGCGTATCAACCCCTCAAGGGTTGGTGAAGAGATTCTCGACTGGACTTTTTTCCGGGAAAGGACCCTTACGGATGATGTTGGCATTCATTACGTTTTTGAGAGTTCTTTCCATCCCACCAACAGCAGGGATTCTAAAAAGATGGGGACAGTTAAAGAAAAACAAGGCCGTGAAG.. <truncated>

Which is not exactly what I need. I want to have a table like this:

1 A
2 G
3 T
4 T
etc.

Maybe my approach is not the best! What do you think?

Thanks a lot again!


Here are some other options to linearize fasta: Linearize fasta files

4.8 years ago
gayachit ▴ 200

You could try this simple code in Python 3

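import csv

# Example sequence; replace with the sequence from your FASTA file
dna_seq = "AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATTCTAACAGTTTGTTTAGAT"

# One row per base: (1-based position, nucleotide)
rows = enumerate(dna_seq, 1)

# newline="" stops the csv module from writing blank lines on Windows;
# the with-block closes the file, so no explicit f.close() is needed
with open("letter.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)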

This will generate a letter.csv file with one row per base: the position in the first column and the nucleotide in the second.
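The snippet above hard-codes the sequence, so here is a minimal sketch that reads it straight from the FASTA file instead. It assumes a single-record FASTA (such as a consensus sequence); with several records, all bases would end up concatenated in one table. The function name `fasta_to_column` is hypothetical.

```python
import csv


def fasta_to_column(fasta_path, csv_path):
    """Write one CSV row per base: (1-based position, nucleotide).

    Assumes a single-record FASTA; returns the number of bases written.
    """
    bases = []
    with open(fasta_path) as fasta:
        for line in fasta:
            line = line.strip()
            # Skip header lines (">...") and blank lines
            if not line or line.startswith(">"):
                continue
            # Wrapped sequence lines are simply appended base by base
            bases.extend(line)
    with open(csv_path, "w", newline="") as out:
        csv.writer(out).writerows(enumerate(bases, 1))
    return len(bases)
```

For example, `fasta_to_column("ipc214_S8_DENV4_consensus.fa", "letter.csv")` should work without linearizing first, since wrapped lines are handled in the loop. In R, `read.csv("letter.csv", header = FALSE)` would then give the position and the base as two columns.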
