Using Ruby to Convert a CSV File to FASTA
13.3 years ago
User 7433 ▴ 170

Okay, so I posted on here earlier about converting my Excel file full of DNA sequences into a FASTA file for analysis with DnaSP.

I currently have two columns containing 1) the chromosome number and 2) the DNA sequence, and I want to get:

>chromosomenumber
AGTAGAGATAGAGAGA....
>chromosomenumber
AGTCGCTCGAGAGTC...

So, I got a couple of responses which basically told me off for using Excel and not exploring other options to do this!

I have now downloaded Ruby (?!) and I am trying to get to grips with it using the tutorial. I am aware of a script to convert CSV files to FASTA; see the link below:

http://biorelated.com/2011/01/26/converting-sequence-data-from-csv-to-fasta-format/#comment-328

As I am a newbie to Ruby I am confused as to where I put my file name in this script, and how I get Ruby to find my file!

Can any kind person out there please look at this script and perhaps highlight the bits that I need to edit in order to get it to work for me?

Please!

Thanks very much x


I wasn't really telling you off specifically :) I was just illustrating how a bioinformatician with some coding expertise would look at the problem.

13.3 years ago
Neilfws 49k

The script to which you link assumes that you are using a Linux-like operating system. So it has the lines:

csv_file    = "#{ENV['HOME']}/path_to_csv_file.csv"
fasta_file  = "#{ENV['HOME']}/path_to_fasta_file.fasta"

ENV is Ruby's interface to the operating system's environment variables, and ENV['HOME'] holds the path to your home directory. If your user name was "bob", for example, then Ruby would expand the strings above to:

/home/bob/path_to_csv_file.csv
/home/bob/path_to_fasta_file.fasta
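
To see that substitution for yourself, here is a minimal sketch. The helper name home_path is made up for this illustration and is not part of the linked script; the second argument only exists so the example can pretend to be "bob".

```ruby
# Build a path under the user's home directory, the way the script's
# "#{ENV['HOME']}/..." strings do.
# NOTE: home_path is a hypothetical helper used only for illustration.
def home_path(relative, home = ENV['HOME'])
  "#{home}/#{relative}"
end

# Pretend the home directory is /home/bob:
puts home_path('path_to_csv_file.csv', '/home/bob')
# prints /home/bob/path_to_csv_file.csv
```

Calling home_path('path_to_csv_file.csv') without the second argument uses whatever home directory your own system reports.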

But the person who wrote that script is using a placeholder. They do not mean that the file names should start with "path_to_". They mean that you must specify the full path to the input CSV file and the output FASTA file. So if you want those files, for example, to be in /home/bob/projects/conversion, then you would write:

csv_file    = "#{ENV['HOME']}/projects/conversion/input.csv"
fasta_file  = "#{ENV['HOME']}/projects/conversion/output.fasta"

To make the script work, you would use a plain text editor to write the code, save it with a sensible name such as csv2fasta.rb and then run (in the same directory where you saved the Ruby script):

ruby csv2fasta.rb

And the file output.fasta should appear in /home/bob/projects/conversion.
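
For orientation, here is a minimal sketch of what a complete csv2fasta.rb could look like. It is not the script from the linked blog post; it assumes a two-column CSV (name, sequence) with no header row, and the method name csv_to_fasta is my own choice.

```ruby
require 'csv'

# Convert a two-column CSV (name, sequence) into a FASTA file.
# ASSUMPTION: no header row; any columns after the second are ignored.
def csv_to_fasta(csv_file, fasta_file)
  File.open(fasta_file, 'w') do |out|
    CSV.foreach(csv_file) do |row|
      name, sequence = row
      next if name.nil? || sequence.nil? # skip blank or incomplete rows
      out.puts ">#{name.strip}"
      out.puts sequence.strip
    end
  end
end

# These are the two lines you edit to point at your own files.
csv_file   = "#{ENV['HOME']}/projects/conversion/input.csv"
fasta_file = "#{ENV['HOME']}/projects/conversion/output.fasta"

if File.exist?(csv_file)
  csv_to_fasta(csv_file, fasta_file)
else
  warn "Cannot find #{csv_file} - check the path"
end
```

The File.exist? check is there so that a wrong path produces a readable message rather than a Ruby error.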

Unfortunately, if you are using Windows, the paths above will not apply as written because Windows file paths look different. So it is probably simplest to try:

csv_file    = "input.csv"
fasta_file  = "output.fasta"

Then save the Ruby script in the same directory as input.csv and make sure that you run the script from that same directory.
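
If you are unsure where Ruby is looking, a small check like the following may help. It is only a sketch: describe_path is a hypothetical helper, and it relies on the fact that a bare file name such as "input.csv" is resolved against the directory you run the script from.

```ruby
# Report the full path a relative file name resolves to, and whether
# a file actually exists there.
# NOTE: describe_path is a hypothetical helper used only for illustration.
def describe_path(name, base = Dir.pwd)
  full = File.expand_path(name, base)
  status = File.exist?(full) ? 'found' : 'NOT found'
  "#{name} resolves to #{full} (#{status})"
end

puts "Running from: #{Dir.pwd}"
puts describe_path('input.csv')
```

If it reports "NOT found", either move input.csv into the directory shown or cd into the directory that contains it before running the script.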

But remember that Ruby is only one possible solution and that code you find on the web is not always the best code.


Thanks Neil for explaining the code! :)

13.3 years ago
Rob Syme ▴ 540

Neil's comments are absolutely true and worth keeping in mind for the future.

However, if your CSV file is just two columns [name,sequence] you probably don't need a whole script. If your input file looks like this:

chrom_1,CATCGTAGCTAGTCGACTATGCTAGCTAGC
chrom_2,CTGATGCTAGCTACTGACTGACTGATCGATCTAGCTA
chrom_3,ATGCTGACTGATCGTACTGATCGTGACTGCTGAC

Then all you need to do is start a terminal session (on Windows: Start -> Run and type "cmd"), change into the directory that contains your sequences (use the "cd" command) and run the following, replacing seqs.csv with your CSV filename:

ruby -ne 'puts ">" + $_.split(",").first(2).join("\n")' seqs.csv

This will give the output:

>chrom_1
CATCGTAGCTAGTCGACTATGCTAGCTAGC
>chrom_2
CTGATGCTAGCTACTGACTGACTGATCGATCTAGCTA
>chrom_3
ATGCTGACTGATCGTACTGATCGTGACTGCTGAC
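
One caveat for Windows users: as far as I know, the cmd prompt does not treat single quotes as quoting characters, so the one-liner above may fail there as written. A way around shell quoting altogether is to put the same logic in a small file. This is only a sketch; the file name csv_lines_to_fasta.rb and the method name fasta_record are hypothetical.

```ruby
# Turn one "name,sequence" CSV line into a two-line FASTA record.
# Same idea as the one-liner: keep the first two fields only.
def fasta_record(line)
  name, sequence = line.chomp.split(',').first(2)
  ">#{name}\n#{sequence}"
end

# Usage: ruby csv_lines_to_fasta.rb seqs.csv > seqs.fasta
# Every file named on the command line is read line by line.
ARGV.each do |path|
  File.foreach(path) do |line|
    next if line.strip.empty? # ignore blank lines
    puts fasta_record(line)
  end
end
```

Like the one-liner, this prints to the screen. Adding "> seqs.fasta" to the end of the command sends the output to a file instead.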

You should be opening a terminal, navigating to wherever your CSV file is and running Rob's command from there. Also, if you run the command "ruby -v" in a terminal, it should display the version. Tell us what messages/errors you see.


Thank you both for posting.

I am operating on Windows, I'm afraid!

Rob - yep, my input files are literally like that. I just have 4000-odd chromosomes.

I have tried inputting into Ruby exactly what you suggested, but replaced the file name with my own, and it doesn't work. Does it matter where the input CSV file is saved?

Any other tips much appreciated..sorry if these are ridiculous questions!

xx


Hi Rob, I will add this solution to the original blog post. Thanks!
