Making FASTA files
9.7 years ago
kevluv93 ▴ 170

Hey gang,

I downloaded a series of DNA sequences that I was going to use as a scaffold to map my RNA data against. However, I'm having a hard time formatting my FASTA file correctly.

I know that the first line needs to start with >(identifier here) and that the file should be saved with a .fa extension, and then I should be good. However, the spacing of my data is irregular, and every time I try to use my FASTA file in IGV or for genome-guided RNA assembly with TopHat, both programs tell me that my formatting is incorrect. The problem is that my line lengths aren't consistent: I believe the file should have 80 base pairs per line (if this is wrong, please tell me), and not all of my data follows that.
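One way to see exactly which lines break the layout is a small line-width check. The Python sketch below assumes the target tools index the file the way `samtools faidx` does, where every sequence line in a record must have the same length except the last, which may be shorter; under that assumption any consistent width works, not only 80. The function names `fasta_problems` and `_check_widths` are hypothetical, and the sketch also flags blank lines and stray spaces.

```python
def _check_widths(record, problems):
    """Flag lines in one record that break the (assumed) fixed-width rule.

    `record` holds (line_number, sequence_line) pairs. The first line sets
    the width; every later line except the last must match it, and the last
    line may be shorter but not longer.
    """
    if not record:
        return
    width = len(record[0][1])
    for number, line in record[:-1]:
        if len(line) != width:
            problems.append((number, f"{len(line)} characters, expected {width}"))
    last_number, last_line = record[-1]
    if len(last_line) > width:
        problems.append((last_number, f"last line longer than {width} characters"))


def fasta_problems(text):
    """Return sorted (line_number, message) pairs describing layout problems."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith(">"):
        problems.append((1, "expected a header line starting with '>'"))

    record = []  # sequence lines of the record currently being read
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            _check_widths(record, problems)  # finish the previous record
            record = []
        elif not line.strip():
            problems.append((number, "blank line"))
        else:
            if line != line.strip():
                problems.append((number, "leading or trailing spaces"))
            record.append((number, line))
    _check_widths(record, problems)  # finish the last record
    return sorted(problems)


if __name__ == "__main__":
    sample = ">Scaffold642\n" + "A" * 9 + "\n" + "A" * 21 + "\n\n" + "AAA\n"
    for number, message in fasta_problems(sample):
        print(f"line {number}: {message}")
```

Each problem comes back as a (line number, message) pair sorted by line number, so the offending lines can be found quickly in Notepad++.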

Example:

aaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaa
aaa
aaaaaaaaaaaaaaaa

How do I fix the spacing in my FASTA file so that it will run with these programs? I have Notepad++ and no programming knowledge, so writing a program to space this properly is not really an option for me. Any advice? When you copy and paste short DNA sequences to turn them into a FASTA file, how do you guys do it? Is there an option in Notepad++ that lets me space my FASTA file correctly?

Thanks for the help!

RNA-Seq FASTA • 6.5k views

Where did you download this data from?


Baylor College of Medicine, arthropod, Copidosoma. They have an incomplete genome for this animal, and I copied and pasted one of their gene scaffolds of interest.


When you say irregular format, do you mean there are blank lines after each sequence, or spaces in between parts of the same sequence?

When copying and pasting, why don't you just add a line like '>Sequence_name' and then paste the corresponding sequence below it?


I should be more specific. I don't know how to attach compressed files to responses on this site, but here's what my sequence looks like:

>Scaffold642
AGATCAGAAACGCGCACGAGCTCGAAAGAAAGTGGGCCCAGCACTGCAGATAGATG       
CACTACGATAGGTAAAGAACATGGTAAGAGGTGGGC
CCCCTCCCCTCGACATTTGGTGGGCCATTGCAGGCTTGTAGAGGCAAGGATAATGCACACGAGCTAGTGTATTATGTACTGTACTTTTAGTCTAGACCATCGTG

TATGCCACTTTGGGTCCCTCTGCGTTACTTATTATTTTAGAGTATTTTTTTTTGCTTAGTGCTCTGGTAG
TATTACCTCTAGGTCTTATGCTGTTGGAAAAAAACGTTTTTTGTTTAATGCATGACTCTGGACGATTAAT
AAAAGGATATCTAGACACAAAACATATGTAAATTTTTAGTGAATGTTTTTTATTTACACTGAGCGAAAAA

They're staggered like that, and every time I try to move the sequences around, it brings the whole previous line up from the bottom to the top. I don't want to have to work my way down the whole file like this, trying to space them correctly.


You don't need the empty lines at all. Check this out to remove them: http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad
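Besides deleting the empty lines, the staggered pieces of each record also need to be glued back into one continuous sequence before they can be rewrapped. A minimal Python sketch of that step, assuming every record starts with a `>` header line as in the example above; `read_fasta` is a hypothetical helper, not a library function, and any text before the first header is ignored.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    Blank lines are skipped and all whitespace inside sequence lines is
    removed, so staggered or irregularly wrapped pieces are joined into one
    continuous sequence per record. Text before the first header is ignored.
    """
    records = []
    header, pieces = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(pieces)))
            header, pieces = line, []
        elif header is not None and line:
            pieces.append("".join(line.split()))  # drop inner spaces/tabs
    if header is not None:  # close the last record
        records.append((header, "".join(pieces)))
    return records


if __name__ == "__main__":
    staggered = ">Scaffold642\nAGATCAGAAACG   \nCACTACG\n\nTATGCC\n"
    for header, sequence in read_fasta(staggered):
        print(header, len(sequence), sequence)
```

Each (header, sequence) pair can then be written back out with a fixed line width.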

9.5 years ago
alolex ▴ 960

Just paste your uneven sequence blocks in here: http://genome.nci.nih.gov/tools/reformat.html

Select "raw" as the input sequence format and the old FASTA output format, and it will give you this:

AGATCAGAAACGCGCACGAGCTCGAAAGAAAGTGGGCCCAGCACTGCAGATAGATGCACT
ACGATAGGTAAAGAACATGGTAAGAGGTGGGCCCCCTCCCCTCGACATTTGGTGGGCCAT
TGCAGGCTTGTAGAGGCAAGGATAATGCACACGAGCTAGTGTATTATGTACTGTACTTTT
AGTCTAGACCATCGTGTATGCCACTTTGGGTCCCTCTGCGTTACTTATTATTTTAGAGTA
TTTTTTTTTGCTTAGTGCTCTGGTAGTATTACCTCTAGGTCTTATGCTGTTGGAAAAAAA
CGTTTTTTGTTTAATGCATGACTCTGGACGATTAATAAAAGGATATCTAGACACAAAACA
TATGTAAATTTTTAGTGAATGTTTTTTATTTACACTGAGCGAAAAA
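That output is just the joined sequence cut into 60-character lines, with a shorter final line; it has no `>` header, so one still has to be added on top. A minimal Python sketch of the same rewrapping, assuming you already have the header and one continuous sequence string; `format_record` is a hypothetical name, and the 60-character default simply mirrors the output shown here.

```python
def format_record(header, sequence, width=60):
    """Return one FASTA record with the sequence wrapped to `width` columns.

    Every sequence line is exactly `width` characters long except possibly
    the last, which holds whatever is left over.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    if not header.startswith(">"):
        header = ">" + header  # make sure the header line is marked
    lines = [header]
    # Slice the sequence into consecutive chunks of `width` characters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_record("Scaffold642", "ACGT" * 40), end="")
```

Passing `width=80` produces 80-character lines instead.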
9.5 years ago
kevluv93 ▴ 170

Just as a follow-up, since I forgot I posted this a while back: I solved this issue by loading all of my FASTA files into Galaxy and using the "Normalize FASTA" option under "Picard". If you install Picard on your own computer, you have to install Java 1.6 for it to work. This is a significant downgrade, and my IT guy wouldn't even let me install it on my workplace's computer.

However, options to manipulate FASTA files were already in Galaxy and Picard comes installed as a plug-in to the site. I just used those. Thanks for the input anyway guys.

Upload your file, then use the tools under "Manipulate FASTA" or "Picard".


Install? As in on Windows? You could always create an AWS instance with your preferred flavor of Linux and run your analysis there. That would cost a fraction of investing in new resources. If this is a regular task, you could also get a new machine (or repurpose an old one, if your budget is constrained) and run Linux on it.


I'll actually keep that in mind. I'll probably get a machine repurposed to run Linux so I can perform these tasks more easily in the future. Thanks for the input.


Linux is more flexible because you can just point to the appropriate version of software as you use it. A temporary aliasing of javac and java to a task-specific version can circumvent the installation problem.
