I downloaded a series of DNA sequences that I was going to use as a scaffold to map RNA data against. However, I'm having a hard time formatting my FASTA file correctly.
I know that I need to begin the first line with >(place identifier here), save the file with a .fa extension, and then I should be good. However, the spacing of my data is irregular, and every time I try to use my FASTA file in IGV or for genome-guided RNA assembly in TopHat, both programs tell me that my formatting is incorrect. The problem is inconsistent spacing: my file should have 80 base pairs per line (if this is wrong, please tell me), and not all of my data has this consistency.
How do I make the spacing in my FASTA file correct so that it will run with these programs? I have Notepad++ and no knowledge of programming, so whipping up programs that can space this properly is not really an option for me. Any advice? When you copy and paste small sequences of DNA to be turned into a FASTA file, how do you guys do it? Is there an option in Notepad++ that lets me space my FASTA file correctly?
Written 9.7 years ago by kevluv93; updated 2.5 years ago by Ram.
Just a follow-up, since I forgot I posted this a while back: I solved this issue by loading all of my FASTA files into Galaxy and using the "Normalize FASTA" option under "Picard". If you install Picard on your own computer, you have to install Java 1.6 for it to work. This is a significant downgrade, and my IT guy wouldn't even let me install it on my workplace computer.
However, options to manipulate FASTA files were already in Galaxy, and Picard comes installed as a plug-in to the site. I just used those. Thanks for the input anyway, guys.
Upload your file, then use the tools under "Manipulate FASTA" or "Picard".
Install? As in Windows? You could always create an AWS instance with your preferred flavor of Linux and run your analysis there. That'd cost you a fraction of investing in new resources. If this task is regular, you could always get a new machine (or repurpose an old one, if your budget is constrained) and run Linux on it.
I'll actually keep that in mind. I'll probably get a machine repurposed to run Linux so I can perform these tasks more easily in the future. Thanks for the input.
Linux is more flexible because you can just point to the appropriate version of software as you use it. A temporary aliasing of javac and java to a task-specific version can circumvent the installation problem.
Where did you download this data from?
Baylor College of Medicine, arthropod, Copidosoma. They have an incomplete genome of this animal, and I copied and pasted one of their gene scaffolds of interest.
When you say irregular format: do you have spaces after each sequence, or spaces in between within the same sequence?
While copy-pasting, why don't you just add one line like '>Sequence_name' and then paste the corresponding sequence below that line?
I should be more specific. I don't know how to attach compressed files to responses on this site, but here's what my sequence looks like:
They're staggered like that, and every time I try to move the sequences, it'll bring the whole previous line from the bottom to the top. I don't want to have to continue down the file like this, trying to space them correctly.
Normalize Fasta from Picard Tools might be of use. https://broadinstitute.github.io/picard/command-line-overview.html#NormalizeFasta
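If Picard isn't an option, the same idea (join each record's sequence, then re-wrap it to a fixed width) can be sketched in a few lines of Python. This is only a rough illustration and not Picard's actual implementation; the function name `normalize_fasta` and the default width of 80 are my own assumptions, taken from the 80-per-line figure in the question.

```python
def normalize_fasta(text, width=80):
    """Re-wrap every sequence in FASTA text to a fixed line width.

    Hypothetical helper for illustration; not Picard's NormalizeFasta.
    """
    out = []        # finished output lines
    seq_parts = []  # pieces of the sequence currently being collected

    def flush():
        # Join the collected pieces and cut them into width-sized lines.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
        seq_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            flush()           # finish the previous record first
            out.append(line)  # header lines are kept unchanged
        else:
            # Drop stray spaces or tabs pasted into the sequence.
            seq_parts.append("".join(line.split()))
    flush()
    return "\n".join(out) + "\n" if out else ""


# Example use with hypothetical file names:
# with open("scaffold_raw.fa") as src, open("scaffold.fa", "w") as dst:
#     dst.write(normalize_fasta(src.read(), width=80))
```

Every sequence line in the output has the same length except the last line of each record, which is usually what indexing tools expect.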
You don't need the empty lines at all. Check this out to remove them: http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad
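For anyone who would rather script this than do it in Notepad++, here is a minimal Python sketch that drops empty (or whitespace-only) lines and leaves everything else alone. The function name `drop_blank_lines` is made up for this example, and it assumes the file is small enough to read into memory.

```python
def drop_blank_lines(text):
    """Return the text with empty and whitespace-only lines removed."""
    # Keep only lines that contain something; also trim trailing spaces.
    kept = [line.rstrip() for line in text.splitlines() if line.strip()]
    return "\n".join(kept) + "\n" if kept else ""


# Example use with hypothetical file names:
# with open("scaffold_raw.fa") as src, open("scaffold_noblank.fa", "w") as dst:
#     dst.write(drop_blank_lines(src.read()))
```

Note that this only removes the blank lines; it does not even out the line lengths.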