Hey everyone,
I am new to the world of bioinformatics. I am currently working with my professor and have been asked to convert some FASTQ files to FASTA. There are several files, and each of them is split into two (R1 and R2). Correct me if I am wrong, but I believe these are called Illumina sequences? Do I have to merge these files, or can I convert each of them to FASTA separately?
What would you suggest as a good way to convert these files to FASTA? I am using a Mac. I am still learning, so my skills are not where I would like them to be (yet). Any pointers would be appreciated.
Thank you
Converting FASTQ to FASTA is one thing (a format conversion), but what is your ultimate aim? You referred to "merging" them. That is a concept one step above FASTQ/FASTA conversion, and it will only work if your reads (R1/R2) overlap.
This is a broad overview of what a sequenced fragment looks like: C: How to quantify the overlapping reads in paired-end DNA sequencing to check the
Only if the length of R1 or R2 is longer than the insert size will the reads merge.
I just need to do a format conversion of the files so that my professor can open them. I just need a good method of converting them to FASTA. There seem to be many different ways of doing this. What is a good method for someone who is a novice? Can I use Python?
Thanks
You can use the reformat.sh program from the BBMap suite: reformat.sh in=your.fq.gz out=your.fa
This will convert your FASTQ files into FASTA format.
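Since there are several files, each with an R1 and an R2, a small loop can run the same conversion on every file separately; no merging is needed for a pure format conversion. This is only a sketch: it assumes the files end in .fastq.gz and sit in the current directory, and the names REFORMAT and convert_all are made up for illustration.

```shell
# REFORMAT defaults to BBMap's reformat.sh. It is a variable (hypothetical
# name) only so the command can be swapped out, e.g. with "echo" for a dry run.
REFORMAT="${REFORMAT:-reformat.sh}"

# Hypothetical helper: convert every *.fastq.gz file in the current directory.
# R1 and R2 files are handled independently, one FASTA file per FASTQ file.
convert_all() {
    for fq in *.fastq.gz; do
        [ -e "$fq" ] || continue          # nothing matched the pattern
        fa="${fq%.fastq.gz}.fa"           # sample_R1.fastq.gz -> sample_R1.fa
        "$REFORMAT" in="$fq" out="$fa"
    done
}
```

Run convert_all in the folder that holds the FASTQ files. Setting REFORMAT=echo first prints the in=/out= arguments for each file instead of running the conversion, which is a cheap way to check the file names.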
Another question, sorry. How does this work in Python exactly? Am I supposed to run this in the Mac Terminal while being in Python?
This is not Python at all. The BBMap suite is written in Java. What you are running above is a shell script that runs the actual Java command line (which is more complicated to write out).
I am not very familiar with Java. I am a Biology student so my knowledge of programming is lacking. I am learning as I go. Is there a simpler way of doing this in Python? Or any program that would suit me better? Thanks again.
This is about the simplest way you can do this. All you need is Java installed.
So all I need to do is install Java and run that command line and it'll output a FASTA file?
Yes for the Java installation. You will also need to download the BBMap software from the link above and then run the command as noted. No installation is needed for BBMap. Just uncompress the file you download and add the directory with the *.sh scripts to your $PATH.
An even simpler option would be to just use the sed program on any Linux system. No additional programs needed.
Ok, I've installed Java and checked in the terminal that I have it by typing java -version. I've also downloaded BBMap and unzipped it. I have the bbmap folder on my desktop. How do I activate BBMap in the terminal so that I can run the script reformat.sh in=your.fq.gz out=your.fa?
Change to the BBMap directory on your desktop in a terminal window. Then run the reformat.sh command from there (use real directory/file paths).
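As a rough sketch of those two steps, assuming the unzipped folder is called bbmap and sits directly on the Desktop (the variable name BBMAP_DIR and the /path/to/... file paths are placeholders, not real locations):

```shell
# Assumption: the unzipped bbmap folder is on the Desktop.
BBMAP_DIR="$HOME/Desktop/bbmap"

if [ -x "$BBMAP_DIR/reformat.sh" ]; then
    cd "$BBMAP_DIR"
    # Replace the placeholder paths with the real locations of your files.
    ./reformat.sh in=/path/to/your.fq.gz out=/path/to/your.fa
else
    # Printed when the folder is somewhere else or has a different name.
    echo "reformat.sh not found in $BBMAP_DIR; check where you unzipped BBMap" >&2
fi
```

The leading ./ tells the shell to run the script from the current folder, which is needed as long as the bbmap directory has not been added to $PATH.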
I'm working on a Mac. Does this also work on a Mac? I also have a Windows machine, but I am currently working on the Mac.
It will work fine on macOS. I suggest you just use the sed method I posted above with Mac.
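For illustration, here is a sketch of what a sed conversion can look like; it is not necessarily the exact command referred to above. It assumes standard four-line FASTQ records (header, sequence, "+", quality) with no wrapped sequence lines, and it sticks to basic sed commands so that it should behave the same with the BSD sed shipped on macOS and GNU sed on Linux. The file names are made up.

```shell
# Hypothetical two-read example file, created here only so the sketch runs as-is.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nFFFF\n' > sample_R1.fastq

# For each 4-line record: turn the leading "@" of the header into ">" and
# print it, print the sequence line, then skip the "+" and quality lines.
sed -n 's/^@/>/p;n;p;n;n' sample_R1.fastq > sample_R1.fasta

cat sample_R1.fasta
```

Because the script works by line position within each record, a quality line that happens to start with "@" is not mistaken for a header. For a compressed file, the same sed command can read from gunzip -c your.fq.gz through a pipe instead of taking a file name.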