Hello,
It is probably a very basic question, yet I struggle to find an answer to it. My lab ordered whole-genome sequencing from a commercial firm, and not long ago we received a few .fasta files with many short sequences in them. As I understand it, I now need to map these short sequences to a reference genome to obtain one long sequence. Any ideas how I can do it? I tried a few software programs, but it seems they all need .abi files, not .fasta, as input.
Mapping is just one step in the multi-step process of RNA-seq analysis. You also have to check the quality of your reads, trim the adaptors, etc. It is far from trivial.
Having said that, because you are looking for a mapping tool, check out kallisto.
Thank you for the reply! It's not RNA-seq; we are sequencing viral DNA.
Are you sure they are FASTA files and not FASTQ files?
Generally speaking you should be given the FASTQs to run assembly from (you are doing assembly, not "mapping" at this stage).
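If you are unsure which format you were sent, a quick check is to look at the first character of the first record: FASTA headers start with `>` and FASTQ headers start with `@`. A minimal sketch, assuming uncompressed text files; the function name `sniff_format` and the file name in the usage note are made up for illustration:

```python
def sniff_format(lines):
    """Guess 'fasta' or 'fastq' from the first non-blank line.

    `lines` is any iterable of text lines, e.g. an open file handle.
    This only inspects the first record header, so it is a heuristic,
    not a validator.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("@"):
            return "fastq"
        return "unknown"
    return "unknown"  # empty input
```

Usage would look something like `with open("my_file.fasta") as fh: print(sniff_format(fh))`. Gzipped files would need to be opened with `gzip.open(path, "rt")` instead.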
I would suggest using a tool like shovill to let it determine some sensible parameters for you (it will accept FASTQs).
No, I have FASTA files. As I understand it, these files contain contigs, and one file has scaffolds. I guess that means they already assembled the reads for us. They also sent us a table with info about the assembly (below). Now I need to assemble these sequences into one, but I just can't figure out how.
If they have already assembled these into contigs and/or scaffolds, that's the best you can do with the data you have.
Assuming this isn't what they already did with the scaffold files, you can simply order your contigs and join them with Ns via alignment to a reference genome. I don't know of the best tool for this these days, though, since you can do basically everything from contigs now, so I will have to let others weigh in on that one.
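To make the "join them with Ns" part concrete, here is a rough sketch of just the joining step. It assumes you have already worked out the order and strand of each contig from an alignment to the reference (that part needs a real aligner and is not shown). The function names and the default gap of 100 Ns are placeholders I picked for illustration, not a standard you must follow:

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def join_contigs(ordered, gap_length=100):
    """Join contigs into one pseudo-molecule separated by runs of N.

    `ordered` is a list of (sequence, strand) tuples, already sorted by
    position on the reference; strand is '+' or '-'. Contigs on the '-'
    strand are reverse-complemented before joining.
    """
    oriented = [
        seq if strand == "+" else reverse_complement(seq)
        for seq, strand in ordered
    ]
    return ("N" * gap_length).join(oriented)
```

Keep in mind the Ns mark gaps of unknown size, so the result is a scaffold laid out against the reference, not a finished genome.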
You will not get a closed, complete, genome from this data without doing hybrid assembly with a long read technology (and a bit of luck).
I would also check that assembly file and sort it according to the N50. I don't know what your organism is, but those N50s look awful to me, though that might be my inexperience with viruses showing through.
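If you want to double-check the N50 values yourself from the contig FASTA files, the calculation is short: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. A minimal sketch, with function names of my own choosing and a bare-bones FASTA reader that assumes plain, uncompressed input:

```python
def fasta_lengths(lines):
    """Return the length of each record in a FASTA given as text lines."""
    lengths = []
    current = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the cumulative sum of contig
    lengths (sorted longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")
```

For example, `n50(fasta_lengths(open("contigs.fasta")))`, where `contigs.fasta` stands in for whatever your file is called.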