Hello everyone
I am trying to align query FASTQ nucleotide sequences against a reference (.fa) using the EMBOSS needleall alignment tool. The query file has 2271611 sequences to align. However, the output SAM file contains alignment scores for only 881 sequences. I would like to get an alignment score for every query sequence. The run also produces a needleall.error file, but that file is empty. I cannot figure out what the reason could be.
Looking forward to your response.
Thanks
/home/user/EMBOSS-6.6.0/emboss/needleall -asequence X3.fa -bsequence X3rL41-post.merged.fastq -gapopen 10 -gapextend 0.5 -datafile 'EDNAFULL' -outfile X3rL41_post_new.sam -aformat sam
I'm not sure if you're using the appropriate tool here ...
needleall does pairwise comparison of many sequences against many other sequences (usually the same set as the input). Moreover, I think it is meant for FASTA-formatted files (not FASTQ).
When aligning short reads to a reference, you're better off using one of the NGS aligners: HiSat, Salmon?, STAR, bwa, ...
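If the FASTQ input does turn out to be part of the problem, one thing to try is converting the reads to FASTA first and feeding that file to the tool. Below is a minimal sketch of such a conversion. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the function names are made up for illustration, they are not part of EMBOSS.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a four-line-per-record FASTQ stream.

    Assumption: sequences and qualities are not wrapped over several lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        yield header[1:].rstrip("\n"), sequence


def fastq_to_fasta(fastq_handle: TextIO, fasta_handle: TextIO) -> int:
    """Write every FASTQ record as a FASTA record and return the record count."""
    count = 0
    for name, sequence in read_fastq(fastq_handle):
        fasta_handle.write(f">{name}\n{sequence}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo; real use would open the .fastq and .fa files instead.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
    out = io.StringIO()
    n = fastq_to_fasta(demo, out)
    print(n, "records converted")
    print(out.getvalue(), end="")
```

The returned record count can also be compared with the expected number of reads, which is a quick sanity check that nothing was dropped during conversion.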
It does take a FASTQ file as input. The advantage of this tool is that it allows FASTQ sequences to be aligned against FASTA sequences.
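To narrow down where the reads go missing, it may help to compare the number of records in the FASTQ file with the distinct sequence names that appear in the SAM output. The following is a rough sketch, not a definitive diagnostic. It assumes four-line FASTQ records with no empty sequences, and it assumes the read name ends up in the first SAM column, which should be checked against the actual needleall output. The helper names are hypothetical.

```python
import io
from typing import Set, TextIO


def count_fastq_records(handle: TextIO) -> int:
    """Count records in a FASTQ stream, assuming four non-empty lines per record."""
    lines = sum(1 for line in handle if line.strip())
    if lines % 4 != 0:
        raise ValueError(f"{lines} non-empty lines is not a multiple of 4")
    return lines // 4


def sam_query_names(handle: TextIO) -> Set[str]:
    """Collect the distinct first-column names from the alignment lines of a SAM stream.

    SAM header lines start with '@' and are skipped.
    """
    names: Set[str] = set()
    for line in handle:
        if not line.strip() or line.startswith("@"):
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


if __name__ == "__main__":
    # In-memory demo data; real use would open the .fastq and .sam files instead.
    fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    sam = io.StringIO("@HD\tVN:1.0\nr1\t0\tref\t1\t255\t4M\t*\t0\t0\tACGT\t*\n")
    total = count_fastq_records(fastq)
    aligned = sam_query_names(sam)
    print(f"{len(aligned)} of {total} reads have a line in the SAM output")
```

Looking at the name of the last read present in the SAM file, and at the record that follows it in the FASTQ file, may show whether the run stopped at a particular record.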