I have two SAM files generated by Bowtie from WGS data. Separately, I have a set of gene sequences in FASTA format. I need to check whether these genes are present in the SAM files. How can I achieve this?
Have you considered using Salmon or Kallisto to quantify your reads against your genes?
It wasn't clear in your original post that these were DNA sequence results. In that case, I'd suggest mapping your FASTA sequences against your genome and then looking at the coverage in sequencing across the regions to which your FASTA sequences map. Kallisto may "work", but I think we are pretty far off the beaten path.
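To make the coverage idea concrete, here is a minimal Python sketch that computes the fraction of a gene's mapped region covered by at least one read. It assumes you have already extracted read alignment intervals (0-based, half-open) for the relevant chromosome; the function name `breadth_of_coverage` and the example coordinates are made up for illustration. In practice, tools such as `samtools depth` or `bedtools coverage` report coverage directly from sorted BAM files.

```python
def breadth_of_coverage(region_start, region_end, read_intervals):
    """Fraction of [region_start, region_end) covered by at least one read.

    All coordinates are 0-based, half-open, and on the same chromosome.
    """
    if region_end <= region_start:
        raise ValueError("region must have positive length")
    # Clip each read to the region and drop reads that do not overlap it.
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in read_intervals
        if s < region_end and e > region_start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this read: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent read: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (region_end - region_start)


# Hypothetical gene region 100-200 with four reads on the same chromosome.
reads = [(90, 120), (110, 150), (180, 250), (300, 400)]
fraction = breadth_of_coverage(100, 200, reads)
```

A gene whose region has a fraction near zero is a candidate for being absent, although low sequencing depth or poor mappability can give the same picture.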
This would be easier if you had a GTF, GFF, or BED file of your genes of interest. Do you?
If not, I believe the easiest approach is to align your FASTA sequences to your reference genome, convert those alignments to BED, and use the BED intervals to count reads in your SAM files (and so check for presence).
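As a rough sketch of the counting step, the Python below counts mapped SAM records that overlap each BED interval. It uses only the standard library and assumes a four-column BED (chrom, start, end, name); the function names are my own, and it ignores strand, mapping quality, and secondary alignments. For real data, `bedtools intersect` or `samtools view` with a region on a sorted, indexed BAM will be much faster.

```python
import re
from collections import Counter

# CIGAR operations that consume reference bases (per the SAM specification).
_REF_CONSUMING = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos_1based, cigar):
    """Return the 0-based, half-open reference interval of an alignment."""
    length = sum(
        int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_CONSUMING
    )
    start = pos_1based - 1  # SAM POS is 1-based; BED is 0-based
    return start, start + length


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples."""
    genes = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        genes.append((chrom, int(start), int(end), name))
    return genes


def count_reads_per_gene(sam_lines, genes):
    """Count mapped SAM records overlapping each gene interval."""
    counts = Counter({name: 0 for _, _, _, name in genes})
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read or no alignment information
        start, end = reference_span(pos, cigar)
        for g_chrom, g_start, g_end, name in genes:
            if chrom == g_chrom and start < g_end and end > g_start:
                counts[name] += 1
    return dict(counts)
```

With real files (names here are placeholders) this would be called as `count_reads_per_gene(open("sample.sam"), parse_bed(open("genes.bed")))`. The linear scan over genes is fine for 25 intervals but would need an interval index for a full annotation.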
Thank you for the reply. I have FASTA sequences for 25 genes, but no GTF, GFF, or BED file. For the alignment, should I merge the 25 FASTA sequences into one file and align that against the reference genome?
That would be good, yes.
If the genes are merged, then that would be considered one reference, but I think you're looking for each individual gene in your data. So I'd rather keep each gene in its own FASTA file and align them separately. Maybe I misunderstood something here.
Please feel free to correct me.
As I understood it, OP asks if the fasta records should be put together in one file (multifasta) or kept in separate files. I don't think OP wants to merge fasta records.
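For what it's worth, a multi-FASTA file keeps every record under its own header line, so putting the 25 genes in one file does not fuse them into a single sequence; aligners typically report each hit under the name of the record it came from. Here is a minimal Python sketch of combining per-gene files (the function name and file names are hypothetical), which also stops on duplicate record names so the genes stay distinguishable downstream:

```python
from pathlib import Path


def combine_fasta(paths, out_path):
    """Concatenate FASTA files into one multi-FASTA, keeping records separate.

    Returns the record names (text after '>' up to the first whitespace).
    """
    names = []
    seen = set()
    with open(out_path, "w") as out:
        for path in paths:
            for line in Path(path).read_text().splitlines():
                line = line.strip()
                if not line:
                    continue  # drop blank lines between records
                if line.startswith(">"):
                    parts = line[1:].split()
                    name = parts[0] if parts else ""
                    if name in seen:
                        raise ValueError(f"duplicate record name: {name!r}")
                    seen.add(name)
                    names.append(name)
                # Always write a newline so records never run together,
                # even if an input file lacks a trailing newline.
                out.write(line + "\n")
    return names
```

For example, `combine_fasta(["gene01.fa", "gene02.fa"], "genes.fa")` would write both records to `genes.fa` and return their names.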