How to get the list of all genes present in a SAM file?
mail2steff, 6.9 years ago

I have two SAM files that were generated by Bowtie (WGS data). I also have a set of gene sequences, separately, in FASTA format. I need to check whether these genes are present in the SAM files or not. How can I achieve this?

Tags: next-gen sequencing, WGS, samtools, Bowtie

This would be easier if you had a GTF, GFF or BED file of your genes of interest. Do you?

If not, I believe the easiest approach is to align your FASTA sequences to your reference genome, convert the alignments to BED, and use that BED file to count the reads in your SAM file (and so check presence).
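
A minimal pure-Python sketch of that workflow, assuming the 25 gene sequences have already been aligned to the same reference genome used for the WGS reads (giving a second SAM file), and that both SAM files use the same chromosome names. The function names are hypothetical, and only mapped, primary records are used:

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# CIGAR operations; only M, D, N, = and X consume reference bases.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name), BED-style


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by one alignment."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")


def primary_alignments(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (qname, rname, 0-based start, end) for mapped primary records."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header or blank line
        f = line.rstrip("\n").split("\t")
        flag, cigar = int(f[1]), f[5]
        if flag & SKIP_FLAGS or cigar == "*":
            continue
        start = int(f[3]) - 1  # SAM POS is 1-based, BED is 0-based
        yield f[0], f[2], start, start + reference_span(cigar)


def gene_alignments_to_bed(gene_sam_lines: Iterable[str]) -> List[Interval]:
    """Step 1: genes aligned to the genome -> BED intervals (0-based, half-open)."""
    return [(rname, start, end, qname)
            for qname, rname, start, end in primary_alignments(gene_sam_lines)]


def count_reads_per_gene(read_sam_lines: Iterable[str],
                         genes: List[Interval]) -> Dict[str, int]:
    """Step 2: count reads overlapping each gene (each mate counted separately)."""
    counts = {name: 0 for _, _, _, name in genes}
    for _, rname, start, end in primary_alignments(read_sam_lines):
        for chrom, g_start, g_end, name in genes:
            if chrom == rname and start < g_end and g_start < end:
                counts[name] += 1
    return counts
```

Genes with a count of 0 have no overlapping reads. The nested loop is fine for 25 genes; for whole-genome data, `bedtools bamtobed` (for the BED conversion) and `bedtools coverage -a genes.bed -b reads.bam` (for per-gene counts) would be the practical choice.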

Thank you for the reply. I have FASTA sequences for 25 genes, not a GTF, GFF or BED file. For the alignment, should I merge the 25 FASTA sequences into one file and align that against the reference genome?

That would be good, yes.

If the genes are merged, they would be treated as one reference, but I think you're looking for each individual gene in your data. So I'd rather keep each gene in its own FASTA file and align them separately. Maybe I misunderstood something here; please feel free to correct me.

As I understood it, the OP is asking whether the FASTA records should be put together in one file (multi-FASTA) or kept in separate files. I don't think the OP wants to merge the FASTA records themselves.
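
To illustrate the multi-FASTA point: a multi-FASTA is simply several '>' records in one file, and a parser still sees each record as a separate sequence (aligners generally treat each record as a separate query in the same way). A small sketch with a hypothetical `read_multifasta` helper:

```python
from typing import Dict, Iterable, List, Optional


def read_multifasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into {record ID: sequence}.

    The ID is the header text up to the first whitespace. Each '>' header
    starts a new, independent record; nothing is concatenated across records.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

With 25 genes in one file, this returns 25 entries, one per gene.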

Have you considered using Salmon or Kallisto to quantify your reads against your genes?
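
If you go the Salmon or Kallisto route, a gene can be called "present" when reads are assigned to it. The sketch below assumes you have already built an index from the 25-gene FASTA and run `kallisto quant` or `salmon quant` on the reads (which could be pulled back out of the SAM files with `samtools fastq`). It reads kallisto's abundance.tsv (`target_id`, `est_counts`) or Salmon's quant.sf (`Name`, `NumReads`); the `detected_genes` helper and the default threshold of 1 read are arbitrary choices for illustration:

```python
import csv
from typing import Dict, Iterable


def detected_genes(table_lines: Iterable[str], min_count: float = 1.0) -> Dict[str, float]:
    """Return {gene: estimated reads} for genes with at least min_count reads."""
    reader = csv.DictReader(table_lines, delimiter="\t")
    fields = reader.fieldnames or []
    if "est_counts" in fields:      # kallisto abundance.tsv
        name_col, count_col = "target_id", "est_counts"
    elif "NumReads" in fields:      # Salmon quant.sf
        name_col, count_col = "Name", "NumReads"
    else:
        raise ValueError(f"unrecognised columns: {fields}")

    detected: Dict[str, float] = {}
    for row in reader:
        count = float(row[count_col])
        if count >= min_count:
            detected[row[name_col]] = count
    return detected
```

Estimated counts can be fractional, so the cut-off is a judgment call.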

That's a good solution.

Will Kallisto also work for WGS data?

Sure, if you just want to know whether those genes are covered.

It wasn't clear in your original post that these were DNA sequencing results. In that case, I'd suggest mapping your FASTA sequences against your genome and then looking at the sequencing coverage across the regions to which your FASTA sequences map. Kallisto may "work", but I think we are pretty far off the beaten path.
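
A sketch of the coverage check, assuming the gene's genomic interval is already known (for instance from aligning its FASTA sequence to the genome) and the WGS reads are in SAM format. `region_coverage` is a hypothetical helper that returns the breadth (fraction of the gene's bases covered by at least one read) and the mean depth over a 0-based, half-open interval:

```python
import re
from typing import Iterable, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def region_coverage(sam_lines: Iterable[str], chrom: str,
                    start: int, end: int) -> Tuple[float, float]:
    """Return (breadth, mean_depth) over the 0-based half-open region [start, end).

    Only aligned bases (M, = and X) add depth; D and N advance along the
    reference without adding depth; I, S, H and P do not consume reference.
    """
    if end <= start:
        raise ValueError("region must have positive length")
    depth = [0] * (end - start)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header or blank line
        f = line.rstrip("\n").split("\t")
        flag, rname, cigar = int(f[1]), f[2], f[5]
        if flag & SKIP_FLAGS or rname != chrom or cigar == "*":
            continue
        ref = int(f[3]) - 1  # SAM POS is 1-based
        for n_str, op in CIGAR_RE.findall(cigar):
            n = int(n_str)
            if op in "M=X":
                # clip the aligned block to the region before counting
                for p in range(max(ref, start), min(ref + n, end)):
                    depth[p - start] += 1
                ref += n
            elif op in "DN":
                ref += n
    length = len(depth)
    covered = sum(1 for d in depth if d > 0)
    return covered / length, sum(depth) / length
```

A breadth close to 1 suggests the gene is present in the sequenced genome; for full-size data, `samtools depth` or `bedtools coverage` on a sorted BAM does the same job much faster.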

Thank you for the commands. I'll check on this.
