[Samtools] How to extract reads matching a given sequence
3
0
Entering edit mode
10.5 years ago
madkitty ▴ 690

I have a BAM file that does not contain any chromosome or position information, our file contains only scaffold of a new genome from specie B. I'm looking to find scaffolds/reads matching a gene (35kb long) from specie A in the BAM file of specie B. I expect some differences (at the SNP level). Since our bam file doesn't contain chromosome and position information, is there a way to tell samtools, this my sequence, find any reads/scaffold that can match (with at least 60% of similarity) that sequence?

PS: Thanks for answering thou I do not have access to BWA so far.. otherwise I would have done it with BWA long time ago.. I work remotely on a server with limited capability T.T

OUR BAM FILE THAT CONTAINS ONLY SCAFFOLD INFO (NO CHROMOSOME OR POSITION)

FCC00CUABXX:7:2105:3995:105611#NNNNNCGT 73      scaffold12      24440   37      100M    =       24440   0       AACCATTTCGTTTCTGAATGAAGAATACAATACTGGAAATGAAAACTTCACTACAGGAACTCAATAGCAAAGTAGATGATACAGAAGAACAGATCAGCAA  HHHHHHHHHHHHHHHHHHHHHHHHFHHHHHHHHHHHFHHHHHHHHHHHHHHHHHHHFHFHHHHHHHHHDFBEBFFHFHHHHH9HDFFFEHHHHHDDFE<E    X0:i:1  X1:i:0  MD:Z:100        RG:Z:DONdtxDABDMAAPE  XG:i:0  AM:i:0  NM:i:0  SM:i:37 XM:i:0  XO:i:0  XT:A:U
FCB01AKABXX:1:1202:4275:93484#TCNGNGCN  97      scaffold12      24443   37      49M     =       21972   -2422   CATTTCGTTTCTGAATGAAGAATACAATACTGGAAATGAAAACTTCACT       HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH     X0:i:1  X1:i:0  MD:Z:49 RG:Z:DONdtxDAADWAAPE.1  XG:i:0  AM:i:37 NM:i:0  SM:i:37 XM:i:0  XO:i:0  XT:A:U
alignment samtools sequence sequencing • 6.1k views
ADD COMMENT
3
Entering edit mode
10.5 years ago

convert the reads back to FASTQ and map them with bwa to your gene ?

ADD COMMENT
0
Entering edit mode

I guess I really have no other option and started to install BWA on the server. Thanks XD.

ADD REPLY
1
Entering edit mode
10.5 years ago
Chris Fields ★ 2.2k

samtools does not align sequences (as Pierre alludes to); as the name implies, it is used to analyze and process SAM/BAM data, which is assumed to be aligned already. You need to either extract the FASTQ or (if the aligner accepts BAM as input) use it directly and align to sequence B.

EDIT: by 'as Pierre alludes to', I meant his indication that bwa (an aligner) would be required.

ADD COMMENT
1
Entering edit mode
Hi Chris, thanks for your answer. Just to clarify, Pierre didn't allude that SAMtools can align sequences, he suggested, just like you did, to convert the reads back to fastq map them with BWA to my gene of interest.
ADD REPLY
1
Entering edit mode

Agreed, see my clarification.

ADD REPLY
0
Entering edit mode
7.8 years ago
nyirock • 0

If you could extract your sequences to a fasta file, I have a tool that would do exactly what you are looking for

MG Wrapser

It uses BLAST, a reference(s) and query sequences, your fasta of extracted sequences. You may specify both identity and alignment length. In the output you'll have the extracted genes of interest in a fasta file

ADD COMMENT

Login before adding your answer.

Traffic: 1960 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6