Hi all, I’m new to bioinformatics and coding, so any help with the following tasks is appreciated. I have a BAM file I got from running samtools for 8 samples. I viewed the alignments in IGV and Geneious. I want to use them for a phylogenetic analysis, but for that I need to extract only the scaffolds (in IGV) or contigs (in Geneious) where all 8 samples aligned to a specific locus. I identified a few of them using IGV and Geneious, but there are over 2 million scaffolds that I would have to go through. Please tell me how I can scan through all of them to extract only the scaffolds that have data for all 8 samples, or at least >4 samples. Then I want to know how to export that data in FASTA format so that I can use it to build my tree.
Thanks!!!
Hello,
you could first convert the read positions within the BAM files to BED using bamToBed. Afterwards, do a multiIntersect with these BED files and extract the lines with the number of overlaps you like.
fin swimmer
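For the "extract the lines with the number of overlaps you like" step, here is a minimal Python sketch. It assumes the multiIntersect result was saved as tab-separated text with the scaffold name, start, end, and the number of overlapping input files in the first four columns (the layout I would expect from `bedtools multiinter`, but please check it against your own output). The function names and the example rows are made up for illustration.

```python
def filter_by_sample_count(lines, min_samples):
    """Yield (scaffold, start, end, n_samples) for intervals that are
    covered by at least `min_samples` of the input BED files.

    Assumed column layout (tab-separated), to be checked against real output:
    scaffold, start, end, number of overlapping files, ...
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, headers, and anything without a numeric count.
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        n_samples = int(fields[3])
        if n_samples >= min_samples:
            yield fields[0], int(fields[1]), int(fields[2]), n_samples


def scaffolds_passing(lines, min_samples):
    """Return the sorted names of scaffolds with at least one passing interval."""
    return sorted({rec[0] for rec in filter_by_sample_count(lines, min_samples)})


# Hypothetical rows for 8 samples, only to show the idea.
example = [
    "scaffold_1\t100\t250\t8\t1,2,3,4,5,6,7,8\t1\t1\t1\t1\t1\t1\t1\t1",
    "scaffold_2\t10\t90\t3\t1,2,5\t1\t1\t0\t0\t1\t0\t0\t0",
    "scaffold_3\t40\t300\t5\t1,2,3,4,6\t1\t1\t1\t1\t0\t1\t0\t0",
]

all_eight = scaffolds_passing(example, 8)      # scaffolds seen in all 8 samples
more_than_four = scaffolds_passing(example, 5)  # ">4 samples" means at least 5
```

With a real file you would pass an open file handle instead of the `example` list, e.g. `scaffolds_passing(open("multiinter.tsv"), 5)`, where the file name is again just a placeholder.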
Hi finswimmer, thanks for your suggestion. I just read about BEDTools and it seems to have an option for my task. I'll try it and see. Thanks!