".sam" file - should it contain multiple samples?
19 months ago
Roland ▴ 20

Hi. I feel like this might be a stupid question, but I can't seem to find an answer anywhere. I'm currently trying to align my paired-end reads to a reference genome. To do this, I believe I need to create a ".sam" file using "bwa".

However, what I don't understand is: should I create one .sam file that contains all of my paired-end samples? I have around 100 paired-end samples (so 200 files in total). Should I align all of these samples to the reference genome "in one go", so that I end up with one large .sam file containing all 100 samples? Or should I create 100 separate .sam files, aligning each sample on its own?

My goal later is to find SNPs of these samples.

Thanks!


You can use read groups to include multiple samples in one alignment file (see: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups ). If you are planning to use GATK for SNP calling then this may indeed be a requirement.

19 months ago
ATpoint 85k

If each "sample" is an independent specimen, then one usually produces one alignment per sample. Don't use SAM files; write BAM right away. SAM is human-readable and therefore uncompressed, so it is large and takes up a lot of disk space. Downstream steps such as variant calling run on the binary, compressed BAM format.

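# one paired-end sample -> one BAM (ref.fa and FASTQ names are placeholders)
bwa mem -t 4 ref.fa sample_R1.fastq.gz sample_R2.fastq.gz | samtools view -b -o sample.bam -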
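For ~100 samples, the same pipeline can simply be looped so that each sample gets its own BAM. The sketch below is one way to do that under stated assumptions: FASTQs named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` in the current directory and a reference `ref.fa` already indexed with `bwa index` (both hypothetical). It swaps `samtools view -b` for `samtools sort` and adds `samtools index`, since variant callers generally expect coordinate-sorted, indexed BAMs, and it tags each sample with an `SM` read group for later joint calling.

```shell
#!/usr/bin/env bash
# Sketch: one sorted, indexed BAM per sample for a batch of paired-end samples.
# Assumes <sample>_R1.fastq.gz / <sample>_R2.fastq.gz in the current directory
# and a reference already indexed with `bwa index`; both are hypothetical.
set -euo pipefail
shopt -s nullglob   # no matching FASTQs -> the loop below does nothing

REF="${REF:-ref.fa}"        # placeholder reference path
THREADS="${THREADS:-4}"     # passed to bwa mem -t

# Derive sample names from the R1 files: sampleA_R1.fastq.gz -> sampleA
list_samples() {
  local r1
  for r1 in *_R1.fastq.gz; do
    printf '%s\n' "${r1%_R1.fastq.gz}"
  done
}

# Align one sample straight to a sorted BAM (no intermediate SAM on disk),
# tag it with an SM read group, then index it for the variant caller.
align_sample() {
  local sample="$1"
  bwa mem -t "$THREADS" -R "@RG\tID:${sample}\tSM:${sample}" "$REF" \
      "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz" \
    | samtools sort -o "${sample}.sorted.bam" -
  samtools index "${sample}.sorted.bam"
}

for sample in $(list_samples); do
  align_sample "$sample"
done
```

Keeping one BAM per sample also means a failed or re-sequenced sample can be redone on its own without realigning the other 99.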

Thank you. The samples are independent, but they belong to four different populations (sampling sites).
