Hi all,
I have a series of SAM files, which I know that I have to convert to BAM and sort for using FLAGSTAT and VIEW in samtools. Having ref aligned data and previously removing reads that don't map I now want to produce a single file with the remaining number of mapped read outputs into a single file. The below command outputs the result from one single sample, but can anyone tell me how to run multiple (500) files into a single list of outputs as I cannot find a way to do this at all, whatever I run it only puts in the file for 1 single sample - many thanks for your kind help.
samtools view sample1.bam | cut -f1 | sort | uniq | wc -l > output
Was a little confused, are you basically wanting to count the number of reads that are mapped in a BAM file? And then do that for 500 BAM files? A bash loop would do it.
In your command, try >> (append) instead of >, as > will delete the file first, and then add data.
Sorry Tonor and thank you for the reply - Not a cumulative total, I want an output of how many reads aligned for each sample independently but without having to run 'view' in samtools singularly for each sample.
In truth it would be easier of there was a way to do it from sam files as that is what my samples are in at the moment but it seems converting to bam may be the only option.
Best
Jamie
jt358 : Please use
ADD COMMENT/ADD REPLY
when responding to existing posts. This helps keep threads logically organized.Apologies genomax2 - my misunderstanding
For a sam file, you could do this much more efficiently with the BBMap package:
I don't think your current command will produce correct output if you are using paired-end data, as only a single line would be retained for each pair of reads.
I don't really write loops in bash, which is one possible solution. Instead, what I typically do is to use Excel:
Paste that into Excel, write a command like:
...drag it down, and paste that into a new shell script.
Or see Tonor's solution for writing a loop in bash :)
But if you only need the number of mapped reads, you don't need the "out=" flag. Reformat will print the total numbe of reads that passed the filters (primary and mapped) at the end.
Hi Brian - thanks for the answer.
For project specific reasons, I've concatenated pairs into a single file before aligning and then look at linkage later down the line.
My input sam files from which I want the read counts also only contained reads that I know map as opposed to total reads mapped or unmapped
Cheers
That's fine, but bear in mind that the sam format mandates that read 1 and read 2 have the exact same name if they are mapped as pairs, and if they are not mapped as pairs, many programs will still truncate the names at the first whitespace, meaning that normal Illumina names (such as "hiseq:1:2:3:4 1:ACGT") will be truncated such that read 1 and 2 will have identical names, so after sort/uniq you will be missing half of your reads.
Thanks everyone on the thread for all your kind help - really grateful!!!!