How to check whether a BAM file comes from an Illumina sequencer?
16 months ago

Given a BAM file, which is binary and not human-readable, how can I check whether it comes from an Illumina machine?

I tried

grep -q "ILLUMINA" filename.sorted.bam

to check whether the word ILLUMINA appears, but it found nothing, probably because BAM files are binary rather than human-readable text.

16 months ago
GenoMax

Look at the read names (column 1 of samtools view bam_file) to see whether they follow the Illumina naming convention. Post a couple of lines (samtools view bam_file | head -2) if you want us to look. It may also help to check the BAM header (samtools view -H bam_file) to see whether the aligner command line is embedded there. File names may also give a clue if the data is Illumina.
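As a rough sketch of the read-name check, the hypothetical bash function below counts how many read names (SAM column 1) split into the seven fields of the Illumina CASAVA 1.8+ convention, instrument:run:flowcell:lane:tile:x:y, with numeric run, lane, tile and coordinate fields. It assumes that underscores may replace the colons (as in the reads posted in this thread), that extra trailing fields are allowed, and that the instrument name itself contains no underscore or colon. Older Illumina read names use a different layout and will not be counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of samtools.
# Reads SAM records on stdin and counts read names that look like
# instrument:run:flowcell:lane:tile:x:y (":" or "_" as separator).
# Trailing extra fields (e.g. a suffix such as "_eR1j") are tolerated.
count_illumina_names() {
  awk -F'\t' '
    !/^@/ {                           # skip header lines if any were passed
      total++
      n = split($1, f, "[:_]")        # split read name on ":" or "_"
      if (n >= 7 &&
          f[2] ~ /^[0-9]+$/ &&        # run number
          f[4] ~ /^[0-9]+$/ &&        # lane
          f[5] ~ /^[0-9]+$/ &&        # tile
          f[6] ~ /^[0-9]+$/ &&        # x coordinate
          f[7] ~ /^[0-9]+$/)          # y coordinate
        hits++
    }
    END { printf "%d of %d read names match the Illumina pattern\n", hits, total }
  '
}
```

Usage, on a sample of reads: samtools view bam_file | head -n 1000 | count_illumina_names. The program (@PG) lines of the header, which often hold the aligner command line, can be listed with samtools view -H bam_file | grep '^@PG'.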

If the read lengths (the length of the sequence in column 10 of the SAM output) are 300 bp or less, the reads are likely Illumina. There is no guarantee, though: other sequencers also produce reads in this range.
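To apply the read-length heuristic to more than a couple of reads, a sketch like the following hypothetical function summarises the length of SEQ (column 10) over SAM records read from stdin. Records whose SEQ is "*" are skipped, and because hard-clipped bases are not stored in SEQ, lengths from hard-clipped alignments can be underestimated.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of samtools.
# Summarises SEQ lengths (SAM column 10) of the records on stdin.
read_length_summary() {
  awk -F'\t' '
    !/^@/ && NF >= 11 && $10 != "*" {   # data lines with a stored sequence
      len = length($10)
      if (n == 0) { min = len; max = len }
      if (len < min) min = len
      if (len > max) max = len
      sum += len
      n++
    }
    END {
      if (n == 0)
        print "no reads with a stored sequence"
      else
        printf "reads=%d min=%d max=%d mean=%.1f\n", n, min, max, sum / n
    }
  '
}
```

Usage: samtools view -F 0x900 bam_file | head -n 100000 | read_length_summary, where -F 0x900 drops secondary and supplementary alignments. A maximum far above 300 bp would argue against short-read Illumina data, but the threshold is only a heuristic.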

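Using grep on a BAM file will not produce any result other than, at most, a note that it encountered a binary file: BAM is compressed, so text such as ILLUMINA is not stored literally. Convert it to text with samtools view first.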

Thank you for your input! This is the output of "samtools view bam_file | head -2"

K00163_1678_HNM77BBXY_5_2211_10876_6835_eR1j	117	chr1	9995	0	*	=	9995	0	TTCCGATCTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA	RG:Z:HNM77BBXY.lane0.2P_FMI_10
K00163_1678_HNM77BBXY_5_2211_10876_6835_eR1j	153	chr1	9995	37	7M1I41M	=	9995	0	TTCCGATCTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC	JJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJFJJJJJJJJJJFFFAA	X0:i:1	X1:i:0	MD:Z:0G6A40	RG:Z:HNM77BBXY.lane0.2P_FMI_10	XG:i:1	AM:i:0	NM:i:3	SM:i:37	XM:i:2	XN:i:6	XO:i:1	XT:A:U

Those identifiers are Illumina read names, with an extra suffix (_eR1j) appended to the end of each read ID. Judging by the instrument ID (K00163), this data came from a HiSeq 3000/4000 sequencer.

16 months ago

ILLUMINA should be set (but it's not always...) as the platform in the PL: tag of the read group (@RG) lines in the BAM header.

samtools view -H in.bam | grep '^@RG' | grep ILLU
@RG ID:X    LB:X    PL:ILLUMINA PU:PU0  SM:X    CN:Center
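Rather than grepping for a substring, the hypothetical function below prints the ID and PL value of every @RG line, so read groups that lack a PL tag show up explicitly instead of silently failing the grep. It assumes a tab-delimited SAM header on stdin, as produced by samtools view -H.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of samtools.
# Prints "<read group ID><TAB><platform>" for each @RG header line;
# read groups without a PL tag are reported as "(none)".
rg_platforms() {
  awk -F'\t' '
    /^@RG/ {
      id = "?"; pl = "(none)"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)   # text after "ID:"
        if ($i ~ /^PL:/) pl = substr($i, 4)   # text after "PL:"
      }
      print id "\t" pl
    }
  '
}
```

Usage: samtools view -H in.bam | rg_platforms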