Question

What is ref sequence name in BAM file header ?

1

Entering edit mode

11.2 years ago

newDNASeqer ▴ 790

I am trying to use cn.mops to make CNV calls on my BWA-aligned BAM files, the getReadCountsFromBAM() function has an option for seqRefName, and the explanation for this field is:

Name of the reference sequence that should be analyzed. The name must appear in the header of the BAM file. If it is not given the function will select the first reference sequence that appears in the header of the BAM files.

I used samtools view to view the header of my bam files, and here is what I see:

@HD     VN:1.4  SO:coordinate
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr11_gl000202_random        LN:40103
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr17_ctg5_hap1      LN:1680828
@SQ     SN:chr17_gl000203_random        LN:37498
@SQ     SN:chr17_gl000204_random        LN:81310
@SQ     SN:chr17_gl000205_random        LN:174588
@SQ     SN:chr17_gl000206_random        LN:41001
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr18_gl000207_random        LN:4262
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr19_gl000208_random        LN:92689
@SQ     SN:chr19_gl000209_random        LN:159169
@SQ     SN:chr1_gl000191_random LN:106433
@SQ     SN:chr1_gl000192_random LN:547496
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr21_gl000210_random        LN:27682
@SQ     SN:chr22        LN:51304566
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr4_ctg9_hap1       LN:590426
@SQ     SN:chr4_gl000193_random LN:189789
@SQ     SN:chr4_gl000194_random LN:191469
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr6_apd_hap1        LN:4622290
@SQ     SN:chr6_cox_hap2        LN:4795371

For the BWA alignment, I used hg19 as my reference, however, I do not see hg19 in the BAM headers, so what does it mean by seqRefName in cn.mop

One thing I've noticed is that getReadCountsFromBAM() seems using "chr1" as the default seqRefName when I don't specify it. So should I use a vector of chr(1..22) for seqRefName?

cn.mops cnv • 7.7k views

ADD COMMENT • link updated 3.9 years ago by Ram 45k • written 11.2 years ago by newDNASeqer ▴ 790

1

Entering edit mode

Yes, you need to use vector of all chr names if you would to use all chromosomes in the same time. Example is:

refSeqName = c('@unitig_1066|quiver', '@unitig_1422|quiver', '@unitig_1441|quiver', '@unitig_223|quiver')

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 10.4 years ago by Andrey Yurchenko ▴ 10

0

Entering edit mode

Thanks. This was very helpful. I finally used

fasta=readDNAStringSet("assembly.fasta")
refSeqName=names(fasta)

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 10.2 years ago by Biomonika (Noolean) 3.2k

0

Entering edit mode

I have the same problem, but I don´t understand your solution.

I just tried to copy and past, but nothing happen...

Can someone explain me, what I have to do? pls!

ADD REPLY • link 9.0 years ago by karim ▴ 20

0

Entering edit mode

Though I have never used cn.mops but based on whatever you posted it seems that by "reference sequence" cn.mops means "chromosomes" or "contigs" that were part of the reference fasta file against which the reads were aligned.

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.2 years ago by Ashutosh Pandey 12k

score 1 · Answer 1 · 2014-05-20

1

Entering edit mode

11.2 years ago

Devon Ryan 105k

As you seem to have guessed, it's referring to the chromosome/contig names (i.e., what's in column 2 after SN:).

ADD COMMENT • link 11.2 years ago by Devon Ryan 105k