What is ref sequence name in BAM file header ?
1
1
Entering edit mode
10.6 years ago
newDNASeqer ▴ 790

I am trying to use cn.mops to make CNV calls on my BWA-aligned BAM files, the getReadCountsFromBAM() function has an option for seqRefName, and the explanation for this field is:

Name of the reference sequence that should be analyzed. The name must appear in the header of the BAM file. If it is not given the function will select the first reference sequence that appears in the header of the BAM files.

I used samtools view to view the header of my bam files, and here is what I see:

@HD     VN:1.4  SO:coordinate
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr11_gl000202_random        LN:40103
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr17_ctg5_hap1      LN:1680828
@SQ     SN:chr17_gl000203_random        LN:37498
@SQ     SN:chr17_gl000204_random        LN:81310
@SQ     SN:chr17_gl000205_random        LN:174588
@SQ     SN:chr17_gl000206_random        LN:41001
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr18_gl000207_random        LN:4262
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr19_gl000208_random        LN:92689
@SQ     SN:chr19_gl000209_random        LN:159169
@SQ     SN:chr1_gl000191_random LN:106433
@SQ     SN:chr1_gl000192_random LN:547496
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr21_gl000210_random        LN:27682
@SQ     SN:chr22        LN:51304566
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr4_ctg9_hap1       LN:590426
@SQ     SN:chr4_gl000193_random LN:189789
@SQ     SN:chr4_gl000194_random LN:191469
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr6_apd_hap1        LN:4622290
@SQ     SN:chr6_cox_hap2        LN:4795371

For the BWA alignment, I used hg19 as my reference, however, I do not see hg19 in the BAM headers, so what does it mean by seqRefName in cn.mop

One thing I've noticed is that getReadCountsFromBAM() seems using "chr1" as the default seqRefName when I don't specify it. So should I use a vector of chr(1..22) for seqRefName?

cn.mops cnv • 7.0k views
ADD COMMENT
1
Entering edit mode

Yes, you need to use vector of all chr names if you would to use all chromosomes in the same time. Example is:

refSeqName = c('@unitig_1066|quiver', '@unitig_1422|quiver', '@unitig_1441|quiver', '@unitig_223|quiver')
ADD REPLY
0
Entering edit mode

Thanks. This was very helpful. I finally used

fasta=readDNAStringSet("assembly.fasta")
refSeqName=names(fasta)
ADD REPLY
0
Entering edit mode

I have the same problem, but I donĀ“t understand your solution.

I just tried to copy and past, but nothing happen...

Can someone explain me, what I have to do? pls!

ADD REPLY
0
Entering edit mode

Though I have never used cn.mops but based on whatever you posted it seems that by "reference sequence" cn.mops means "chromosomes" or "contigs" that were part of the reference fasta file against which the reads were aligned.

ADD REPLY
1
Entering edit mode
10.6 years ago

As you seem to have guessed, it's referring to the chromosome/contig names (i.e., what's in column 2 after SN:).

ADD COMMENT

Login before adding your answer.

Traffic: 1798 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6