How to check size of genome?
1
0
Entering edit mode
3.9 years ago
Jimpix ▴ 10

Hey! It is possible to check the size of genome (number of bp) in bam (or sam or fastq) file?

Thanks in advance.

bam bp • 765 views
ADD COMMENT
0
Entering edit mode

You are not checking the size of the genome if you are simply looking at your raw or aligned data. You are just counting the number of bases sequenced.

If you want to estimate size of the genome then you need to do something like this.

ADD REPLY
0
Entering edit mode

I want to know how to check number of base pairs in bam file

ADD REPLY
2
Entering edit mode

You could simply run reformat.sh -Xmx10g in=your.bam from BBMap suite. It will produce output that will contain this information.

Input is being processed as unpaired
Input:                          9207996 reads           684032432 bases
Output:                         9207996 reads (100.00%)         684032432 bases (100.00%)

mappedonly=t etc should modify the output. Test it out.

ADD REPLY
0
Entering edit mode

I want to know how to check number of base pairs in bam file

That's not the same as the size of the genome though. What exactly do you want?

ADD REPLY
4
Entering edit mode
3.9 years ago

Reads will overlap, so you can't simply count the lengths of reads. To count the number of unique bases over reads per chromosome for assembly hg38, for example, using BEDOPS, bash, awk, and UCSC Kent utilities:

$ ASSEMBLY=hg38
$ bam2bed --reduced < reads.bam | bedmap --echo --bases-uniq <( fetchChromSizes ${ASSEMBLY} | grep -v "*_*" | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - ) - > unique_bases_per_chromosome.bed

To get the total unique bases over the genome, sum up the last column in the result:

$ awk -v FS="\t" '{ s += $4 } END { print s }' unique_bases_per_chromosome.bed > total_unique_bases_over_assembly.txt

For SAM files, use sam2bed in place of bam2bed. For FASTQ, you would use a mapping tool to map raw sequence to locations on the genome, turning that into BAM.

ADD COMMENT

Login before adding your answer.

Traffic: 2610 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6