Bwa sampe - BGI
8 months ago
lorena9132 ▴ 10

Hi, I'm new to bioinformatics and I'm analyzing some genomes I obtained from BGI. The average read length is 100 bp, dropping to about 80 bp after trimming. The coverage is low, so I decided to align with bwa aln followed by bwa sampe, but the resulting files are almost 300 GB. Does anyone have suggestions for adjusting the parameters to avoid such large files?

I'm using BWA sampe with the command:

bwa sampe GCA_000001405.15_GRCh38_no_alt_analysis_set.fna X_1.sai X_2.sai X1.fq.gz X_2.fq.gz > X_aln.sam
8 months ago

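bwa sampe produces unsorted SAM, not BAM. SAM is uncompressed text, which is why the files are so large.

Converting SAM to BAM will reduce the size, since BAM is compressed.

Sorting the BAM by coordinate will reduce it further, because neighbouring records become more similar and compress better (you can pipe the output of sampe straight into samtools sort).

Finally, converting to CRAM will reduce the size again.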
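
As a rough sketch of that chain, assuming samtools is installed alongside bwa: the function name, the thread count (-@ 4) and the output file names below are illustrative choices, not anything from the original command.

```shell
# Sketch: run bwa sampe and sort/compress the result without keeping the SAM.
# sampe_to_cram, the thread count and the output names are assumptions.
sampe_to_cram() {
    local ref="$1" sai1="$2" sai2="$3" fq1="$4" fq2="$5" prefix="$6"

    # Stream the SAM straight into samtools sort; only a sorted BAM hits disk.
    bwa sampe "$ref" "$sai1" "$sai2" "$fq1" "$fq2" \
        | samtools sort -@ 4 -o "${prefix}.sorted.bam" - || return 1

    # Optional last step: CRAM, using the same FASTA the reads were aligned to.
    samtools view -C -T "$ref" -o "${prefix}.cram" "${prefix}.sorted.bam"
}
```

With the files from the question this would be called as sampe_to_cram GCA_000001405.15_GRCh38_no_alt_analysis_set.fna X_1.sai X_2.sai X1.fq.gz X_2.fq.gz X_aln. Once the CRAM has been checked, the intermediate BAM can be deleted.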


Hello, thank you very much for your response. I was referring more to using these options:

-a INT Maximum insert size for a read pair to be considered being mapped properly. Since 0.4.5, this option is only used when there are not enough good alignments to infer the distribution of insert sizes. [500]
-o INT Maximum occurrences of a read for pairing. A read with more occurrences will be treated as a single-end read. Reducing this parameter helps faster pairing. [100000]
-P Load the entire FM-index into memory to reduce disk operations (base-space reads only). With this option, at least 1.25N bytes of memory are required, where N is the length of the genome.
-n INT Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]
-N INT Maximum number of alignments to output in the XA tag for discordant read pairs (excluding singletons). If a read has more than INT hits, the XA tag will not be written. [10]
-r STR Specify the read group in a format like '@RG\tID:foo\tSM:bar'. [null]
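
For orientation, a minimal sketch of where these options sit on the command line. Every value is simply the documented default from the list above, the read group is the manual's placeholder rather than real sample metadata, and the function name is made up for illustration.

```shell
# Sketch: options go between "sampe" and the reference path.
# Values are the documented defaults; the @RG string is a placeholder.
sampe_with_options() {
    local ref="$1" sai1="$2" sai2="$3" fq1="$4" fq2="$5"
    bwa sampe -a 500 -o 100000 -n 3 -N 10 \
        -r '@RG\tID:foo\tSM:bar' \
        "$ref" "$sai1" "$sai2" "$fq1" "$fq2"
}
```

Going by the descriptions, these options control pairing, memory use, the XA tag and the read group; none of them changes the output format, which is still uncompressed SAM on standard output.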

Honestly, use bwa mem like everyone else and call it a day.

bwa mem ref.fa X1.fq.gz X_2.fq.gz | samtools sort -o sorted.bam

See the bwa manual on how to build the reference index (bwa index) before running bwa mem.
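
A minimal sketch of the whole thing, assuming samtools is available. The function name, the thread counts and the check for an existing .bwt file are illustrative assumptions.

```shell
# Sketch: index the reference once, then align with bwa mem and sort in one pipe.
# bwa_mem_sorted and the thread counts are assumptions for illustration.
bwa_mem_sorted() {
    local ref="$1" fq1="$2" fq2="$3" out="$4"

    # bwa index writes several files next to the FASTA, including <ref>.bwt;
    # skip the indexing step if that file is already there.
    [ -e "${ref}.bwt" ] || bwa index "$ref" || return 1

    # Align and sort in one pass, so no SAM file is written to disk.
    bwa mem -t 4 "$ref" "$fq1" "$fq2" \
        | samtools sort -@ 4 -o "$out" - || return 1

    # Index the sorted BAM for downstream tools.
    samtools index "$out"
}
```

For example: bwa_mem_sorted ref.fa X1.fq.gz X_2.fq.gz sorted.bam. Indexing a human reference takes a while but only has to be done once.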
