Hi! I’m pretty new to bioinformatics on Linux :D
My lab received paired-end reads (151 bp) from whole-genome sequencing (WGS) of purple corn, and we mapped them with bowtie2 against the maize B73v4 reference genome (all 10 chromosomes plus the mitochondrial and chloroplast genomes). The result was a 371.9 GB SAM file (zm.sam).
We used the following commands:
#Maize B73v4 reference genome: GCF_000005005.2_B73_RefGen_v4_genomic.fna
#To index the reference genome:
bowtie2-build GCF_000005005.2_B73_RefGen_v4_genomic.fna GCF_000005005.2_B73_RefGen_v4_genomic.fna
#To align the reads to the reference genome:
bowtie2 -p 10 -I 0 -X 600 --fr --very-fast-local -x ../ref/GCF_000005005.2_B73_RefGen_v4_genomic.fna -1 ZM1_R1.fastq.gz -2 ZM1_R2.fastq.gz -S zm.sam
My task is to assemble the chloroplast genome of purple maize, so I only want the alignments of the reads that mapped to the chloroplast reference genome (Name: Pltd, NC_001666.2).
Please, can someone guide me on how to extract only the alignments to the chloroplast reference genome (Name: Pltd, NC_001666.2) from that SAM file?
(I will map the reads against the chloroplast reference genome alone later, but my advisor wants me to do this first: extract the alignments to the chloroplast reference genome from this big SAM file.)
I have been told to use grep, but I have looked at other posts and I don’t know if that is the best way to deal with this.
Thank you in advance for your help :)
You could use grep, but samtools is the better tool for this (after converting the SAM to BAM).
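A plain grep for the accession would also match the @SQ header line and any read mapped elsewhere whose mate landed on the chloroplast, because the mate's reference name sits in its own column. A rough sketch of the samtools route is below. It assumes samtools 1.x is on your PATH and that the chloroplast is named NC_001666.2 in the SAM header (bowtie2 normally takes the first word of the FASTA header, so it is probably not "Pltd"; check the @SQ lines first). The function name and output file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: pull the alignments for one reference sequence out of a large SAM file.
# Assumes samtools 1.x is installed; the function and file names are hypothetical.

extract_reference() {
    local sam="$1"            # input SAM, e.g. zm.sam
    local ref="$2"            # reference name as written in the @SQ header lines
    local out="$3"            # output BAM, e.g. zm.chloroplast.bam
    local threads="${4:-1}"   # additional threads for sorting
    local sorted="${sam%.sam}.sorted.bam"

    # Coordinate-sort and compress SAM -> BAM in one step
    samtools sort -@ "$threads" -o "$sorted" "$sam" || return 1
    # Index the sorted BAM so one reference can be queried by name
    samtools index "$sorted" || return 1
    # Keep only alignments placed on $ref, written as BAM
    samtools view -b -o "$out" "$sorted" "$ref" || return 1
    # Index the small BAM for viewers and downstream tools
    samtools index "$out"
}

# First list the reference names recorded in the header:
#   samtools view -H zm.sam | grep '^@SQ'
# Then, for example:
#   extract_reference zm.sam NC_001666.2 zm.chloroplast.bam 10
```

Sorting a file this size needs a lot of temporary disk space, but the sorted BAM will be much smaller than the SAM and is what most downstream tools expect anyway.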
Yes! I'm going to do that! thank you :)