Question

Mapping multiple reads to a single genome

7.4 years ago

Paul ▴ 80

I have a number of paired-end and single-end read sets that I want to map to a reference genome.

TaxonID SRR files
1448592 SRR1172918 SRR1175065 SRR1184297 SRR1196515
1448462 SRR1180190 SRR1181352 SRR1181404 SRR1183042
1402586 SRR1011524 SRR1019194
1448524 SRR1172749 SRR1173120 SRR1184340 SRR1196497
1295800 SRR833218 SRR1011520

Each SRR accession is a folder of its own, containing paired-end and single-end reads. My aim is to read every SRR folder belonging to a given TaxonID and map its reads to a single reference genome.

Please suggest a way to do this.

I have the following script, but I think it requires all the files to be in a single folder:

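# Loop over every forward-read file in the current folder
for R1 in SRR*_P1.fastq ; do
        [ -e "$R1" ] || continue        # skip if the glob matched nothing
        F=${R1%_P1.fastq}               # sample prefix, e.g. SRR1172918
        R2=${F}_P2.fastq
        # "Trinity" is the basename of the bowtie index
        bowtie --all -S Trinity -1 "$R1" -2 "$R2" > "${F}.sam"
        samtools view -S -b "${F}.sam" > "${F}.bam"
done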

Please suggest a way to map the multiple read sets to a reference genome, whether with an R package or a shell script.

bash • 1.9k views

ADD COMMENT • link updated 7.4 years ago by PoGibas 5.1k • written 7.4 years ago by Paul ▴ 80

Do you want to generate a separate BAM file for every SRR subfolder?

ADD REPLY • link 7.4 years ago by venu 7.1k

Oh! Can I generate a single BAM file for all the subfolders? Would that be fine? In that case, I could put all the SRR files in a single folder and map them against a single reference sequence. But I have thousands of SRR files.

ADD REPLY • link 7.4 years ago by Paul ▴ 80

"Can I generate a single BAM file for all the subfolders?"

It depends on the data you have. If each SRR represents a different experiment/condition, it is better to generate a separate BAM for every SRR. You can also merge replicates (if you have any) after generating the BAMs.
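As a rough sketch of that workflow (one BAM per SRR folder, then an optional merge per TaxonID), something like the following could work. Several things here are assumptions rather than details from the post: the table is saved as a whitespace-separated file with one header line, each SRR folder contains `<SRR>_P1.fastq` and `<SRR>_P2.fastq`, `Trinity` is the bowtie index basename as in the script above, and samtools is version 1.3 or later. The function name `plan_mapping` is made up for illustration. Single-end files are left out because their naming is not given. The function only prints the commands, so they can be checked before anything runs.

```shell
#!/usr/bin/env bash
# Sketch only: print one mapping command per SRR folder and one merge
# command per TaxonID. Assumed (not from the post): table layout,
# <SRR>/<SRR>_P1.fastq naming, "Trinity" index basename, samtools >= 1.3.

INDEX=${INDEX:-Trinity}

plan_mapping() {
    local table=$1 taxon srrs srr bams
    # Skip the header line; each remaining line is a TaxonID followed by its SRRs
    tail -n +2 "$table" | while read -r taxon srrs || [ -n "$taxon" ]; do
        [ -z "$taxon" ] && continue
        bams=""
        for srr in $srrs; do
            # Map the paired-end reads of this SRR folder and write a sorted BAM
            echo "bowtie --all -S $INDEX -1 $srr/${srr}_P1.fastq -2 $srr/${srr}_P2.fastq | samtools sort -o $srr.bam -"
            bams="$bams $srr.bam"
        done
        # Only meaningful if the SRRs of this taxon are replicates
        echo "samtools merge $taxon.bam$bams"
    done
}
```

Running `plan_mapping table.txt` lists the commands, and `plan_mapping table.txt | bash` would execute them. Drop the `samtools merge` lines if the runs of a taxon are different experiments rather than replicates.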

ADD REPLY • link 7.4 years ago by venu 7.1k