Question

Where To Get The Adaptor Sequence And Mapping Reads With Bowtie


13.7 years ago

Sunflow ▴ 30

Hello all, I downloaded the .sra file SRR052645.sra (http://www.ncbi.nlm.nih.gov/sra?term=SRR052645) from NCBI SRA, and I want to map these reads to the Arabidopsis genome with bowtie, but I'm not sure whether my data processing is right. This is what I did:
1. Download the .sra file and convert it to FASTQ with this command line: ./fastq-dump -SL -SF SRR052645.sra

2. Map the reads to the genome with this command line:
../bowtie -q -m 1 -n 0 ../genomeindex/athaliana SRR052645.fastq > nucleosomeuniquebowtie.out

My questions are:
(1) My dataset was generated by Illumina sequencing. I don't know whether I need the -SL and -SF parameters in my command line: someone told me -SL and -SF are only needed for paired-end sequencing, but I can't tell whether this is a paired-end run.
(2) Do I have to trim adaptor sequences from the reads in the .sra file? I didn't see any adaptor sequence provided on NCBI; does that mean I don't have to trim the adaptor?
(3) After mapping the reads to the genome with bowtie, a large proportion of reads fail to align. Here is the output message:

reads processed: 3572622
reads with at least one reported alignment: 1558809 (43.63%)
reads that fail to align: 1180787 (33.05%)
reads with alignments suppressed due to -m: 833026 (23.32%)

This seems very strange to me. Has anyone had the same experience? Or is it just because I didn't clip the adaptor?
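The three categories in bowtie's summary are mutually exclusive, so their counts should add up to the reads processed. A small Python sketch (the function name `summarize` is just for illustration) makes that check explicit, reading the first count as 1558809, the value that matches both the 43.63% and the total. It also highlights that the 23.32% under -m are reads that did align, but to more than one place; -m 1 tells bowtie to suppress every alignment for a read with more than one reportable alignment.

```python
def summarize(processed, categories):
    """Return each category's share of the processed reads, in percent (2 d.p.)."""
    # bowtie's three categories are mutually exclusive, so they must add up
    # to the number of reads processed.
    if sum(categories.values()) != processed:
        raise ValueError("category counts do not add up to reads processed")
    return {name: round(100 * n / processed, 2) for name, n in categories.items()}


# Counts copied from the bowtie summary above.
counts = {
    "at least one reported alignment": 1_558_809,
    "failed to align": 1_180_787,
    "suppressed due to -m": 833_026,
}
print(summarize(3_572_622, counts))
# {'at least one reported alignment': 43.63, 'failed to align': 33.05, 'suppressed due to -m': 23.32}
```

Put differently, only about a third of the reads found no ungapped hit under -n 0; the rest of the "missing" reads are multi-mappers removed by -m 1.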

Best Regards~

bowtie sra adaptor • 5.0k views

updated 13.7 years ago by ALchEmiXt ★ 1.9k

score 1 · Answer 1 · 2011-12-16

I do not use SRA sequences a lot (aren't they DB-formatted, or just plain dumps?). Usually you can tell whether an Illumina run was paired-end (PE) or not by inspecting its FASTQ headers; see the FASTQ wiki for details. Not sure if that will help you here.
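A rough Python sketch of that header check, assuming the two common Illumina naming conventions (read names ending in /1 or /2 before CASAVA 1.8, or a comment field starting with 1:N:... / 2:Y:... from CASAVA 1.8 on) and unwrapped 4-line FASTQ records. The function names are hypothetical. Headers rewritten by fastq-dump may carry no mate information at all, in which case the answer is simply "unknown", and a file containing only mate 1 of a PE run would also look single-end.

```python
import re

# Pre-CASAVA 1.8 Illumina read names end in "/1" or "/2";
# CASAVA 1.8+ puts "1:" or "2:" (then Y/N for the filter flag) in the comment field.
OLD_STYLE = re.compile(r"/([12])$")
NEW_STYLE = re.compile(r"^([12]):[YN]:")


def mate_numbers(fastq_lines):
    """Collect mate numbers (1 or 2) found in the header lines of a 4-line FASTQ."""
    mates = set()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:  # only every 4th line is a header (no wrapped records)
            continue
        # Drop the leading "@" and check every whitespace-separated field,
        # since SRA dumps may put their own accession-based ID first.
        for field in line.rstrip("\n")[1:].split():
            m = OLD_STYLE.search(field) or NEW_STYLE.match(field)
            if m:
                mates.add(int(m.group(1)))
    return mates


def guess_layout(fastq_lines):
    """Rough guess: 'paired-end', 'single-end' or 'unknown' (no mate info found)."""
    mates = mate_numbers(fastq_lines)
    if mates == {1, 2}:
        return "paired-end"
    if mates == {1}:
        return "single-end"
    return "unknown"


# Usage on the first 1000 records of a FASTQ file, e.g.:
#   from itertools import islice
#   with open("SRR052645.fastq") as fh:
#       print(guess_layout(islice(fh, 4000)))
```

Indexing by line position rather than looking for "@" matters because quality strings can themselves start with "@".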

Regarding the mapping: yes, we have seen similar numbers of unmapped reads using bowtie. Inspecting them usually reveals that bowtie is (too) strict in the mapping, since it doesn't allow any indels, or that there is still lots of PhiX in the data. You could quite easily move to BWA, which does allow indels.
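To see why missing indel support hurts, here is a toy Python illustration with made-up sequences: a read carrying a single 1-bp deletion relative to the reference shows up as several mismatches when it can only be placed without gaps, while an edit distance that allows indels charges just 1. This is a simplified model for intuition only, not bowtie's or BWA's actual scoring (bowtie, for instance, looks at mismatches in the seed and at base qualities).

```python
def hamming(a, b):
    """Mismatches when two equal-length strings are compared with no gaps."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal lengths")
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_mismatches(read, ref):
    """Fewest mismatches over every gap-free placement of the read on ref."""
    return min(hamming(read, ref[o:o + len(read)])
               for o in range(len(ref) - len(read) + 1))


def edit_distance(a, b):
    """Levenshtein distance: each mismatch, insertion or deletion costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # skip a char of a
                           cur[j - 1] + 1,              # skip a char of b
                           prev[j - 1] + (ca != cb)))   # match / mismatch
        prev = cur
    return prev[-1]


REF = "ACGTACGTTGCATGCA"
READ = REF[:6] + REF[7:]   # the same sequence with a 1-bp deletion at position 6

print(best_ungapped_mismatches(READ, REF))  # 6
print(edit_distance(READ, REF))             # 1
```

With -n 0 allowing no seed mismatches, a read like this has little chance of aligning ungapped, whereas a gapped aligner can place it at the cost of a single indel.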

For inspecting quality and finding overrepresented sequences, adapters and such, have a look at FastQC, which can also be used online in the usegalaxy.org framework.
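Checking the reads themselves is one way to settle question (2). Below is a minimal Python sketch of the idea behind such checks, not FastQC's actual algorithm: it counts how many reads contain an adapter, clips them, and lists the most frequent read sequences. The adapter string is an assumption (the start of the common Illumina TruSeq-style adapter); the post does not say which adapter was used for SRR052645. The function names are hypothetical, and matching is exact, with no mismatches tolerated.

```python
from collections import Counter

# ASSUMPTION: the start of the common Illumina (TruSeq-style) adapter.
# The post does not say which adapter was used for SRR052645.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, adapter=ADAPTER, min_overlap=5):
    """Clip a read at an exact adapter match (no mismatches tolerated).

    If the full adapter is absent, also clip a partial adapter at the 3' end
    when at least `min_overlap` of its leading bases match the read's tail.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    hit = seq.find(adapter)
    if hit != -1:
        return seq[:hit]
    # Longest proper prefix of the adapter that is also a suffix of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def adapter_fraction(reads, adapter=ADAPTER, min_overlap=5):
    """Fraction of reads that trim_adapter would shorten."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(len(trim_adapter(r, adapter, min_overlap)) < len(r) for r in reads)
    return hits / len(reads)


def top_sequences(reads, n=5):
    """Most frequent whole-read sequences: a crude overrepresentation check."""
    return Counter(reads).most_common(n)
```

Dedicated trimmers also tolerate sequencing errors inside the adapter; this sketch does not, so it will undercount adapter-containing reads on noisy data.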