Hi,
I've just downloaded several mouse CAGE-seq datasets from the FANTOM5 database.
I tried mapping the rdna.fa files to mm10 with Bowtie2 at its default settings, and I got a fairly low mapping rate of around 40%, with the majority of reads hitting more than one locus.
I'm totally new to CAGE-seq data. Please forgive me if I ask something silly.
- When I looked into the files, the reads seemed to have been trimmed already. Is this mapping rate normal?
- Given how short each read is, it's understandable that many hit multiple genomic locations. But won't that produce false positives when deciding which transcripts are 'really' expressed?
- Are there any specific parameters I should use in Bowtie2?
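
To put a number on the multi-mapping concern above, here is a minimal sketch in Python that tallies unmapped, uniquely aligned, and multi-hit reads from Bowtie2's SAM output. It assumes single-end reads and relies on the `XS:i` optional field, which Bowtie2 writes only when it found more than one alignment for a read. The function names `classify` and `tally` are made up for illustration, and treating "has `XS:i`" as "multi-mapped" is a simplification: the second alignment may score worse than the reported one, so a MAPQ cutoff may be a better filter in practice.

```python
from collections import Counter


def classify(fields):
    """Classify one SAM record (already split on tabs).

    Returns 'unmapped', 'multi', or 'unique'.
    """
    flag = int(fields[1])
    if flag & 0x4:  # SAM flag bit 0x4: read is unmapped
        return "unmapped"
    # Optional tags start at column 12; Bowtie2 adds XS:i only when
    # it found at least one other alignment for the read.
    if any(tag.startswith("XS:i:") for tag in fields[11:]):
        return "multi"
    return "unique"


def tally(lines):
    """Count alignment categories over an iterable of SAM lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x900:  # skip secondary/supplementary records
            continue
        counts[classify(fields)] += 1
    return counts
```

For example, `tally(open("aligned.sam"))` would return the three counts for a SAM file, where `aligned.sam` is a placeholder for whatever name was passed to Bowtie2's `-S` option.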
Hi! This could also be a stupid question, but where did you manage to download the .bam files for the FANTOM5 data? I can only find BED files so far (also in the CAGEr package). Thanks in advance.
I guess here: http://fantom.gsc.riken.jp/5/datafiles/latest/basic/ ?