Hello,
Is the STAR aligner recommended for ChIP-seq data? I am trying to use STAR on ChIP-seq data to keep reads that map to multiple regions of the genome, with control over mismatches, which STAR seems to handle better than Bowtie2. Only around 14% of my reads map, and around 80% are counted under "% of reads unmapped: too short". Following the suggestions in this thread,
https://groups.google.com/forum/#!topic/rna-star/E_mKqm9jDm0, I tried the --alignIntronMax 1 option, but the results are similar. Please advise, thank you.
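These percentages come from the Log.final.out file that STAR writes alongside the BAM. Below is a minimal sketch that extracts the mapping and unmapped percentages from that file. It assumes the usual `label |<TAB>value` layout of Log.final.out. The name star_log_percentages is a hypothetical helper, not part of STAR.

```shell
# Print the "% of reads ..." and "Uniquely mapped reads %" lines from a STAR
# Log.final.out file as "label: value".
# Assumes each statistic line looks like "   <label> |<TAB><value>".
star_log_percentages() {
  awk -F'|' '
    {
      name = $1; value = $2
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # strip the right-alignment padding
      gsub(/^[ \t]+|[ \t]+$/, "", value)   # strip the tab before the value
    }
    # section headers have no "|" and therefore an empty value
    name ~ /^% of reads|mapped reads %$/ && value != "" { print name ": " value }
  ' "$1"
}
```

Usage: `star_log_percentages Log.final.out`. The file name depends on the `--outFileNamePrefix` used for the run.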
"too short" is STAR's euphemism for reads that simply fail to align. What alignment rate do you get with bowtie2? ChIP-seq is very tricky experimentally, so libraries quite often turn out to be full of adapter sequences and the like. The choice of aligner should not matter all that much, as long as you use a well-supported modern one such as bwa, bowtie2, or STAR.
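A quick way to test the adapter hypothesis is to count how many reads contain the start of the common Illumina adapter, AGATCGGAAGAGC. Below is a minimal sketch. It assumes an uncompressed FASTQ with four lines per record and that this adapter applies to the library (pass another sequence as the second argument otherwise). The name adapter_fraction is hypothetical.

```shell
# Report how many reads contain an adapter sequence (exact match only).
# Arguments: FASTQ file, optional adapter (default: AGATCGGAAGAGC).
adapter_fraction() {
  fastq="$1"
  adapter="${2:-AGATCGGAAGAGC}"
  awk -v a="$adapter" '
    NR % 4 == 2 { total++; if (index($0, a) > 0) hits++ }   # sequence lines only
    END {
      if (total) printf "%d/%d reads (%.1f%%) contain %s\n", hits, total, 100 * hits / total, a
    }
  ' "$fastq"
}
```

For a gzipped file in bash, `adapter_fraction <(zcat in.fastq.gz)` avoids writing a decompressed copy.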
Some marks (e.g. H3K9me3) also yield many multimapping reads, because they are enriched in repeat-rich heterochromatin.
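STAR stores the number of loci a read aligns to in the NH:i tag. You can therefore estimate how large a share of the data are multimappers directly from the alignments. Below is a minimal sketch that reads SAM text on stdin. The name count_multimappers is hypothetical. It counts alignment records, so a multimapper is counted once per reported locus unless secondary and unmapped records are filtered out first.

```shell
# Count alignment records whose NH:i tag (number of reported loci) exceeds 1.
# Reads SAM text on stdin; header lines starting with "@" are skipped.
count_multimappers() {
  awk -F'\t' '
    /^@/ { next }
    {
      for (i = 12; i <= NF; i++)          # optional tags start at column 12
        if ($i ~ /^NH:i:/) {
          n = substr($i, 6) + 0           # number after "NH:i:"
          total++
          if (n > 1) multi++
          break
        }
    }
    END { printf "%d of %d alignment records have NH > 1\n", multi, total }
  '
}
```

To count each read once, keep only primary mapped records: `samtools view -F 0x104 aligned.bam | count_multimappers`.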
Bowtie2 also gave me only 18% alignment, but I was confused because the file sizes are not comparable: the BAM file from Bowtie2 (1,035,494,925) is much larger than the one from STAR (275,497,682).
P.S. It's the fly genome, hence the smaller sizes.
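One likely contributor to the size gap is a difference in defaults. Bowtie2 writes unaligned reads into its SAM/BAM output unless it is run with --no-unal, while STAR leaves them out unless --outSAMunmapped Within is set. Below is a minimal sketch for checking this, which counts records with and without the "unmapped" FLAG bit (0x4) in SAM text read from stdin. The name flag_summary is hypothetical, and `samtools flagstat` reports comparable counts.

```shell
# Count mapped vs unmapped SAM records using FLAG bit 0x4 (segment unmapped).
# Reads SAM text on stdin; header lines are skipped. Integer arithmetic is used
# instead of and(), which is a gawk extension.
flag_summary() {
  awk -F'\t' '
    /^@/ { next }
    {
      if (int($2 / 4) % 2 == 1) unmapped++
      else mapped++
    }
    END { printf "mapped records: %d, unmapped records: %d\n", mapped, unmapped }
  '
}
```

Usage: `samtools view bowtie2.bam | flag_summary`, then the same for the STAR BAM.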
What is the size distribution of reads in that pool (or this data in general)? If the reads are very short (< 30-40 bp, after scan/trim) then it may indeed be difficult to map them.
@genomax, the average read length is 50-75 bp.
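Below is a minimal sketch for getting the full read-length distribution, which is more informative than the average once trimming has been applied. It assumes an uncompressed FASTQ with four lines per record. The name read_length_hist is hypothetical.

```shell
# Print "length<TAB>count" for every read length in a FASTQ file, sorted by length.
read_length_hist() {
  awk 'NR % 4 == 2 { print length($0) }' "$1" |   # sequence lines only
    sort -n | uniq -c |
    awk '{ print $2 "\t" $1 }'                    # swap to length, count
}
```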
Then @predeus' answer may not apply; you likely have a different problem. Have you BLASTed a sample of the reads that do not map? There could be some sort of contamination in your data.
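STAR can write the reads it fails to align to a separate file when run with `--outReadsUnmapped Fastx` (Unmapped.out.mate1 for single-end input). Below is a minimal sketch that draws a small FASTA sample from such a file for a BLAST search. It uses systematic sampling (every Nth read) instead of random sampling, and it assumes four lines per record. The name sample_fasta and its default values are hypothetical.

```shell
# Write every Nth FASTQ record as FASTA, up to a maximum number of reads.
# Arguments: FASTQ file, step N (default 1000), max reads (default 100).
sample_fasta() {
  fastq="$1"
  every="${2:-1000}"
  max="${3:-100}"
  awk -v every="$every" -v max="$max" '
    BEGIN { out = 0 }
    NR % 4 == 1 { rec++; header = substr($1, 2) }   # read ID without the "@"
    NR % 4 == 2 && (rec - 1) % every == 0 && out < max {
      print ">" header
      print $0
      out++
    }
  ' "$fastq"
}
```

Usage: `sample_fasta Unmapped.out.mate1 1000 100 > unmapped_sample.fa`, then submit the FASTA to BLAST.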
I concur with genomax. Did you run FastQC on the FASTQ files? If both STAR and bowtie2 agree, it's likely that only about 18% of your reads are usable. Depending on what FastQC shows, you may be able to rescue some more reads by adapter trimming.
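Below is a minimal sketch of what adapter trimming does. Each read is cut at the first exact match of the adapter, and reads that end up shorter than a minimum length are dropped. The adapter AGATCGGAAGAGC and the 20 bp minimum are assumptions, and trim_adapter is a hypothetical name. Dedicated trimmers such as cutadapt also catch partial adapters at the 3' end and tolerate sequencing errors, so they are the better choice on real data.

```shell
# Trim each read (and its qualities) at the first exact adapter match and
# drop reads shorter than a minimum length. Writes FASTQ to stdout.
# Arguments: FASTQ file, optional adapter, optional minimum length (default 20).
trim_adapter() {
  fastq="$1"
  adapter="${2:-AGATCGGAAGAGC}"
  minlen="${3:-20}"
  awk -v a="$adapter" -v minlen="$minlen" '
    NR % 4 == 1 { h = $0 }
    NR % 4 == 2 {
      p = index($0, a)
      cut = (p > 0) ? p - 1 : length($0)   # keep bases before the adapter
      s = substr($0, 1, cut)
    }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      q = substr($0, 1, cut)               # trim qualities to the same length
      if (length(s) >= minlen + 0) print h "\n" s "\n" plus "\n" q
    }
  ' "$fastq"
}
```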
I will check this, thank you for your help!
Can you post the entire command you're using and the log file output?
STAR --genomeDir /genomes/dm6/Sequence/STARindex --runThreadN 8 --readFilesIn in.fastq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix star_out
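For comparison, here is a sketch of the same run with options that are often adjusted for unspliced DNA reads. The threshold values are illustrative assumptions, not tested recommendations. The "too short" category comes from the `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` filters (both default to 0.66 of the read length). Relaxing them maps more reads but may also admit alignments that cover only part of an adapter-containing read. The names star_chip and DRY_RUN are hypothetical, and `DRY_RUN=1` only prints the command.

```shell
# Build a STAR command for single-end ChIP-seq reads; run it, or just print it
# when DRY_RUN=1. Arguments: genome index dir, FASTQ file, output prefix.
star_chip() {
  genome_dir="$1"
  fastq="$2"
  prefix="$3"
  set -- STAR --runThreadN 8 --genomeDir "$genome_dir" --readFilesIn "$fastq"
  # ChIP-seq reads come from genomic DNA, so disallow introns
  set -- "$@" --alignIntronMax 1
  # illustrative: report reads that map to up to 20 loci (default 10)
  set -- "$@" --outFilterMultimapNmax 20
  # illustrative: relax the filters behind "unmapped: too short" (defaults 0.66)
  set -- "$@" --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3
  # keep unmapped reads in the BAM and also write them to Unmapped.out.mate1
  set -- "$@" --outSAMunmapped Within --outReadsUnmapped Fastx
  set -- "$@" --outSAMtype BAM SortedByCoordinate --outFileNamePrefix "$prefix"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Usage: `DRY_RUN=1 star_chip /genomes/dm6/Sequence/STARindex in.fastq star_out` prints the command. Dropping `DRY_RUN=1` runs it.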