Set the insert size threshold in STAR
6.2 years ago

I have mapped some reads using STAR:

STAR --runThreadN 18 --genomeDir $HOME/Doct2.0/Genomes/Ustilago/STAR_index/ --readFilesIn $HOME/Doct2.0/Data/ax_3/ax3_1_paired.fastq $HOME/Doct2.0/Data/ax_3/ax3_2_paired.fastq --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 40

However, I have a problem visualizing the data in IGV: I'm getting many reads with huge insert sizes. I tried to solve it by setting the --scoreInsOpen and --scoreInsBase parameters to -4, thinking this would penalize long inserts, but I got the same results.

I'm taking my first steps in RNA-Seq analysis and I don't know how to proceed. I know how to remove those reads after mapping, but I have a lot more data to analyze and I think it would be better to solve the problem during mapping. If you know how to set an insert size threshold in STAR, that would be great.

Thank you!

[Image: Sin_t_tulo]

STAR RNA-Seq

To clarify: you don't want STAR to map read pairs whose insert size exceeds a certain threshold? Those alignments may represent real splice events.
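If a cap at mapping time is still wanted, a rough sketch follows. Note that --scoreInsOpen and --scoreInsBase score insertions inside a read (indels), not the distance between mates, which would explain why changing them had no visible effect. The STAR options that bound that distance are --alignIntronMax (maximum intron size) and --alignMatesGapMax (maximum gap between the two mates). The value 5000 and the function name run_star_capped are assumptions for illustration only; a sensible cap depends on the intron lengths known for the organism.

```shell
# Hypothetical caps in bp; these are placeholders, not recommendations.
MAX_INTRON=5000
MAX_MATE_GAP=5000

# Same command as in the question, plus the two size limits.
run_star_capped() {
  STAR --runThreadN 18 \
    --genomeDir "$HOME/Doct2.0/Genomes/Ustilago/STAR_index/" \
    --readFilesIn "$HOME/Doct2.0/Data/ax_3/ax3_1_paired.fastq" \
                  "$HOME/Doct2.0/Data/ax_3/ax3_2_paired.fastq" \
    --outFilterScoreMinOverLread 0 \
    --outFilterMatchNminOverLread 0 \
    --outFilterMatchNmin 40 \
    --alignIntronMax "$MAX_INTRON" \
    --alignMatesGapMax "$MAX_MATE_GAP"
}
```

Calling run_star_capped starts the mapping. Keep in mind that a cap set too low will discard alignments across real introns longer than the cap.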


But is it normal to have so many of those? If it is, I'll certainly keep them, even if that is not the aim of my work (just a master's project).

[Image: Sin_t_tulo2]
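One way to put a number on "so many" is to count pairs by template length straight from the SAM output. This is a minimal sketch, assuming an uncompressed SAM file such as STAR's default Aligned.out.sam; the function name insert_size_summary and the cutoff are hypothetical. It reads TLEN (SAM column 9) and counts each pair once through the mate with the positive value. For spliced alignments TLEN spans the introns, so a large value is not by itself a sign of a bad alignment, and multi-mapping pairs are counted once per reported alignment.

```shell
# Usage: insert_size_summary <file.sam> <cutoff_bp>
insert_size_summary() {
  awk -v max="$2" '
    /^@/ { next }                       # skip SAM header lines
    $9 + 0 > 0 {                        # leftmost mate carries the positive TLEN
      total++
      if ($9 + 0 > max + 0) big++       # pair spans more than the cutoff
    }
    END { printf "pairs=%d over_cutoff=%d\n", total, big }
  ' "$1"
}
```

For example, insert_size_summary Aligned.out.sam 5000 prints the number of pairs seen and how many of them span more than 5000 bp on the reference.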


Have you right-clicked in the middle of the alignment display and chosen "View as pairs"? There are many options there to explore for how IGV displays those alignments.


I did. Anyway, I still have the same doubt: I feel like there are many spliced reads. However, I saw that the genes have good coverage along all the chromosomes, which is what I was looking for, so I guess it is enough if combined with the HISAT2 results. For now, I will go forward with the pipeline, and if I have time I will come back to study those splices.

Thank you!


"so I guess it is enough if combined with the HISAT2 results."

Why would you want to combine results from two aligners (for the same sample)?

It looks like you are working with a fungal genome, so splicing should be expected. The image posted above covers a much larger region, so it is hard to tell what the reads are doing. IGV can also show you splice junctions graphically.
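Besides looking at junctions in IGV, the junction table that STAR writes (SJ.out.tab) can be summarized directly. A minimal sketch, assuming the standard SJ.out.tab layout in which columns 2 and 3 hold the first and last base of the intron (1-based); the function name intron_length_summary and the cutoff are hypothetical.

```shell
# Usage: intron_length_summary <SJ.out.tab> <cutoff_bp>
intron_length_summary() {
  awk -v max="$2" '
    {
      len = $3 - $2 + 1                 # intron length from first and last intron base
      n++
      if (len > max + 0) big++          # junction longer than the cutoff
      if (len > longest) longest = len  # track the longest intron seen
    }
    END { printf "junctions=%d over_cutoff=%d longest=%d\n", n, big, longest }
  ' "$1"
}
```

If most junctions are short and only a handful exceed the cutoff, the long ones are worth inspecting individually before deciding whether to limit them during mapping.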


I meant to compare them. And yes, I have been talking with some co-workers and they said the same.
