Question

FastX toolkit - problems with Collapser

0

6.5 years ago

jcarlosariute • 0

Hello,

I have been trying to use FastX Toolkit's Collapser on my RNA-seq data. However, the collapsed output files all come out empty. Has anybody ever had this problem?

EDITED

Considering that Collapser is no longer used, what would be the next steps after merging my files?

Thank you all

RNA-Seq fastX collapser • 2.4k views

6.5 years ago by jcarlosariute • 0

1

That is not appropriate. You want to count reads so collapsing them would defeat the purpose.

6.5 years ago by GenoMax 148k

0

What are the error messages? What is the output of

module x && module load fastx-toolkit/0.0.14  && cd "$SCRATCH/dir" && fastx_collapser -v -i mergedR1_file.fastq > /dev/null

6.5 years ago by Pierre Lindenbaum 164k

0

The output files were all coming out empty. It didn't even show me a specific error code. They were just empty.

6.5 years ago by jcarlosariute • 0

0

What exactly do you want to do? Fastx_toolkit is an ancient tool that does not support paired-end data well (or actually does not support it at all). Give some details on your aim so that we can direct you to a better tool.

6.5 years ago by ATpoint 86k

0

Thank you for the help. I would like to remove the repeated reads of the same transcripts in order to assemble the transcriptome (at least that's what I thought of).

6.5 years ago by jcarlosariute • 0

0

And why would you do that?

6.5 years ago by ATpoint 86k

0

"Considering that Collapser is no longer used, what would be the next steps after merging my files?"

You need to explain what you are trying to do. Why did you merge the files (did you mean to say you concatenated files)? Insert sizes for RNAseq libraries are generally in a range where even the longest possible Illumina reads should not allow R1/R2 reads to merge/overlap.

Normally, one would take RNAseq data, scan/trim it as needed, align it with a splice-aware aligner (if you expect splicing), and then count the aligned reads using featureCounts/htseq-count to generate raw counts that are then fed into DESeq2 for differential expression analysis.

If you are deviating from these steps then you need to have a good reason to do so.

6.5 years ago by GenoMax 148k

score 1 · Answer 1 · 2018-07-09

1

6.5 years ago

michael.ante ★ 3.9k

Try the undocumented -Q33 option. The fastx toolkit is quite old and assumes Phred+64 quality encoding by default. FastQ files are now encoded in Phred+33.
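If you want to check which encoding a file uses before adding the flag, here is a rough shell sketch. It is not part of the toolkit, and `guess_phred_offset` is a name made up for illustration. It assumes plain 4-line FASTQ records and looks at the lowest quality character: anything below `;` (ASCII 59) can only occur in Phred+33 data. If no such character is found, the file is probably Phred+64, but a small file of uniformly high-quality Phred+33 reads would look the same, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Guess the Phred offset of a FASTQ file from its lowest quality character.
# Heuristic sketch; assumes 4-line records (no wrapped sequence/quality lines).
# "guess_phred_offset" is a hypothetical helper name, not a fastx-toolkit command.
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            # Every 4th line of a record is the quality string.
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            if (min == 127)   print "unknown"   # no quality characters seen
            else if (min < 59) print 33         # too low for any +64 encoding
            else               print 64         # probably +64 (see caveat above)
        }
    ' "$1"
}

# Example call (file name taken from the command earlier in this thread):
#   guess_phred_offset mergedR1_file.fastq
```

If it prints 33, then something like `fastx_collapser -Q33 -v -i mergedR1_file.fastq -o collapsed.fasta` should no longer give an empty file (the output name `collapsed.fasta` is just a placeholder).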

Cheers,

Michael

6.5 years ago by michael.ante ★ 3.9k