PARalyzer error: * is not found in the .2bit file
5.4 years ago
xiaoleiusc ▴ 140

Dear Biostars forum users,

I tried to process my PAR-CLIP dataset with PARalyzer version 1.5 ( https://ohlerlab.mdc-berlin.de/software/PARalyzer_85/ ). The program starts without any problem, but after several hours it always fails with the error below:

Running PARalyzer v1.5
Parsing SAM file(s)...Done
Creating Read Groups & Clusters...Exception in thread "main" java.lang.Exception: * is not found in the .2bit file
at MyTwoBitParser.loadChromosome(MyTwoBitParser.java:58)
at MyTwoBitParser.getSequence(MyTwoBitParser.java:118)
at PARalyze.main(PARalyze.java:204)

I generated my 2bit file with the faToTwoBit tool (e.g. faToTwoBit hg19.fasta hg19.2bit).

I would really appreciate any input on solving this issue.

Best,

Xiao

CLIP-Seq • 1.8k views

In SAM/BAM files, typically * is used as chromosome name to indicate unmapped reads. Can you share the command line?
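
A quick way to check whether this applies, sketched here with plain awk so that it runs without samtools: count the alignment records whose RNAME (column 3) is * and those with the unmapped bit (0x4) set in FLAG (column 2). The helper name count_unmapped and the file name in the usage line are placeholders, not something from this thread.

```shell
# Sketch: report how many SAM records look unmapped.
# count_unmapped is a hypothetical helper name; $1 is the path to a SAM file.
count_unmapped() {
  awk -F'\t' '
    /^@/ { next }                        # skip header lines
    $3 == "*" { star++ }                 # RNAME "*": no reference sequence
    int($2 / 4) % 2 == 1 { flag4++ }     # FLAG bit 0x4: read is unmapped
    END { printf "rname_star=%d flag_unmapped=%d\n", star, flag4 }
  ' "$1"
}

# Usage (file name is a placeholder):
#   count_unmapped input.sam
```

If either count is above zero, the SAM file still contains records that have no matching sequence in the .2bit file.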


Hi, ATpoint,

Thanks a lot for your reply, and sorry for my late response (I am new here and somehow did not get an email notification for messages). The command line that leads to the error is below:

(base) bieniaszs-ipro:hg19_NL4_3 bieniaszlab$ PARAlyzer 128G hnRNPU_NL43.ini
Running PARalyzer v1.5
Parsing SAM file(s)...Done
Creating Read Groups & Clusters...Exception in thread "main" java.lang.Exception: * is not found in the .2bit file
at MyTwoBitParser.loadChromosome(MyTwoBitParser.java:58)
at MyTwoBitParser.getSequence(MyTwoBitParser.java:118)
at PARalyze.main(PARalyze.java:204)

My 2bit file was generated with the faToTwoBit tool from human hg19. I did not use a filter file, and my .ini file is below:

BANDWIDTH=3
CONVERSION=T>C
MINIMUM_READ_COUNT_PER_GROUP=5
MINIMUM_READ_COUNT_PER_CLUSTER=5
MINIMUM_READ_COUNT_FOR_KDE=5
MINIMUM_CLUSTER_SIZE=11
MINIMUM_CONVERSION_LOCATIONS_FOR_CLUSTER=1
MINIMUM_CONVERSION_COUNT_FOR_CLUSTER=1
MINIMUM_READ_COUNT_FOR_CLUSTER_INCLUSION=5
MINIMUM_READ_LENGTH=1
MAXIMUM_NUMBER_OF_NON_CONVERSION_MISMATCHES=0

EXTEND_BY_READ

#ADDITIONAL_NUCLEOTIDES_BEYOND_SIGNAL=20

SAM_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43.sort.sam=COLLAPSED
GENOME_2BIT_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/NL43.2bit

OUTPUT_DISTRIBUTIONS_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43_distribution.csv
OUTPUT_GROUPS_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43_groups.csv
OUTPUT_CLUSTERS_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43_clusters.csv

Regards,
Xiao


Hi, ATpoint,

I really appreciate your stimulating input! I found that PARalyzer only works when the input SAM file contains mapped reads only. I first need to run

samtools view -b -F 4 input.bam > output_mapped.bam

to generate a BAM file with only mapped reads, and then convert it to SAM with

samtools view -h output_mapped.bam > output_mapped.sam

to get a SAM file that works with PARalyzer. As you mentioned, * is typically used as the chromosome name for unmapped reads in SAM/BAM files, and this was most likely the cause of the problem in my run. PARalyzer runs well on SAM files with only mapped reads!
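
If only a SAM file is at hand, the same filtering can be sketched in a single pass with awk instead of the BAM round trip. This is an illustrative alternative, not the commands actually used in this thread; keep_mapped and the file names are placeholders. It keeps header lines and drops any record that has FLAG bit 0x4 set or * as RNAME, which is slightly stricter than -F 4 alone.

```shell
# Sketch: write only header lines and mapped records of a SAM file to stdout.
# keep_mapped is a hypothetical helper name; $1 is the path to a SAM file.
keep_mapped() {
  awk -F'\t' '
    /^@/ { print; next }                           # keep the header as is
    int($2 / 4) % 2 == 0 && $3 != "*" { print }    # mapped and has a reference name
  ' "$1"
}

# Usage (file names are placeholders):
#   keep_mapped input.sam > output_mapped.sam
```

With samtools itself, the two steps can also be combined, since samtools view writes SAM by default: samtools view -h -F 4 input.bam > output_mapped.sam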


I had the same issue and this solved it. Even though I ran Bowtie with --no-unal, I guess there were still some unmapped reads in there.
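
Since a single stray record seems to be enough to trigger the error, it may help to list every reference name in the SAM file that the .2bit file does not contain before starting a multi-hour run. A rough sketch follows: missing_refs is a hypothetical helper, and the name table is assumed to come from UCSC's twoBitInfo (for example twoBitInfo hg19.2bit stdout, which to my knowledge prints one tab-separated name and size per line).

```shell
# Sketch: print each distinct RNAME in a SAM file that is absent from a name table.
# missing_refs is a hypothetical helper name.
#   $1: tab-separated table with sequence names in column 1 (assumed twoBitInfo output)
#   $2: SAM file
missing_refs() {
  awk -F'\t' '
    NR == FNR { known[$1] = 1; next }            # first file: names in the .2bit
    /^@/ { next }                                # second file: skip SAM header
    !($3 in known) && !seen[$3]++ { print $3 }   # report each unknown name once
  ' "$1" "$2"
}

# Usage (file names are placeholders):
#   twoBitInfo hg19.2bit stdout > hg19.names.tab
#   missing_refs hg19.names.tab input.sam
```

An empty result means every record points at a sequence present in the name table; a * in the output corresponds to the error reported in this thread.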


Did you get the same error with the pre-built filter files provided on this tool's website?

We also provide the filter files for human assembly hg19 here and mouse assembly mm9 here.


Hi, Sej,

I found that PARalyzer only works when the input SAM file contains mapped reads only. I first run

samtools view -b -F 4 input.bam > output_mapped.bam

to generate a BAM file with only mapped reads, and then convert the BAM file to SAM with

samtools view -h output_mapped.bam > output_mapped.sam

to get a SAM file that works with PARalyzer.

Thanks for your input.

Xiao


Hi, Xiao

When I ran PARalyzer, it did not work with "=COLLAPSED" at the end of the SAM_FILE lines; it only worked without "=COLLAPSED". I collapsed the fastq files with both fastx_toolkit and CIMS/fastq2collapse.pl, but neither worked.

Have you had any experience with this problem?

Best, Seokju
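
One thing that might be worth ruling out, on the assumption (not confirmed anywhere in this thread) that =COLLAPSED relies on read names that carry the copy number: fastx_collapser names its output sequences as rank-count (for example 1-1234), so names in another format, such as those written by a different collapsing script, would not fit. The sketch below counts SAM records whose QNAME does not have that form; the function name is made up for illustration.

```shell
# Sketch: count SAM records whose read name is not of the form <number>-<number>,
# the naming used by fastx_collapser. Whether PARalyzer requires this form
# for =COLLAPSED input is an assumption, not something stated in this thread.
# count_noncollapsed_names is a hypothetical helper name; $1 is a SAM file.
count_noncollapsed_names() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    $1 !~ /^[0-9]+-[0-9]+$/ { bad++ }      # QNAME not in rank-count form
    END { print bad + 0 }
  ' "$1"
}

# Usage (file name is a placeholder):
#   count_noncollapsed_names input.sam
```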


Hi, Seokju,

Sorry for the late reply; I only saw your message today. I did not include =COLLAPSED in my PARalyzer ini file. I use fastx_toolkit to collapse reads. Here is one of my ini files for PARalyzer:

BANDWIDTH=3
CONVERSION=T>C
MINIMUM_READ_COUNT_PER_GROUP=10
MINIMUM_READ_COUNT_PER_CLUSTER=5
MINIMUM_READ_COUNT_FOR_KDE=5
MINIMUM_CLUSTER_SIZE=15
MINIMUM_CONVERSION_LOCATIONS_FOR_CLUSTER=2
MINIMUM_CONVERSION_COUNT_FOR_CLUSTER=2
MINIMUM_READ_COUNT_FOR_CLUSTER_INCLUSION=5
MINIMUM_READ_LENGTH=1
MAXIMUM_NUMBER_OF_NON_CONVERSION_MISMATCHES=1

EXTEND_BY_READ

GENOME_2BIT_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/Index/hg19/hg19.2bit
SAM_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/Nextseq_Midoutput/CLIP17_synNCp15_NC43FSFS_hg19_m2.sorted.sam
OUTPUT_DISTRIBUTIONS_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/CLIP17_synNCp15_NC43FSFS_hg19_m2_distri.csv
OUTPUT_GROUPS_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/Nextseq_Midoutput/CLIP17_synNCp15_NC43FSFS_hg19_m2_group.csv
OUTPUT_CLUSTERS_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/Nextseq_Midoutput/CLIP17_synNCp15_NC43FSFS_hg19_m2_clusters.csv