Can't load reference sequence from file 'GRCh37.fa': Unexpected character 'M' found.
1
1
Entering edit mode
4.0 years ago
Kai_Qi ▴ 130

Hi all:

I am trying to do an analysis using GRCh37.fa as reference genome. After running command

pureclip -i aligned.f.duplRm.pooled.R2.bam -bai aligned.f.duplRm.pooled.R2.bam.bai -g GRCh37.fa -iv "1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;X;Y;" -nt 10 -o PureCLIP.crosslink_sites.bed

I received an error:

ERROR: Can't load reference sequence from file 'GRCh37.fa': Unexpected character 'M' found. 

I got advice from the developer as:

The problem is coming from an external library which is used and which expects the reference sequence to contain only the letters 'A', 'C', 'G', 'T' or 'N'. I know it is not ideal, but if you convert all non-ACGTs to Ns, the problem should be solved

Does anyone can teach me how to convert all non-ACGT to Ns so that I will be able to give it a try?

Thanks,

next-gen-sequencing RNA-Seq • 1.8k views
ADD COMMENT
0
Entering edit mode

It is indeed strange that your reference contains the letter M. As a first step I would double check that the reference contains nucleotides and not amino acids. Once you are sure that this is the case you can use pyfasta of some other tool depending in which programming language you are proficient.

ADD REPLY
0
Entering edit mode

It is a reference genome. Thank you for your comment. I used a differential reference genome and it generated bed file. Nevertheless, I could not see peaks when load the bed to IGV. I guess I need to ask around.

Thanks,

ADD REPLY
0
Entering edit mode
2.8 years ago

Hi Kai, Got the same error in Pureclip using the reference genome fasta file downloaded from UCSC. To get the clean reference genome, you can try this:

cat hg38as.fa | sed '/^\>/! s/[RYKMSWBVHD]/N/g' > hg38as_clean.fa

It basically replaces any letters in the character set [RYKMSWBVHD] with an N, only in the lines that are not the header aka do not start with >. This fixed the Pureclip error for me.

ADD COMMENT
0
Entering edit mode

Thank you, I will try it, but I ended up using ctk toolkit to get it work.

ADD REPLY

Login before adding your answer.

Traffic: 1708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6