Entering edit mode
2.7 years ago
epi_vincent
▴
10
Hi,
I am trying to convert ChIP peak coordinates (.bed output from Pepr or sicer) to fasta sequences for motif analysis using RSAT.
To that end, I am using faidx from samtools.
Here is the head of my bed file that contains the coordinates from which I want to extract the sequences:
head peaks.bed
#Chromosome Start End
NC_029516.1 16800 20599
NC_029516.1 21200 22799
This is the command I typed to extract the sequences from NCBI quail genome:
samtools faidx GCF_001577835.2_Coturnix_japonica_2.1_genomic.fna -r peaks.bed -o peaks.fna
However, I get this error message:
[W::fai_get_val] Reference #Chromosome Start End not found in file, returning empty sequence
[faidx] Failed to fetch sequence in #Chromosome Start End
For information, this is the head of my genome:
head GCF_001577835.2_Coturnix_japonica_2.1_genomic.fna
>NC_029516.1 Coturnix japonica isolate 7356 chromosome 1, Coturnix japonica 2.1, whole genome shotgun sequence
GAGCTGTTggagggtccagaggaggcgCTGAgatgggcagagctggagcagctccgctatgaggaaaggctggggGAGAT
GGGATGGTTTAGATTGGAGAAGGATGCGGGGAGACCTCAGTGTGGGCatccaatacttgaagggagtgtataaacaggag
Can you help me solving the issue?
Thanks!
Vincent
Thanks, however the issue is still there
I wonder if this has sommeting to do with the format of my bed file, as it is tab spaced and not using : and - signs. If that is the issue, any way to do the conversion? (help of faidx: -r, --region-file FILE Read regions from a file. Format is chr:from-to, one per line)
Right, that must be the issue, you can try something like this and see if it works:
This worked for me:
If I used "\t" it did put :- at the end of each line. I guess columns are space delimited and not tab delimited. And this worked! I could extract my genomic regions using faidx.
Thank you very much!