Looking for an example of using bedtools nuc on many FASTA files
2.3 years ago
schlogl ▴ 160

Hi there, I am looking for information about running bedtools nuc on many FASTA files. I have a script like this:

basefolder="Results/GC_slidewindow"
width=1000


for assembly in $(cat list_assemblies)
do
  echo "Calculating nucleotide content from ${assembly}"
  bedtools nuc \
  -fi Genomic_data/$1/$2/${assembly}/Chromosome/*_chr.fna \
  -bed ${basefolder}/${assembly}_${width}bps.bed \
  > ${basefolder}/$1/$2/${assembly}_nuc_${width}.txt
done

I couldn't find anything in the bedtools documentation about whether the tool accepts wildcards, and I am getting this error:

Calculating nucleotide content from GCA_021513715.1

*****ERROR: Unrecognized parameter:   *****


Tool:    bedtools nuc (aka nucBed)
Version: v2.30.0
Summary: Profiles the nucleotide content of intervals in a fasta file.

Usage:   bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options: 
    -fi Input FASTA file

    -bed    BED/GFF/VCF file of ranges to extract from -fi

    -s  Profile the sequence according to strand.

    -seq    Print the extracted sequence

    -pattern    Report the number of times a user-defined sequence
            is observed (case-sensitive).

    -C  Ignore case when matching -pattern. By defaulty, case matters.

    -fullHeader Use full fasta header.
        - By default, only the word before the first space or tab is used.

Output format: 
    The following information will be reported after each BED entry:
        1) %AT content
        2) %GC content
        3) Number of As observed
        4) Number of Cs observed
        5) Number of Gs observed
        6) Number of Ts observed
        7) Number of Ns observed
        8) Number of other bases observed
        9) The length of the explored sequence/interval.
        10) The seq. extracted from the FASTA file. (opt., if -seq is used)
        11) The number of times a user's pattern was observed.
            (opt., if -pattern is used.)

So I am looking for any help or tips to get the job done. Thank you all for your time. Paulo

PS: the list of assemblies looks like this (they are different genomes):

GCA_008000775.1
GCA_008000770.1
.
.
.
bedtools • 1.1k views

Concatenate your FASTA files into one FASTA file for each assembly.
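
A minimal sketch of that suggestion, assuming an assembly's Chromosome directory holds several *_chr.fna files (the question does not say whether it does). The function name `concat_fasta` and the output path are hypothetical. The sequence names in the merged file still have to match the first column of the BED file.

```shell
# Hypothetical helper: merge every *_chr.fna in a directory into one FASTA file.
# Write the output outside the source directory (or with a different suffix),
# otherwise a later run would pick up the merged file as an input.
concat_fasta() {
  src_dir=$1
  out_file=$2
  # The shell expands the pattern here; cat receives the resulting paths.
  set -- "$src_dir"/*_chr.fna
  # With no match, the pattern stays literal and does not exist as a file.
  [ -e "$1" ] || { echo "No *_chr.fna files in ${src_dir}" >&2; return 1; }
  cat "$@" > "$out_file"
}
```

It could then be called once per assembly, for example `concat_fasta "path/to/Chromosome" "path/to/merged.fna"`, and the merged file passed to `-fi`.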


@Pierre Lindenbaum Sorry, I copied and pasted the same assembly ID, but they are all different genomes. Thank you.

2.3 years ago
schlogl ▴ 160

I just adjusted some directory paths in the code and then it worked as expected. Thank you all. Paulo
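
On the wildcard question: the shell expands the pattern, not bedtools, so bedtools nuc only sees whatever paths the pattern resolves to. If the path is wrong, the pattern is passed through literally; if it matches several files, the extra paths reach bedtools as stray arguments. Below is a hedged sketch of the loop from the question that checks this before calling bedtools. The function names `find_single_fasta` and `run_nuc_for_list` are hypothetical, the directory layout is copied from the question, and the `mkdir -p` for the output folder is an addition that was not in the original script.

```shell
# Print the single *_chr.fna file in a directory; fail if there are none or several.
find_single_fasta() {
  fasta_dir=$1
  # Let the shell expand the pattern into this function's positional parameters.
  set -- "$fasta_dir"/*_chr.fna
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "Expected exactly one *_chr.fna file in ${fasta_dir}" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Run bedtools nuc for every assembly ID in a list file (one ID per line).
# group and subgroup stand in for $1 and $2 of the original script.
run_nuc_for_list() {
  list=$1
  group=$2
  subgroup=$3
  basefolder="Results/GC_slidewindow"
  width=1000
  # Assumption: the output folder may not exist yet.
  mkdir -p "${basefolder}/${group}/${subgroup}"
  while IFS= read -r assembly; do
    [ -n "$assembly" ] || continue
    # Skip assemblies whose FASTA cannot be resolved to exactly one file.
    fasta=$(find_single_fasta "Genomic_data/${group}/${subgroup}/${assembly}/Chromosome") || continue
    echo "Calculating nucleotide content from ${assembly}"
    bedtools nuc \
      -fi "$fasta" \
      -bed "${basefolder}/${assembly}_${width}bps.bed" \
      > "${basefolder}/${group}/${subgroup}/${assembly}_nuc_${width}.txt"
  done < "$list"
}
```

Inside the original script this would be called as `run_nuc_for_list list_assemblies "$1" "$2"`. Quoting every variable also means an empty `$1` or `$2` shows up as a visibly wrong path in the error message.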
