2.3 years ago
schlogl
Hi there, I am looking for information about running bedtools nuc on many FASTA files. I have a script like this:
```shell
basefolder="Results/GC_slidewindow"
width=1000
for assembly in $(cat list_assemblies)
do
    echo "Calculating nucleotide content from ${assembly}"
    bedtools nuc \
        -fi Genomic_data/$1/$2/${assembly}/Chromosome/*_chr.fna \
        -bed ${basefolder}/${assembly}_${width}bps.bed \
        > ${basefolder}/$1/$2/${assembly}_nuc_${width}.txt
done
```
But I couldn't find anything in the bedtools documentation about whether the tool accepts wildcards, and I am getting this error:
```
Calculating nucleotide content from GCA_021513715.1
*****ERROR: Unrecognized parameter: *****
Tool: bedtools nuc (aka nucBed)
Version: v2.30.0
Summary: Profiles the nucleotide content of intervals in a fasta file.
Usage: bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Options:
    -fi          Input FASTA file
    -bed         BED/GFF/VCF file of ranges to extract from -fi
    -s           Profile the sequence according to strand.
    -seq         Print the extracted sequence
    -pattern     Report the number of times a user-defined sequence
                 is observed (case-sensitive).
    -C           Ignore case when matching -pattern. By defaulty, case matters.
    -fullHeader  Use full fasta header.
                 - By default, only the word before the first space or tab is used.
Output format:
    The following information will be reported after each BED entry:
        1) %AT content
        2) %GC content
        3) Number of As observed
        4) Number of Cs observed
        5) Number of Gs observed
        6) Number of Ts observed
        7) Number of Ns observed
        8) Number of other bases observed
        9) The length of the explored sequence/interval.
        10) The seq. extracted from the FASTA file. (opt., if -seq is used)
        11) The number of times a user's pattern was observed.
            (opt., if -pattern is used.)
```
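A likely cause, offered as an assumption since the pasted error line does not show which argument was rejected: bedtools never sees the `*`. Bash expands `*_chr.fna` into one argument per matching file before `bedtools nuc` starts, and `-fi` takes a single FASTA file, so a second match ends up where bedtools expects an option. The sketch below reproduces the expansion with two hypothetical empty files (`a_chr.fna`, `b_chr.fna`) in a temporary directory and counts how many arguments the glob produces.

```shell
#!/usr/bin/env bash
# Sketch: the shell, not bedtools, expands *_chr.fna.
# a_chr.fna and b_chr.fna are hypothetical stand-ins for per-chromosome files.
set -euo pipefail
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

demo_dir=$(mktemp -d)
touch "$demo_dir/a_chr.fna" "$demo_dir/b_chr.fna"

# Each matching file becomes its own argument, as in "-fi .../*_chr.fna".
fna_files=( "$demo_dir"/*_chr.fna )
echo "matched ${#fna_files[@]} file(s)"

# -fi expects exactly one FASTA; extra matches are likely read as unknown parameters.
if (( ${#fna_files[@]} != 1 )); then
    echo "warning: -fi would receive ${#fna_files[@]} files" >&2
fi

rm -r "$demo_dir"
```

Running the same count on the real `Genomic_data/$1/$2/${assembly}/Chromosome/*_chr.fna` path inside the loop would show which assemblies have more than one chromosome file.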
So I would appreciate any help or tips to get the job done. Thank you all for your time. Paulo
PS: the list of assemblies looks like this (they are all different genomes):
GCA_008000775.1
GCA_008000770.1
...
Concatenate your FASTA files into one FASTA file per assembly.
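A minimal sketch of that suggestion, assuming each `Chromosome/` directory holds one or more `*_chr.fna` files whose sequence headers are unique within the assembly: merge them with `cat` into a single FASTA, then pass only that file to `-fi`. The folder layout, `list_assemblies`, `$1`/`$2`, and output names follow the original script; `merge_fastas`, `run_nuc`, and the `merged/` folder are hypothetical names. The BED chromosome names still have to match the FASTA headers (by default only the first word of each header, per the `-fullHeader` note in the usage text).

```shell
#!/usr/bin/env bash
# Sketch: merge each assembly's chromosome FASTA files, then run bedtools nuc
# on the single merged file. merge_fastas, run_nuc and merged/ are hypothetical.
set -euo pipefail
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

# merge_fastas OUT IN...: concatenate the input FASTA files into OUT.
# Fails if the glob matched nothing, so an empty FASTA is never written.
merge_fastas() {
    local out=$1
    shift
    if (( $# == 0 )); then
        echo "no *_chr.fna files found for ${out}" >&2
        return 1
    fi
    cat "$@" > "$out"
}

# run_nuc GROUP SUBGROUP: the original loop, with exactly one FASTA per -fi.
run_nuc() {
    local group=$1 subgroup=$2
    local basefolder="Results/GC_slidewindow"
    local width=1000
    local merged_dir="${basefolder}/merged"
    local assembly merged
    mkdir -p "$merged_dir" "${basefolder}/${group}/${subgroup}"

    # Read one assembly ID per line instead of word-splitting $(cat ...).
    while IFS= read -r assembly; do
        if [[ -z $assembly ]]; then continue; fi
        echo "Calculating nucleotide content from ${assembly}"
        merged="${merged_dir}/${assembly}_chr.fna"
        merge_fastas "$merged" \
            Genomic_data/"$group"/"$subgroup"/"$assembly"/Chromosome/*_chr.fna
        bedtools nuc \
            -fi "$merged" \
            -bed "${basefolder}/${assembly}_${width}bps.bed" \
            > "${basefolder}/${group}/${subgroup}/${assembly}_nuc_${width}.txt"
    done < list_assemblies
}

# Run the loop only when given the two folder arguments ($1 and $2 in the original).
if (( $# == 2 )); then
    run_nuc "$1" "$2"
fi
```

`cat` writes the chromosomes in glob (alphabetical) order; that should not matter to `bedtools nuc`, which looks up each BED interval by chromosome name rather than by position in the file.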
@Pierre Lindenbaum Sorry, I copied and pasted the same assembly ID, but they are all different genomes. Thank you.