Hi,
I was trying to calculate the total read length of all the samples in a BioProject from the command line using:
esearch -db bioproject -query "PRJNA438426" | efetch -format docsum | xtract -pattern DocumentSummary -element RunTotalBases
which gives me no output.
Another way is:
grep -c "^@" SRR6943205_1.fastq SRR6943205_2.fastq
SRR6943205_1.fastq:18361521
SRR6943205_2.fastq:18361521
This gives me the desired output, but only when the files are downloaded in fastq format on my system.
I downloaded the SRA data in fastq format long ago, but the files are now compressed (.fastq.gz). So in order to count the reads using the second code snippet, I would have to gunzip them all, which is time- and resource-consuming for me now.
A third option is to check the FastQC HTML output of each sample individually, which is also time-consuming.
I would greatly appreciate it if someone could help me design a Linux command to get my desired output.
Many thanks!
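A minimal sketch of two possible approaches, not a definitive answer. The first assumes that per-run statistics are stored with the SRA records rather than the BioProject record, and that `efetch -format runinfo` on the `sra` database returns CSV with columns named `spots` and `bases`. The second streams a local .fastq.gz through `gzip -dc`, so nothing is unpacked to disk. The function names `sum_runinfo`, `count_reads` and `count_bases` are made up for illustration.

```shell
# Sum spots and bases over an SRA runinfo CSV read from stdin.
# Assumed usage (needs EDirect and network access):
#   esearch -db sra -query PRJNA438426 | efetch -format runinfo | sum_runinfo
sum_runinfo() {
    awk -F',' '
        $1 == "Run" {                  # header line: locate columns by name
            for (i = 1; i <= NF; i++) {
                if ($i == "spots") s = i
                if ($i == "bases") b = i
            }
            next
        }
        NF > 1 && s && b {             # data line: accumulate totals
            spots += $s; bases += $b; runs++
        }
        END { printf "runs=%d spots=%.0f bases=%.0f\n", runs, spots, bases }
    '
}

# Count reads in a gzipped FASTQ without writing an uncompressed copy.
# A FASTQ record is assumed to span exactly 4 lines, so reads = lines / 4.
# Counting lines that start with "@" can overcount, because a quality
# string may also begin with "@".
count_reads() {
    gzip -dc "$1" | awk 'END { printf "%.0f\n", NR / 4 }'
}

# Total bases: add up the length of every sequence line (line 2 of each record).
count_bases() {
    gzip -dc "$1" | awk 'NR % 4 == 2 { total += length($0) } END { printf "%.0f\n", total }'
}
```

For paired-end data a spot corresponds to one read pair, so the spot count should be compared with the count from a single mate file. To process every local file, something like `for f in *.fastq.gz; do echo "$f: $(count_reads "$f")"; done` should work.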
Thank you for providing this code. It really helped me. Is there any way to calculate the GC% of a sample using its accession?
To calculate GC%, you can do:
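A minimal sketch of one way to do this, assuming the reads are already available locally as a gzipped FASTQ file with 4 lines per record (not necessarily the command originally posted here). The function name `gc_percent` is made up for illustration, and ambiguous bases such as N are counted in the denominator.

```shell
# Print the GC percentage of all sequence lines in a gzipped FASTQ file.
gc_percent() {
    gzip -dc "$1" | awk 'NR % 4 == 2 {
        seq = toupper($0)               # treat soft-masked (lowercase) bases alike
        total += length(seq)            # all bases, including N
        gc += gsub(/[GC]/, "", seq)     # gsub returns the number of G/C removed
    } END {
        if (total > 0) printf "%.2f\n", 100 * gc / total
        else print "NA"
    }'
}
```

It can be called as `gc_percent SRR6943205_1.fastq.gz`. Starting from an accession alone, the reads would first have to be downloaded or streamed, for example with the SRA Toolkit.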