The SRA archive format ("vdb") contains an md5 checksum as well as a few other consistency checks (I think). The sra-toolkit has a utility, vdb-validate, which reports any errors in the data and performs an md5 checksum comparison.
Try this one. If you haven't removed the SRA files, you can use the following:
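: > list.txt   # start from an empty list so reruns don't append duplicates
for i in *.gz   # let the shell glob instead of parsing ls output
do
  SRR=${i%%_*}   # strip everything from the first "_" to get the run accession
  echo "$SRR" >> list.txt
done
for j in $(sort -u list.txt)   # validate each accession once
do
  vdb-validate "$j"
done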
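A minimal sketch of the same idea without the temporary list.txt, assuming the FASTQ files sit in the current directory and are named <accession>_<n>.fastq.gz, as fastq-dump --split-files produces them. The helper names list_runs and validate_runs are hypothetical, and treating a non-zero exit status from vdb-validate as a failed validation is an assumption you should check against your toolkit version.

```shell
# list_runs (hypothetical helper): print each run accession once,
# assuming file names look like SRR949210_1.fastq.gz.
list_runs() {
  local f
  for f in "$@"; do
    printf '%s\n' "${f%%_*}"   # keep the part before the first "_"
  done | sort -u
}

# validate_runs (hypothetical helper): read accessions from stdin, run
# vdb-validate on each, and print only those whose exit status is non-zero.
# ASSUMPTION: vdb-validate exits non-zero when validation fails.
validate_runs() {
  local acc
  while IFS= read -r acc; do
    if ! vdb-validate "$acc" >/dev/null 2>&1; then
      printf 'FAILED %s\n' "$acc"
    fi
  done
}
```

Usage: list_runs *.fastq.gz | validate_runs. Only failures are printed, so empty output means every run passed (under the exit-status assumption above). Drop the >/dev/null redirect if you want to see vdb-validate's own messages.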
If you have already removed the SRA files and kept only the FASTQ files, and your FASTQ files were produced like this:
fastq-dump --split-files --gzip SRR949210
Then you can test the integrity of each gzip file and list any non-empty error logs:
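for i in *fastq.gz
do
  gunzip -t "$i" 2> "$i.err"   # "2>" (no space) sends gunzip's error messages to the .err file
done
find . -name "*.err" -type f -size +0c -exec ls -larth {} \;   # list only the non-empty error logs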
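A minimal sketch that prints only the names of archives failing the gzip integrity test, instead of writing one .err file per input. check_gzip is a hypothetical helper name. Keep in mind this only detects truncated or corrupted gzip streams; it cannot tell you whether the reads inside match what SRA holds.

```shell
# check_gzip (hypothetical helper): test each gzip file and print the
# names of those that fail (truncated or corrupted streams).
check_gzip() {
  local f
  for f in "$@"; do
    # gzip -t exits non-zero if the stream is damaged; hide its message
    if ! gzip -t -- "$f" 2>/dev/null; then
      printf '%s\n' "$f"
    fi
  done
}
```

Usage: check_gzip *fastq.gz. Empty output means every file decompressed cleanly.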
vdb-validate operates on the SRA archive. If you are streaming FASTQ from SRA using a local toolkit install, you'll get an error if the process is interrupted; otherwise your output will be complete. You can verify this by checking your output stats against the stats stored in SRA using sra-stat -x (use --statistics to get deeper detail, including read-group level and quality score distribution). Or check runs individually in the SRA Run Browser, which displays basic stats.
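As a rough local counterpart to that comparison, here is a minimal sketch that counts FASTQ records (four lines each) in every gzipped file, so you can compare them with the spot count that sra-stat or the Run Browser reports. count_reads is a hypothetical helper; it assumes plain four-line records with no wrapped sequence lines. With --split-files each mate file of a paired-end run usually holds one record per spot, but dump options can change that, so treat a mismatch as a prompt to look closer rather than proof of corruption.

```shell
# count_reads (hypothetical helper): print "<file>\t<records>" for each
# gzipped FASTQ, assuming four lines per record (no line wrapping).
count_reads() {
  local f
  for f in "$@"; do
    gzip -dc -- "$f" | awk -v name="$f" '
      END {
        # a line count that is not a multiple of 4 suggests a truncated file
        if (NR % 4 != 0) printf "%s\tINCOMPLETE (%d lines)\n", name, NR
        else printf "%s\t%d\n", name, NR / 4
      }'
  done
}
```

For example, count_reads SRR949210_*.fastq.gz prints one tab-separated line per file.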
OP has not been seen in 16 months. I'd ask a moderator to review the answers and accept the appropriate ones.
Thank you, GenoMax!
There are many ways to customize FASTQ output from SRA, so there is no single checksum that would be relevant.