The output file is empty when I run this:
(jespinoz_env) -bash-4.1$ reformat.sh in=mapped_interleaved.fastq out=mapped_interleaved_validated.fastq vpair
java -ea -Xmx200m -cp /usr/local/devel/ANNOTATION/jespinoz/anaconda/envs/mage_env/opt/bbmap-38.22-1/current/ jgi.ReformatReads in=mapped_interleaved.fastq out=mapped_interleaved_validated.fastq vpair
Executing jgi.ReformatReads [in=mapped_interleaved.fastq, out=mapped_interleaved_validated.fastq, vpair]
Input is being processed as paired
Writing interleaved.
Names do not appear to be correctly paired.
NS500647:155:H2MFYBGX2:1:11101:17655:15675:N:0:CACGCAAT#0/1
NS500647:155:H2MFYBGX2:1:11101:7822:15982:N:0:CACGCAAT#0/2
(jespinoz_env) -bash-4.1$ ls
mapped_interleaved.fastq mapped_interleaved.stats.txt mapped_interleaved_validated.fastq
(jespinoz_env) -bash-4.1$ ls -lhtr
total 12M
-rw-r--r-- 1 jespinoz tigr 10M Mar 27 01:25 mapped_interleaved.fastq
-rw-r--r-- 1 jespinoz tigr 2.1K Mar 27 01:25 mapped_interleaved.stats.txt
-rw-r--r-- 1 jespinoz tigr 0 Mar 27 14:14 mapped_interleaved_validated.fastq
There are definitely paired reads in here when I count read IDs in Python:
In [1]: import pandas as pd; from collections import defaultdict
In [2]: data = defaultdict(int)
   ...: with open("./mapped_interleaved.fastq", "r") as f:
   ...:     for line in f:
   ...:         # Header lines start with "@"
   ...:         if line.startswith("@"):
   ...:             line = line.strip()
   ...:             # Drop the trailing /1 or /2 so both mates share one ID
   ...:             id_read = line[:-2]
   ...:             data[id_read] += 1
   ...: # How many IDs occur once (unpaired) vs. twice (both mates present)
   ...: pd.Series(data).value_counts().sort_values()
   ...:
Out[2]:
2 3571
1 22170
dtype: int64
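The count above only shows that some read IDs occur twice somewhere in the file. It does not show that the two mates sit next to each other, which is what an interleaved file needs, and the two names printed in the error message have different coordinates. A minimal sketch of an adjacency check follows, assuming four-line FASTQ records and names ending in /1 and /2. The function names `read_names` and `count_adjacent_pairs` are hypothetical, and this is not how reformat.sh itself does the check.

```python
from itertools import islice


def read_names(path):
    """Yield the header (without the leading '@') of each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            if not record:
                break
            yield record[0].rstrip("\n")[1:]


def count_adjacent_pairs(names):
    """Return (pairs, singletons), where a pair is a /1 record
    immediately followed by the /2 record with the same ID."""
    names = list(names)
    pairs, singletons = 0, 0
    i = 0
    while i < len(names):
        first = names[i]
        second = names[i + 1] if i + 1 < len(names) else None
        is_pair = (
            second is not None
            and first.endswith("/1")
            and second.endswith("/2")
            and first[:-2] == second[:-2]
        )
        if is_pair:
            pairs += 1
            i += 2  # skip past the mate
        else:
            singletons += 1
            i += 1
    return pairs, singletons
```

Calling `count_adjacent_pairs(read_names("mapped_interleaved.fastq"))` would give the number of correctly adjacent pairs and the number of records that break the interleaving.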
Here's an example of my reads:
....: head mapped_interleaved.fastq
....:
@NS500647:155:H2MFYBGX2:1:11105:7084:14986:N:0:CGCAACTA#0/1
ATTTTCTCCAAGTCTGTATGCTCATCTTCGATGGTTAAAGTAGCATGGCGCATGTTAGCATCTGTTAAGGCATCCATAAAACCACTTGCCCGCTCAATGCGAGTACTCAAACGACTCGTATCCGCTGTAATCAAGAGGAAATGCTCGTAAC
+
AAAAAEEEEEEEEEEEEEEEEEEEEAEEEAEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAAE/EEEEEEE/EEEEE<EEEAEAEEEEAE6EEEEEAEEEEEEEAEEE6EEEEEE<E/EEEEEEEEEAEEEEEEEE<6<EEE
@NS500647:155:H2MFYBGX2:1:11105:12921:14384:N:0:CGCAACTA#0/2
ACTAGGAGCAGCCCCCGTCAAATCTCCAACGCCCACAGCAGATAGGGACCAAACTGTCTCACGACGTTTTAAACCCAGCTCACGTACCTCTTTAAATGGCGAACAGCCATACCCTTGGGACCGGCTACAGCCCCAGGATGAGATGAGCCG
+
AAAAAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEA6AEEEEEEEEEEEEEEEEEEEEEEEAEEE/EAEAEEEA/EAEE<AEE/AEEEA<EAEA<AA<E<<AEAA
@NS500647:155:H2MFYBGX2:1:11104:26079:14394:N:0:CGCAACTA#0/1
GGTCATGAGGGGGACTCGTGTGATAAGGCAGCCTGAAATGGGATTGAGTGTTTATTTCAGGCTGCCTTGGGGGGTGTGAAGTGGGCGTGGTCATTGGATGAAGGCAGCCTGCGTAGCGAAGC