Question

Column lengths differ error (ArchR)

0

3.0 years ago

bioinformatics.girl ▴ 20

I keep receiving this error when running reformatFragmentReads() for ArchR: Detected 1 column names but the data has 5 columns. Added 4 extra default column names at the end.

How would you fix this? For reference, I am working with .tsv.gz files.

Here's the error I face:

[Screenshot of the error; image not preserved]

EDIT on 15-Jun-2022

I have been getting the same error for the past 5 days and have not found a helpful solution. Currently I am simply calling reformatFragmentFiles("fragments.tsv.gz"), and I only receive the following message:

Column 1 is length 1283934 which differs from length of column 1 (0).

Can someone provide helpful solutions to mitigate this issue? Is there a certain manipulation I need to apply to the file directly?

atac-seq cellranger r archr • 4.6k views

ADD COMMENT • link updated 3.0 years ago by GenoMax 151k • written 3.0 years ago by bioinformatics.girl ▴ 20

0

check your input file delimiter....

ADD REPLY • link 3.0 years ago by Pierre Lindenbaum 166k

0

am working with .tsv.gz files

so the question is "does this software work with gz files... ?"

ADD REPLY • link 3.0 years ago by Pierre Lindenbaum 166k

0

The software works with .gz files; I've also tried .tsv.gz, .tsv.gz.tbi, etc. Unfortunately, it seems to be a formatting issue in the CellRanger-derived fragment files. And when I try to use reformatFragmentReads(), I run into the same issue.

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

0

I already did. However, I'm seeking solutions (i.e. a script for mitigating the issue)...

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

0

zgrep first few lines and check the separator. Separator must be in line with program supported separators.

ADD REPLY • link 3.0 years ago by cpad0112 21k

0

That command is only used to search for patterns; that would not fix the issue.

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

0

It's zcat. Sorry for the typo. (However, you can zgrep . a gzipped file to print everything. Don't do it though.)

ADD REPLY • link 3.0 years ago by cpad0112 21k

0

I wish zhead and ztail were builtins similar to zcat. Aliases can be created, of course, but I'm disappointed at the lack of builtins.

ADD REPLY • link 3.0 years ago by Ram 45k

0

[Screenshot; image not preserved]

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

1

Please copy and paste error messages when possible rather than using screenshots.

ADD REPLY • link 3.0 years ago by GenoMax 151k

0

Please do not delete posts that have received comments/answers. It is discourteous to people who have tried to help you.

This is also not an acceptable way to create new posts with the same question.

ADD REPLY • link 3.0 years ago by GenoMax 151k

GenoMax · Answer 1 · 2022-06-15

2

3.0 years ago

rpolicastro 13k

That function uses fread from the data.table library to load the fragments file into memory and expects 5 columns named V1 through V5. The column names will not be in the actual fragments file, but will be added as default column names when the data is loaded. Try manually loading your fragments file using fread and checking whether you see the correct data format.

*EDIT* solution is here - Column lengths differ error (ArchR)

EDIT2 (GenoMax) - Code to remove the first line is provided by @Ram click --> Column lengths differ error (ArchR)

ADD COMMENT • link updated 3.0 years ago by GenoMax 151k • written 3.0 years ago by rpolicastro 13k

0

In addition to this, like cpad0112 mentioned, try using zcat fragments.tsv.gz | head to look at the first ten lines of the content. If nothing looks suspicious, pipe the output through cat -A (zcat fragments.tsv.gz | head | cat -A) to see all invisible characters; zcat itself has no -A option.

ADD REPLY • link 3.0 years ago by Ram 45k
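
The inspection suggested above can also be done by counting fields, which flags both wrong separators and stray header lines. This is a minimal sketch, not part of the original replies: the sample file and its contents are made up for illustration, and gzip -dc is used in place of zcat only because it behaves the same on Linux and macOS.

```shell
# Hypothetical stand-in for a Cell Ranger fragments file:
# three comment lines followed by two 5-column data lines.
workdir=$(mktemp -d)
frag="$workdir/fragments.tsv.gz"
printf '# id = SAMPLE104\n# description = \n#\nchr1\t10\t50\tAAAC-1\t2\nchr1\t60\t90\tAAAG-1\t1\n' \
  | gzip -c > "$frag"

# Prefix each of the first lines with its number of tab-separated fields.
gzip -dc "$frag" | head -n 5 | awk -F'\t' '{ print NF ": " $0 }'

# Count the lines that do not have exactly 5 tab-separated fields.
bad_lines=$(gzip -dc "$frag" | awk -F'\t' 'NF != 5 { n++ } END { print n + 0 }')
echo "lines without 5 fields: $bad_lines"
```

Any non-zero count means that some lines will not fit a 5-column table, which is what the column-length error is complaining about.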

0

zcat is for viewing the file's contents, which I have already done, and the contents look alright. I am asking how to bypass the error I keep seeing.

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

0

So now when I load it as reformatFragmentFiles(fragmentFiles = fread("/home/Downloads/fragments.tsv.gz")), I get an error. I think I see where you're coming from, but I'm not sure how I would integrate fread inside reformatFragmentFiles. I have also tried assigning the loaded file to a variable (i.e. a <- fread("/home/Downloads/fragments.tsv.gz") and then reformatFragmentFiles(a)), but this function needs a file path, so I cannot pass the loaded table that way.

fread is called internally by the reformatFragmentReads function, meaning that you usually don't need to worry about it since the function will take care of running it. However, your error seems to be related to loading the file into memory, so the reason I want you to try and load the data in manually is to check whether this is being caused by the internal call to fread. If you post the results of the code below we can check whether it worked or not.

library("data.table")

DT <- fread("fragments.tsv.gz")

head(DT)

ADD REPLY • link 3.0 years ago by rpolicastro 13k

0

Here's the output:

[Screenshot of the fread output; image not preserved]

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

1

The first row in your fragment file is just # primary_contig=JH584295 which is causing the problem. If you remove that row it should work.

ADD REPLY • link 3.0 years ago by rpolicastro 13k

0

Unfortunately not working for me; I did: $tail -n +2 fragments_104.tsv.gz > fragment_104_processed.tsv.gz and then passed it onto the same function, reformatFragmentReads(), just to receive the same error?

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

1

You cannot directly tail a gzipped file. Use

zcat fragments_104.tsv.gz | head -n 3

to check that the first couple of lines are weird in the expected manner, then use

zcat fragments_104.tsv.gz | tail -n +2 | gzip -c > fragments_104.tsv.first_line_removed.gz

and follow up with

~~reformatFragmentReads("fragments_104.tsv.first_line_removed.gz")~~

reformatFragmentFiles("fragments_104.tsv.first_line_removed.gz")

(I used first_line_removed in the name instead of processed, as it serves as a record of the processing.)

Edited to change a wrong function name (reformatFragmentReads should actually be reformatFragmentFiles).

ADD REPLY • link 3.0 years ago by Ram 45k
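
To illustrate why the file has to be decompressed before tail is applied, here is a small self-contained sketch of the decompress, trim, recompress round trip. The sample data and file names are invented for the example, and gzip -dc stands in for zcat for portability.

```shell
# Hypothetical fragments file with one comment line on top.
workdir=$(mktemp -d)
src="$workdir/fragments_104.tsv.gz"
dst="$workdir/fragments_104.tsv.first_line_removed.gz"
printf '# primary_contig=JH584295\nchr1\t10\t50\tAAAC-1\t2\nchr1\t60\t90\tAAAG-1\t1\n' \
  | gzip -c > "$src"

# Decompress to text, drop the first line, then compress again.
# Running tail directly on "$src" would cut the compressed bytes instead of text lines.
gzip -dc "$src" | tail -n +2 | gzip -c > "$dst"

# gzip -t verifies that the result is still a valid gzip stream.
gzip -t "$dst" && echo "valid gzip"

# The first line of the new file should now be a data line.
first_line=$(gzip -dc "$dst" | head -n 1)
printf '%s\n' "$first_line"
```

Note that this removes exactly one line, so it only helps when the file has a single header line.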

0

I had already performed that, and the same error appears.

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

0

What is the output to:

zcat fragments_104.tsv.gz | head -n 3
zcat fragments_104.tsv.first_line_removed.gz | head -n 3

ADD REPLY • link 3.0 years ago by Ram 45k

0

# id = SAMPLE104
# description = 
#

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

0

There was an error in the code until yesterday.

The code had reformatFragmentReads instead of reformatFragmentFiles. If you had copy-pasted the code, it would not have worked.

Can you confirm that you tried the corrected code?

ADD REPLY • link 3.0 years ago by GenoMax 151k

0

Oops, my bad. Good catch, GenoMax! It should not give "the same error" though (barring an insane coincidence)

ADD REPLY • link 3.0 years ago by Ram 45k

0

Must have been a mistake on my part, but the correct function is reformatFragmentFiles()

ADD REPLY • link 3.0 years ago by bioinformatics.girl ▴ 20

0

You've been asked time and again to copy-paste plain text content instead of using screenshots. Are you having a problem using the site?

ADD REPLY • link 3.0 years ago by Ram 45k

score 1 · Answer 2 · 2022-06-17

1

3.0 years ago

Ram 45k

It looks like you have multiple comment lines up top. You're going to have to do something like:

zgrep -v "^#" fragments_104.tsv.gz | gzip -c > fragments_104.comment_lines_removed.tsv.gz

and then reformatFragmentFiles("fragments_104.comment_lines_removed.tsv.gz")

ADD COMMENT • link 3.0 years ago by Ram 45k
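
Before handing the cleaned file to ArchR, it may be worth confirming that the filter did what was intended. The sketch below is an illustration under assumptions: the sample data is invented, and gzip -dc | grep -v is used as a portable equivalent of the zgrep -v command above.

```shell
# Hypothetical fragments file with several comment lines on top.
workdir=$(mktemp -d)
in_file="$workdir/fragments_104.tsv.gz"
out_file="$workdir/fragments_104.comment_lines_removed.tsv.gz"
printf '# id = SAMPLE104\n# description = \n#\nchr1\t10\t50\tAAAC-1\t2\nchr1\t60\t90\tAAAG-1\t1\n' \
  | gzip -c > "$in_file"

# Drop every line that starts with '#', then recompress.
gzip -dc "$in_file" | grep -v '^#' | gzip -c > "$out_file"

# Check 1: no comment lines remain in the output.
remaining_comments=$(gzip -dc "$out_file" | awk '/^#/ { n++ } END { print n + 0 }')

# Check 2: count the lines that have exactly 5 tab-separated fields.
data_lines=$(gzip -dc "$out_file" | awk -F'\t' 'NF == 5 { n++ } END { print n + 0 }')

echo "comment lines left: $remaining_comments, 5-column lines: $data_lines"
```

If the first number is zero and the second equals the total line count of the output, every remaining line has the five columns the function expects.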

0

Tried this this morning and it solved everything! I used this command in terminal and then used the processed file for the reformatFragmentFiles() function in ArchR. Successfully ran the function in R.