File/Dir - cannot use path made of non-utf8 bytestrings #79

@gabriel-v

All the code in redun.File and friends assumes we have a single valid utf-8 string for the path.

But Python accepts bytes as path objects too. This is needed when working with filesystems that encode filenames in something other than UTF-8.
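To illustrate the point, here is a minimal sketch of the standard library handling a bytes path end to end. It assumes a POSIX filesystem that accepts arbitrary byte sequences as file names (some filesystems reject names that are not valid UTF-8); the file name is made up for the example.

```python
import os
import tempfile

# A Latin-1 encoded file name; the byte 0xE9 is not valid UTF-8 here.
name = b"caf\xe9.txt"

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)          # bytes form of the directory path
    path = os.path.join(tmp_bytes, name)  # joining bytes with bytes works
    with open(path, "wb") as f:           # open() accepts a bytes path
        f.write(b"data")
    listed = os.listdir(tmp_bytes)        # bytes in, bytes out
    size = os.path.getsize(path)

print(listed, size)
```

Passing bytes to os.listdir() is what makes it return bytes names, so the original encoding is never guessed at.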

Some functions crash when File is given a bytes path:

  • File: get_filesystem_class() and get_proto() - the urlparse call fails on non-utf8 byte strings
  • Dir: all of the above, plus concatenating the glob pattern, which raises TypeError: Can't mix strings and bytes in path components
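Both failures can be reproduced with the standard library alone, without redun. The sketch below uses os.path.join as a stand-in for the glob pattern concatenation in Dir (an assumption about what happens internally); the path is made up for the example.

```python
import os
from urllib.parse import urlparse

raw = b"/data/caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# 1. urlparse decodes bytes input strictly, so undecodable bytes raise.
try:
    urlparse(raw)
    urlparse_error = None
except UnicodeDecodeError as exc:
    urlparse_error = type(exc).__name__

# 2. Joining a bytes directory with a str glob pattern mixes the two types.
try:
    os.path.join(b"/data", "*.txt")
    join_error = None
except TypeError as exc:
    join_error = str(exc)

print(urlparse_error)
print(join_error)
```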

The workaround is to hack:

I also tried changing self.classes.File, but it can't be overwritten (it uses getitem), so one would have to replace the whole FileClasses thing.

I think one of these things can be done here:

  • either fix File, Dir and friends to work with non-utf8 bytestring paths
  • or, allow the user of the library to override the FileClasses, get_filesystem_class and friends, without so much monkeypatching
  • or refactor the whole thing to only use pathlib.Path, as requested in "Use pathlib.Path instead of strings for path" (#8)
    • though I think get_proto() and urlparse would still crash when given non-utf8 bytestrings
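As a rough sketch of what the first option could look like, bytes paths can be decoded with os.fsdecode before the scheme is inspected, then turned back into the original bytes with os.fsencode when the filesystem is touched. The helper name get_proto_lenient and the "local" fallback are hypothetical, not redun's API, and the round trip assumes a POSIX platform where Python uses the surrogateescape error handler for file names. Note that pathlib.Path does not accept bytes directly, so the pathlib option would need the same decoding step.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def get_proto_lenient(path):
    """Hypothetical helper: return the URL scheme of a str or bytes path."""
    # os.fsdecode leaves str untouched and maps undecodable bytes to
    # lone surrogates, so no information is lost.
    text = os.fsdecode(path)
    # Assumed fallback: paths without a scheme are treated as "local".
    return urlparse(text).scheme or "local"


raw = b"/data/caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

proto = get_proto_lenient(raw)
remote_proto = get_proto_lenient(b"s3://bucket/key")

# The decoded form round-trips back to the exact original bytes.
round_tripped = os.fsencode(os.fsdecode(raw))

# pathlib.Path rejects bytes, but accepts the decoded str form.
try:
    Path(raw)
    path_accepts_bytes = True
except TypeError:
    path_accepts_bytes = False
as_path = Path(os.fsdecode(raw))
```

The decoded string is only safe to pass back to filesystem calls; printing or encoding it as strict UTF-8 would fail because of the lone surrogates.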

What do you think?
