converting filename charset

categories: oneliner

So I was copying files to a vfat-formatted USB stick when this error popped up:

Invalid or incomplete multibyte or wide character

Same issue with cp, rsync and tar. I figured this was just stupid vfat, so I reformatted the drive with mkfs.ntfs and tried again: same issue.

I had assumed that NTFS, at least, would allow any character except NUL, but I was proven wrong. Looking it up on Wikipedia showed that NTFS file names are limited to UTF-16 code units, in contrast to ext2/3/4, ReiserFS, Btrfs, XFS and surely more, which accept any byte except NUL (and the path separator /).

What my file names contained was a number of ISO 8859-1 (latin1) bytes like 0xe8 (è), 0xe0 (à), 0xf9 (ù) or 0xec (ì). It might also have been Windows-1252, but no bytes between 0x80 and 0x9F appeared, and that is the only range where the two encodings differ.
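If you want to find the offending names before copying, something like the following might do. It is only a rough sketch of my own, not part of any tool: the function names `is_valid_utf8` and `non_utf8_names` are made up for illustration, and it assumes the target expects UTF-8 file names.

```python
import os

def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def non_utf8_names(directory: bytes = b"."):
    """List the raw filenames in `directory` that are not valid UTF-8."""
    # Passing a bytes path makes os.listdir return undecoded bytes names,
    # so stray latin1 bytes such as 0xe8 stay visible.
    return [name for name in os.listdir(directory) if not is_valid_utf8(name)]
```

Calling `non_utf8_names(b"/path/to/music")` (a hypothetical path) returns the names as bytes; a lone 0xe8 shows up as `\xe8` in the output.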

This is how I finally batch-converted the character set of the filenames:

convmv -f latin1 -t utf8 --notest *.flac
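As far as I know, convmv only prints what it would do unless --notest is given, so leaving that flag off first is a cheap dry run. For boxes without convmv, the same idea can be sketched in a few lines of Python. This is an approximation under my own assumptions, not how convmv is implemented: `plan_renames` and `convert_names` are hypothetical names, the source encoding is assumed to be latin1, and names that are already valid UTF-8 are left alone.

```python
import os

def plan_renames(names, source="latin-1"):
    """Pair each non-UTF-8 name with its UTF-8 re-encoding."""
    plan = []
    for old in names:
        try:
            old.decode("utf-8")  # already valid UTF-8 (or plain ASCII): skip
            continue
        except UnicodeDecodeError:
            pass
        # Interpret the raw bytes as `source`, then re-encode as UTF-8.
        plan.append((old, old.decode(source).encode("utf-8")))
    return plan

def convert_names(directory=b".", source="latin-1", dry_run=True):
    """Print the planned renames; only touch the disk if dry_run is False."""
    for old, new in plan_renames(os.listdir(directory), source):
        print(repr(old), "->", repr(new))
        target = os.path.join(directory, new)
        # Never overwrite a file that already has the target name.
        if not dry_run and not os.path.exists(target):
            os.rename(os.path.join(directory, old), target)
```

Like convmv, `convert_names()` is a dry run by default; pass `dry_run=False` to actually rename.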