converting filename charset
Thu, 08 Sep 2011 06:41 categories: oneliner
So I was copying files to a vfat-formatted USB stick when this error popped up:
Invalid or incomplete multibyte or wide character
Same issue with cp, rsync and tar. I assumed this was a limitation of stupid vfat, so I reformatted the drive with mkfs.ntfs and tried again: same issue.
I had thought that at least NTFS would allow any character except NUL, but I was proven wrong. Looking it up on Wikipedia showed that NTFS file names are limited to UTF-16 code points, in contrast to ext2/3/4, ReiserFS, Btrfs, XFS and surely more, which allow any byte except NUL and the path separator /. So a name whose bytes cannot be decoded into such code points gets rejected.
What my filenames contained was a number of ISO 8859-1 (Latin-1) characters like 0xe8 (è), 0xe0 (à), 0xf9 (ù) or 0xec (ì). It might also have been Windows-1252, but no bytes between 0x80 and 0x9F appeared, and that is the only range where the two encodings differ.
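To make the problem concrete, here is a small Python sketch (not part of the original workflow) that checks whether raw filename bytes are valid UTF-8. The filename `caff\xe8.flac` and the helper names are made up for illustration; the assumption is that the system locale is UTF-8, so a lone Latin-1 byte like 0xe8 is an incomplete multibyte sequence.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_names(directory: bytes = b"."):
    """List entries of `directory` whose names are not valid UTF-8.

    Passing a bytes path makes os.listdir return raw bytes names,
    so nothing is decoded behind our back.
    """
    return [n for n in sorted(os.listdir(directory)) if not is_valid_utf8(n)]


# Hypothetical example name: 0xe8 is "è" in Latin-1, but in UTF-8 it
# starts a three-byte sequence that the following "." does not complete.
latin1_name = b"caff\xe8.flac"
utf8_name = "caffè.flac".encode("utf-8")  # "è" becomes 0xc3 0xa8

print(is_valid_utf8(latin1_name))
print(is_valid_utf8(utf8_name))
```

Running `find_non_utf8_names(b".")` in the music directory would list the candidates for conversion without touching anything.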
This is how I finally batch-converted the character set of the filenames (without --notest, convmv only prints what it would do):
convmv -f latin1 -t utf8 --notest *.flac
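For the curious, the same idea can be sketched in a few lines of Python. This is only an illustration of the Latin-1 to UTF-8 renaming, not how convmv is implemented; the function names and the `dry_run` parameter are my own, and it assumes every name that is not valid UTF-8 really is Latin-1.

```python
import os


def latin1_to_utf8(name: bytes) -> bytes:
    """Reinterpret Latin-1 filename bytes as text and re-encode as UTF-8."""
    return name.decode("latin-1").encode("utf-8")


def convert_names(directory: bytes = b".", dry_run: bool = True):
    """Rename entries of `directory` whose names are not valid UTF-8.

    Returns the list of (old, new) name pairs. With dry_run=True
    (the default) nothing is renamed.
    """
    planned = []
    for old in sorted(os.listdir(directory)):  # bytes in, bytes out
        try:
            old.decode("utf-8")
            continue  # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        new = latin1_to_utf8(old)
        planned.append((old, new))
        if not dry_run:
            src = os.path.join(directory, old)
            dst = os.path.join(directory, new)
            if os.path.exists(dst):
                # Refuse to overwrite an existing file.
                raise FileExistsError(dst)
            os.rename(src, dst)
    return planned
```

Calling `convert_names(b".")` first and checking the returned pairs before running it again with `dry_run=False` mirrors the test-first default of convmv.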