Bug with accents in filename when exporting to compressed XML
When exporting to Compressed XML (.MXL), if the filename contains accents, the resulting file cannot be imported by MuseScore or any other software.
Checking the mxl file with 7-zip shows that the accents in the filename are converted to invalid characters.
Software:
MuseScore version 1.3
Windows 7 sp1 with all mandatory updates applied
Attachment | Size |
---|---|
Bug MXL caractères.mxl | 1.62 KB |
Bug MXL caractères.mscz | 1.68 KB |
Comments
The mscz and the mxl file both contain files whose names use the same 'invalid' characters. I guess it uses UTF-8 for encoding?
But indeed MuseScore 1.3 can't import the mxl file ("unexpected end of file at line 1 column 1"), though it can open the mscz.
It can, however, import the extracted xml file, so it doesn't look like a filename problem.
A current nightly build has the same problem.
Quick investigation shows at least two issues:
1) The filename (as stored in the mxl file) is encoded in UTF-8 but the language encoding flag (EFS) is not set. This probably explains the invalid characters in zip readers. It is caused by a QZipWriter bug: the version used in MuseScore never sets this flag.
2) File META-INF/container.xml (a required component of the mxl file) states it is encoded in UTF-8 but actually encodes the e with grave accent in the filename as 0xE8 instead of 0xC3 0xA8, which is invalid UTF-8. This is a MuseScore bug. I expect this results in the MuseScore MusicXML importer being unable to extract the MusicXML file, as it cannot correctly interpret the filename.
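To make the byte values in 2) concrete, here is a small Python sketch (not MuseScore code) comparing the two encodings of "è". The file name is a hypothetical root file name based on the attachment, and `is_valid_utf8` is a helper introduced only for this illustration.

```python
# Illustrative only: compare the two byte encodings of "è" discussed above.
name = "Bug MXL caractères.xml"  # hypothetical root file name

latin1_bytes = name.encode("latin-1")  # "è" becomes the single byte 0xE8
utf8_bytes = name.encode("utf-8")      # "è" becomes the two bytes 0xC3 0xA8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A lone 0xE8 followed by an ASCII letter is not a valid UTF-8 sequence,
# so a file declared as UTF-8 must not contain the Latin-1 form.
print(b"\xe8" in latin1_bytes, is_valid_utf8(latin1_bytes))
print(b"\xc3\xa8" in utf8_bytes, is_valid_utf8(utf8_bytes))
```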
see also #21910: Unicode character in filename messes the mscz file
Do we really need to store the actual name of the file in container.xml and in the zip file? Couldn't we use an ASCII name like
score.xml
or replace non-ASCII letters with their ASCII equivalents (transliteration), or even with "?".
1) I think QZipWriter and QZipReader can't cope with UTF-8 in filenames on all platforms.
2) Opening container.xml in a text editor like PSPad reports an ANSI encoding... MuseScore does set the codec to UTF-8 but doesn't specify a BOM... https://github.com/musescore/MuseScore/blob/master/libmscore/xml.cpp#L2… I do think that MuseScore can read back the filename correctly but then it can't match it to the extracted filename because of 1)
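The ASCII-name idea suggested in this comment could look roughly like the following Python sketch. It is only an illustration of the transliteration approach, not what MuseScore does; `ascii_fallback` and its `replacement` parameter are hypothetical names.

```python
import unicodedata


def ascii_fallback(name: str, replacement: str = "?") -> str:
    """Transliterate accented letters to ASCII; replace anything else."""
    # NFKD splits "è" into "e" followed by a combining grave accent.
    decomposed = unicodedata.normalize("NFKD", name)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        # Characters with no ASCII base letter (e.g. Cyrillic) are replaced.
        out.append(ch if ord(ch) < 128 else replacement)
    return "".join(out)


print(ascii_fallback("Bug MXL caractères.xml"))
```

The trade-off is that names in scripts without ASCII base letters collapse to the replacement character, which is one argument for a fixed name such as score.xml instead.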
Further testing shows 1) is not an issue, Finale does not set the language encoding flag either. Furthermore the MusicXML spec explicitly states that filenames in the mxl file must be encoded in UTF-8.
The mxl file as is (as exported by MuseScore 1.3 and the current trunk) cannot be imported in Finale, MuseScore or Sibelius. By fixing 2) only, the file imports OK in all three. See attached fixed file.
As other MusicXML producers export mxl files with UTF-8 encoded filenames, MuseScore should be (and currently is !) able to import these files correctly.
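For anyone who wants to check the flag and the stored name bytes themselves, here is a rough Python sketch that reads the first local file header of a zip archive. The archive is built in memory with Python's `zipfile` module, which, unlike the QZipWriter behaviour described above, does set the language encoding flag for non-ASCII names. `read_first_local_header` and the entry name are made up for this example.

```python
import io
import struct
import zipfile

EFS_FLAG = 0x0800  # general purpose bit 11: file name is UTF-8


def read_first_local_header(data: bytes):
    """Return (flags, raw_name_bytes) from the first local file header."""
    # Fixed part of the header: 30 bytes, little-endian.
    fields = struct.unpack_from("<4sHHHHHIIIHH", data, 0)
    signature, flags, name_len = fields[0], fields[2], fields[9]
    if signature != b"PK\x03\x04":
        raise ValueError("not a local file header")
    return flags, data[30:30 + name_len]


# Build a tiny archive in memory (hypothetical entry name).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Bug MXL caractères.xml", "<score-partwise/>")

flags, raw_name = read_first_local_header(buffer.getvalue())
print(bool(flags & EFS_FLAG), raw_name)
```

Pointing the same function at the bytes of a real mxl file should show whether the exporter stored 0xC3 0xA8 or 0xE8 for the accented letter.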
1/ Unfortunately, there is no reason why the filename would be encoded in UTF-8 in the zip file since QZipWriter is using toLocal8Bit https://github.com/musescore/MuseScore/blob/master/libmscore/qzip.cpp#L…. I believe this is bad :( and will cause inter-OS compatibility issues... As Leon stated, the MusicXML spec requires UTF-8 encoded filenames http://www.musicxml.com/tutorial/compressed-mxl-files/compressed-file-f…
2/ If I create a file with a text editor (SublimeText2) with an "è" and save it as UTF-8, it doesn't get encoded to 0xC3 0xA8 but 0xE8... MuseScore might indeed have a bug when writing the container file. It seems that setting the codec before setting the device on a QTextStream doesn't work well.
Despite the bug, I believe MuseScore is reading container.xml correctly but QZipReader fails to locate an entry because the entries are not in UTF-8.
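The lookup that fails here can be sketched in Python as follows. This is only a model of the mechanism, not the QZipReader code: `make_mxl` and `read_rootfile` are hypothetical helpers, and the mismatch is simulated by storing the entry under the name a reader would see if it interpreted the UTF-8 bytes as a Windows code page (cp1252 is an assumption).

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER = "META-INF/container.xml"


def make_mxl(entry_name: str, container_name: str) -> bytes:
    """Build a minimal mxl-like archive in memory."""
    container = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<container><rootfiles>"
        f'<rootfile full-path="{container_name}"/>'
        "</rootfiles></container>"
    ).encode("utf-8")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(CONTAINER, container)
        archive.writestr(entry_name, "<score-partwise/>")
    return buffer.getvalue()


def read_rootfile(mxl: bytes) -> bytes:
    """Read container.xml, then fetch the entry it points to."""
    with zipfile.ZipFile(io.BytesIO(mxl)) as archive:
        root = ET.fromstring(archive.read(CONTAINER))
        full_path = root.find("rootfiles/rootfile").get("full-path")
        return archive.read(full_path)  # KeyError if no entry has this name


name = "Bug MXL caractères.xml"
# Simulated mismatch: the name as seen when UTF-8 bytes are read as cp1252.
mangled = name.encode("utf-8").decode("cp1252")

print(read_rootfile(make_mxl(name, name)))
try:
    read_rootfile(make_mxl(mangled, name))
except KeyError:
    print("container.xml is fine, but no zip entry matches its full-path")
```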
In addition, on Windows, it's a hell to debug because it's likely that at runtime MuseScore is using the QZip* classes from Qt and not the one from MuseScore...
I made a proposal fix in this pull request https://github.com/musescore/MuseScore/pull/422
You can test it with
git checkout -b lasconic-zipunicode master
git pull git@github.com:lasconic/MuseScore.git zipunicode
Feedback is very welcome. Does it work on your OS? Does it work on your files? 1.3 files?
it should also fix #21910: Unicode character in filename messes the mscz file
To enhance the confusion ( :-) ) yesterday I rebuilt the current trunk (revision reported by "git rev-parse --short HEAD" as 4495b25) on Linux and cannot find incorrect file names anymore. A Cyrillic capital letter DE in the name correctly gets encoded as 0xD0 0x94, both in the local file header in the mxl file and in container.xml. This file imports OK in MuseScore.
This is WITHOUT pull request 422.
My previous experiments were on Mac using an older version of the trunk (several weeks old, still using Qt 4.8).
Tried again on Mac, this time using the current trunk: works OK.
It doesn't open in Nightly build 35d5429 on Windows 7
Fixed in 12ca08e743
My pull request is merged. Please try it and report if you find any issue. Especially on mac and linux...
Automatically closed -- issue fixed for 2 weeks with no activity.