[jira] [Created] (COMPRESS-170) Improve robustness when the wrong encoding was passed into ZipArchiveInputStream / ZipFile

ASF GitHub Bot (Jira)
Improve robustness when the wrong encoding was passed into ZipArchiveInputStream / ZipFile
------------------------------------------------------------------------------------------

                 Key: COMPRESS-170
                 URL: https://issues.apache.org/jira/browse/COMPRESS-170
             Project: Commons Compress
          Issue Type: Improvement
          Components: Archivers
    Affects Versions: 1.3
            Reporter: Trejkaz


If the entry names in a zip file are in one encoding and I read the file using a different encoding, I expect the file names to come out garbled but the data to extract correctly (which is what I see when extracting such a zip file with native tools).

However, Commons Compress can instead try to decode a name, fail, and ultimately give us no zip entries to work with.

Here's a test to show what I mean:

    @Test
    public void testWrongEncodingPassedIn() throws Exception {
        // Making the test zip file:
        File inputFile = new File(scratch, "test.dat");
        byte[] inputData = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        FileUtils.writeByteArrayToFile(inputFile, inputData);
        File file = new File(scratch, "test.zip");
        try (ZipArchiveOutputStream out = new ZipArchiveOutputStream(file)) {
            out.setEncoding("windows-31j");
            ZipArchiveEntry entry = new ZipArchiveEntry(inputFile, "\u767A\u8D77\u4EBA\u6C7A\u5B9A\u66F8");
            out.putArchiveEntry(entry);
            out.write(inputData);
            out.closeArchiveEntry();
        }

        // Trying to iterate over it:
        int entryCount = 0;
        try (ZipArchiveInputStream in = new ZipArchiveInputStream(new FileInputStream(file), "windows-1252", false)) {
            // Keep reading until the stream reports that there are no more entries.
            while (true) {
                ZipArchiveEntry entry = in.getNextZipEntry();
                if (entry == null) {
                    break;
                }
                entryCount++;
            }
        }

        assertEquals("Wrong number of entries", 1, entryCount);
    }
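
One plausible mechanism for a failure like this can be sketched with plain JDK classes. This is not Commons Compress code, and the report does not show the library's internals, so treat it as an assumption: a charset decoder configured to report errors throws on bytes it cannot map, whereas a replacing decoder hands back a garbled string. The class and method names are hypothetical, and the hard-coded bytes are assumed to be the windows-31j form of the entry name used in the test.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical illustration; not Commons Compress code.
class NameDecodingSketch {

    // Strict: malformed or unmappable bytes raise CharacterCodingException.
    static String decodeStrict(byte[] rawName, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(rawName))
                .toString();
    }

    // Lenient: new String(byte[], Charset) substitutes the charset's replacement
    // string for undecodable bytes, so a (possibly garbled) name always comes back.
    static String decodeLenient(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        // Assumed windows-31j bytes of the entry name from the test above.
        byte[] rawName = {
            (byte) 0x94, (byte) 0xAD, (byte) 0x8B, 0x4E, (byte) 0x90, 0x6C,
            (byte) 0x8C, (byte) 0x88, (byte) 0x92, (byte) 0xE8, (byte) 0x8F, (byte) 0x91
        };
        Charset wrong = Charset.forName("windows-1252");
        try {
            System.out.println("strict:  " + decodeStrict(rawName, wrong));
        } catch (CharacterCodingException e) {
            System.out.println("strict decoding failed: " + e);
        }
        System.out.println("lenient: " + decodeLenient(rawName, wrong));
    }
}
```

The lenient variant is the behaviour being asked for here: the name may be nonsense, but the caller still gets an entry to work with.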

In this situation it's definitely the caller's "fault", but unfortunately the end user is often the one supplying the encoding, and they would rather see garbled file names with the actual data intact than no data at all.
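
The "garbled name, intact data" outcome can be illustrated with a small round trip. Note the assumptions: this sketch uses the JDK's java.util.zip classes instead of Commons Compress, the class and method names are hypothetical, and the charsets differ from the test above. The name is written as windows-1252 and read back as ISO-8859-1, chosen only because ISO-8859-1 assigns a character to every byte and so cannot fail to decode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration using java.util.zip; not Commons Compress code.
class GarbledNameSketch {

    // Writes a single-entry zip whose entry name is encoded with nameCharset.
    static byte[] writeZip(String entryName, byte[] data, Charset nameCharset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(data);
            out.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Reads every entry, decoding names with nameCharset, and returns name -> data.
    static Map<String, byte[]> readAll(byte[] zip, Charset nameCharset) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    data.write(chunk, 0, n);
                }
                entries.put(entry.getName(), data.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        byte[] inputData = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        byte[] zip = writeZip("\u20AC.dat", inputData, Charset.forName("windows-1252"));
        // Deliberately read with a different charset than the one used for writing.
        for (Map.Entry<String, byte[]> e : readAll(zip, StandardCharsets.ISO_8859_1).entrySet()) {
            System.out.println("name intact: " + e.getKey().equals("\u20AC.dat")
                    + ", data intact: " + Arrays.equals(inputData, e.getValue()));
        }
    }
}
```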


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
[jira] [Commented] (COMPRESS-170) Improve robustness when the wrong encoding was passed into ZipArchiveInputStream / ZipFile

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183103#comment-13183103 ]

Trejkaz commented on COMPRESS-170:
----------------------------------

Argh, the code I posted was missing the while (true) loop around the getNextZipEntry() call. :(  That's what I get for having no working clipboard between my development computer and my posting computer.

               

