[jira] Created: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

[jira] Created: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)
determine name for TarEntries with special characters in TarUtils.parseName
---------------------------------------------------------------------------

                 Key: COMPRESS-114
                 URL: https://issues.apache.org/jira/browse/COMPRESS-114
             Project: Commons Compress
          Issue Type: Bug
    Affects Versions: 1.0
         Environment: Windows/Suse
            Reporter: Helmut Minst


If a tar file contains files with special characters in their names, the names of the tar entries are wrong.

example:
correct name: 0302-0601-3±±±F06±W220±ZB±LALALA±±±±±±±±±±CAN±±DC±±±04±060302±MOE.model
name resolved by TarUtils.parseName: 0302-0101-3ᄆᄆᄆF06ᄆW220ᄆZBᄆHECKMODULᄆᄆᄆᄆᄆᄆᄆᄆᄆᄆECEᄆᄆDCᄆᄆᄆ07ᄆ060302ᄆDOERN.model

please use:
result.append(new String(new byte[] { buffer[i] }));

instead of:
result.append((char) buffer[i]);

to solve this encoding problem.
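
For context, a minimal sketch of why the plain cast goes wrong: Java's byte type is signed, so a byte such as 0xB1 (the ISO-8859-1 code for '±') is sign-extended when it is cast to char. The class and method names below are illustrative only and are not taken from Commons Compress.

```java
// Illustrative only: shows how a signed byte widens to char.
final class SignExtensionDemo {

    // Mirrors the cast quoted in the report: the byte is sign-extended first.
    static char naiveCast(byte b) {
        return (char) b;
    }

    // Masks the value to 0..255 before widening, so 0xB1 stays U+00B1.
    static char unsignedCast(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte plusMinus = (byte) 0xB1;
        System.out.println(Integer.toHexString(naiveCast(plusMinus)));    // ffb1
        System.out.println(Integer.toHexString(unsignedCast(plusMinus))); // b1
    }
}
```

Masking with 0xFF is only one possible way to avoid the sign extension; decoding through a charset, as proposed in the report, is another.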

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868206#action_12868206 ]

Sebb commented on COMPRESS-114:
-------------------------------

Could you provide a small sample tar file containing some files with the special names?
[Perhaps the file contents could be the expected name]

We can then add the file to the test cases.

Thanks!

[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868211#action_12868211 ]

Sebb commented on COMPRESS-114:
-------------------------------

Forgot to mention:

The "String(byte[])" constructor depends on the default charset encoding, which might not always be what is wanted.
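
To illustrate the point about the default charset, here is a small sketch that decodes the same single byte with two explicitly named charsets. The class name is made up for this example, and it assumes a Java version that provides java.nio.charset.StandardCharsets (Java 7 or later).

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: one byte, two charsets, two different results.
final class CharsetDependenceDemo {

    static String decodeLatin1(byte b) {
        return new String(new byte[] { b }, StandardCharsets.ISO_8859_1);
    }

    static String decodeUtf8(byte b) {
        return new String(new byte[] { b }, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xB1;
        // In ISO-8859-1 the byte 0xB1 is the plus-minus sign (U+00B1).
        System.out.println(decodeLatin1(b).equals("\u00B1")); // true
        // On its own, 0xB1 is not valid UTF-8, so it decodes to U+FFFD.
        System.out.println(decodeUtf8(b).equals("\uFFFD"));   // true
    }
}
```

With the one-argument String(byte[]) constructor, which of these results you get depends on the platform default.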

[jira] Updated: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Helmut Minst updated COMPRESS-114:
----------------------------------

    Attachment: plusMinusForJIRA.tar

Tar file containing files whose names include special characters.

[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868217#action_12868217 ]

Helmut Minst commented on COMPRESS-114:
---------------------------------------

That's right, but casting from byte to char is not an elegant approach.

String charsetName = "ISO-8859-1";
try {
    result.append(new String(new byte[] { buffer[i] }, charsetName));
} catch (UnsupportedEncodingException e) {
    result.append(new String(new byte[] { buffer[i] }));
}

where charsetName could be set via a system property or some other mechanism by the user of Commons Compress.
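
One caveat with decoding a single byte at a time is that it cannot handle multi-byte encodings such as UTF-8. A hedged alternative sketch decodes the whole NUL-terminated name field in one call. The helper decodeName and its parameters are hypothetical and are not part of the Commons Compress API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: decode a NUL-padded name field in a single step.
final class NameFieldDecoder {

    // Hypothetical helper: decodes buffer[offset, offset + length) up to the first NUL.
    static String decodeName(byte[] buffer, int offset, int length, Charset charset) {
        int end = offset;
        int limit = offset + length;
        while (end < limit && buffer[end] != 0) {
            end++;
        }
        return new String(buffer, offset, end - offset, charset);
    }

    public static void main(String[] args) {
        byte[] field = new byte[100]; // a tar header name field is 100 bytes, NUL padded
        byte[] raw = "\u00E4\u00F6\u00FC.txt".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(raw, 0, field, 0, raw.length);
        String name = decodeName(field, 0, field.length, StandardCharsets.UTF_8);
        System.out.println(name.equals("\u00E4\u00F6\u00FC.txt")); // true
    }
}
```

Passing a Charset object rather than a charset name also avoids the checked UnsupportedEncodingException.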



[jira] Updated: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Helmut Minst updated COMPRESS-114:
----------------------------------

    Attachment: TarArchiveEntry.java
                TarArchiveInputStream.java
                TarUtils.java

Example where the charset can be set via a setter in TarArchiveInputStream.
If an UnsupportedEncodingException occurs, the system default charset
is used.

Please have a look. Hope this helps to solve the problem.


[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868220#action_12868220 ]

Sebb commented on COMPRESS-114:
-------------------------------

Thanks for the test case - unfortunately you did not grant a license to the ASF to use it.
Could you re-attach it with the option selected please?

[jira] Updated: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Helmut Minst updated COMPRESS-114:
----------------------------------

    Attachment: plusMinusForJIRAwithLicense.tar

Same file, but now with the license granted.

[jira] Updated: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Helmut Minst updated COMPRESS-114:
----------------------------------

    Attachment:     (was: plusMinusForJIRA.tar)

[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868225#action_12868225 ]

Sebb commented on COMPRESS-114:
-------------------------------

As for how to determine the charset, it looks as though "ASCII" or "ISO-8859-1" would be suitable as the default.
The GNU docs mention a "local variant of ASCII".

[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868232#action_12868232 ]

Helmut Minst commented on COMPRESS-114:
---------------------------------------

Yes, I also think "ISO-8859-1" is a suitable default value for the charset.

I've tested it with several special characters, for example the German umlauts äöü.

[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868238#action_12868238 ]

Sebb commented on COMPRESS-114:
-------------------------------

I think there may also be a problem with the TarUtils.formatNameBytes() method, which assumes that the value returned by String.charAt() can be stored in a byte.

I'll add some round-trip tests for formatNameBytes() / parseName()
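
The concern can be shown with a char whose value exceeds 255: narrowing it to a byte keeps only the low eight bits, so the original character cannot be recovered. The class below is a made-up illustration and is not the library's code.

```java
// Illustrative only: what is lost when a char is stored in a byte.
final class CharNarrowingDemo {

    // Narrowing a char to a byte discards the high eight bits.
    static byte narrow(char c) {
        return (byte) c;
    }

    // Reads the byte back as an unsigned value in 0..255.
    static char widen(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        char euro = '\u20AC'; // EURO SIGN, does not fit in a byte
        System.out.println(Integer.toHexString(widen(narrow(euro)))); // ac, not 20ac

        char plusMinus = '\u00B1'; // fits in 0..255
        System.out.println(widen(narrow(plusMinus)) == plusMinus);    // true
    }
}
```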

[jira] Commented: (COMPRESS-114) determine name for TarEntries with special characters in TarUtils.parseName

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868244#action_12868244 ]

Sebb commented on COMPRESS-114:
-------------------------------

Turned out to be easy to fix the round-trip problem - just ensure that the byte entries are treated as unsigned.
So no need to worry about charsets.

Still need to check that this works OK when reading from the test tar file.
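
A rough sketch of what treating the bytes as unsigned could look like, together with a round-trip check. The two methods are simplified stand-ins written for this illustration; they borrow the names formatNameBytes and parseName from the thread but are not the actual Commons Compress implementation.

```java
// Illustrative only: simplified stand-ins, not the Commons Compress source.
final class UnsignedNameRoundTrip {

    // Writes the low byte of each char into the field and NUL-pads the rest.
    static void formatNameBytes(String name, byte[] buf, int offset, int length) {
        int i = 0;
        for (; i < length && i < name.length(); i++) {
            buf[offset + i] = (byte) name.charAt(i);
        }
        for (; i < length; i++) {
            buf[offset + i] = 0;
        }
    }

    // Reads bytes up to the first NUL, masking each one to 0..255.
    static String parseName(byte[] buf, int offset, int length) {
        StringBuilder result = new StringBuilder(length);
        for (int i = offset; i < offset + length; i++) {
            if (buf[i] == 0) {
                break;
            }
            result.append((char) (buf[i] & 0xFF));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String name = "3\u00B1\u00B1F06\u00B1W220.model";
        byte[] field = new byte[100];
        formatNameBytes(name, field, 0, field.length);
        System.out.println(name.equals(parseName(field, 0, field.length))); // true
    }
}
```

This round trip only holds for characters in the range 0 to 255, which in effect is a one-to-one ISO-8859-1 style mapping.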

[jira] Updated: (COMPRESS-114) TarUtils.parseName does not properly handle characters outside the range 0-127

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated COMPRESS-114:
--------------------------

    Summary: TarUtils.parseName does not properly handle characters outside the range 0-127  (was: determine name for TarEntries with special characters in TarUtils.parseName)

[jira] Resolved: (COMPRESS-114) TarUtils.parseName does not properly handle characters outside the range 0-127

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved COMPRESS-114.
---------------------------

    Fix Version/s: 1.1
       Resolution: Fixed

Now fixed; the test tar file reads OK

[jira] Commented: (COMPRESS-114) TarUtils.parseName does not properly handle characters outside the range 0-127

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868256#action_12868256 ]

Helmut Minst commented on COMPRESS-114:
---------------------------------------

I've had a look at your solution. It is a better way to solve this problem.
Thanks a lot!

[jira] Commented: (COMPRESS-114) TarUtils.parseName does not properly handle characters outside the range 0-127

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895204#action_12895204 ]

Pavel commented on COMPRESS-114:
--------------------------------

Hello,


I've checked out the trunk from http://svn.apache.org/repos/asf/commons/proper/compress and ran the testRoundTripNames() test from TarUtilsTest. It failed (the last checkName() call, with special characters). The test was performed on Ubuntu 8.10.







[jira] Issue Comment Edited: (COMPRESS-114) TarUtils.parseName does not properly handle characters outside the range 0-127

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895204#action_12895204 ]

Pavel edited comment on COMPRESS-114 at 8/4/10 6:38 AM:
--------------------------------------------------------

Hello,


I've checked out the trunk from http://svn.apache.org/repos/asf/commons/proper/compress and ran the testRoundTripNames() test from TarUtilsTest. It failed (the last checkName() call, with special characters). The test was performed on Ubuntu 8.10.

Has the fix been tested on Linux? In which version can I find the final fix for this special-characters problem?

Thanks





      was (Author: partysan):
    Hello,


I've checked out the trunk from http://svn.apache.org/repos/asf/commons/proper/compress and run the testRoundTripNames() test from TarUtilsTest. It failed (the last checkName() call with spec. characters). The test was performed on Ubuntu 8.10.






 

[jira] Commented: (COMPRESS-114) TarUtils.parseName does not properly handle characters outside the range 0-127

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895219#action_12895219 ]

Stefan Bodewig commented on COMPRESS-114:
-----------------------------------------

The test passes for me using Ubuntu 10.04 and OpenJDK 6. I guess it may depend more on the Java VM than on the OS. Which flavor of Java are you using, Pavel?

[jira] Commented: (COMPRESS-114) TarUtils.parseName does not properly handle characters outside the range 0-127

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/COMPRESS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895222#action_12895222 ]

Pavel commented on COMPRESS-114:
--------------------------------

Hi Stefan,

thanks for a swift reply!

I'm using Sun JDK 1.6.0_06, if it helps

thx
