[jira] Created: (SANDBOX-308) The CSVPrinter escaping is inconsistent with CSVParser

[jira] Created: (SANDBOX-308) The CSVPrinter escaping is inconsistent with CSVParser

ASF GitHub Bot (Jira)
The CSVPrinter escaping is inconsistent with CSVParser
------------------------------------------------------

                 Key: SANDBOX-308
                 URL: https://issues.apache.org/jira/browse/SANDBOX-308
             Project: Commons Sandbox
          Issue Type: Bug
          Components: CSV
            Reporter: Colin Goodheart-Smithe
            Priority: Minor


The CSVPrinter escapes new line and return characters as "\n" and "\r" when they occur within the encapsulators (this happens in the CSVPrinter.escapeAndQuote(String) method).  However, the CSVParser does not convert these sequences back to new line and return characters.  So if you use the CSVPrinter to create a delimited file containing new line or return characters within an entry and then read this file with the CSVParser, the text read in will not match the text written: every new line and return character will have been replaced by the two-character sequence "\n" or "\r" respectively.
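
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use the Commons CSV classes: escapeLikePrinter and readWithoutUnescaping are hypothetical stand-ins that only mimic the behaviour described above (the writer emits a backslash followed by n or r, the reader returns the encapsulated content unchanged). Other escaping the real printer may perform is deliberately left out.

```java
// Hypothetical stand-ins for the behaviour described in this issue; these are
// NOT the real CSVPrinter/CSVParser implementations.
final class EscapeMismatchSketch {

    // Mimics the printer as described: wrap the value in encapsulators and
    // write new line / return characters as the two-character sequences \n and \r.
    static String escapeLikePrinter(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (char ch : value.toCharArray()) {
            if (ch == '\n') {
                out.append("\\n");
            } else if (ch == '\r') {
                out.append("\\r");
            } else {
                out.append(ch);
            }
        }
        return out.append('"').toString();
    }

    // Mimics the parser as described: strip the encapsulators but leave
    // the \n and \r sequences untouched.
    static String readWithoutUnescaping(String field) {
        return field.substring(1, field.length() - 1);
    }

    public static void main(String[] args) {
        String original = "line one\nline two";
        String written = escapeLikePrinter(original);
        String readBack = readWithoutUnescaping(written);
        // The value read back contains a backslash and an 'n' where the
        // original had a single new line character.
        System.out.println(original.equals(readBack)); // prints false
    }
}
```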

A possible fix would be to add two extra 'else if' branches to CSVParser.encapsulatedTokenLexer(Token, int), starting at line 49, as detailed below (the _emphasised_ text indicates the changes):

                else if (c == '\\' && in.lookAhead() == '\\')
                {
                    // doubled escape char, it does not escape itself, only encapsulator
                    // -> add both escape chars to stream
                    tkn.content.append((char) c);
                    c = in.read();
                    tkn.content.append((char) c);
                }
                _else if (c == '\\' && in.lookAhead() == 'n')_
                _{_
                    _// escaped java new line character, append a new line character_
                    _tkn.content.append('\n');_
                    _c = in.read();_
                _}_
                _else if (c == '\\' && in.lookAhead() == 'r')_
                _{_
                    _// escaped java return character, append a return character_
                    _tkn.content.append('\r');_
                    _c = in.read();_
                _}_
                else if (strategy.getUnicodeEscapeInterpretation() && c == '\\'
                        && in.lookAhead() == 'u')
                {
                    // interpret unicode escaped chars (like \u0070 -> p)
                    tkn.content.append((char) unicodeEscapeLexer(c));
                }
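
The branch order proposed above can be tried outside the parser with a small standalone sketch. It works on a plain String rather than on the parser's input stream, and unescapeContent is a hypothetical helper written for illustration, not a method of CSVParser; the unicode branch is omitted.

```java
// Standalone sketch of the proposed branch order. unescapeContent is a
// hypothetical helper, not part of CSVParser.
final class UnescapeSketch {

    static String unescapeContent(String content) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            // One character of look-ahead, or -1 at the end of the input.
            int next = (i + 1 < content.length()) ? content.charAt(i + 1) : -1;
            if (c == '\\' && next == '\\') {
                // Doubled escape char: keep both characters, as the existing branch does.
                out.append(c).append((char) next);
                i++;
            } else if (c == '\\' && next == 'n') {
                // Escaped new line: append a real new line and consume the 'n'.
                out.append('\n');
                i++;
            } else if (c == '\\' && next == 'r') {
                // Escaped return: append a real return and consume the 'r'.
                out.append('\r');
                i++;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String restored = unescapeContent("line one\\nline two");
        System.out.println(restored.equals("line one\nline two")); // prints true
    }
}
```

Checking the doubled escape character first means that a backslash pair followed by 'n' is passed through unchanged instead of being read as an escaped new line.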


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SANDBOX-308) The CSVPrinter escaping is inconsistent with CSVParser

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Goodheart-Smithe updated SANDBOX-308:
-------------------------------------------

    Attachment: CSVPrintTest.java

This JUnit test illustrates the bug in this issue.

[jira] Updated: (SANDBOX-308) The CSVPrinter escaping is inconsistent with CSVParser

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Goodheart-Smithe updated SANDBOX-308:
-------------------------------------------

    Attachment: CSVParser.java

Possible fix for this issue.

[jira] Resolved: (SANDBOX-308) The CSVPrinter escaping is inconsistent with CSVParser

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SANDBOX-308.
----------------------------------

    Resolution: Fixed

I believe this was fixed by SANDBOX-322.
