Quoted content


Quoted content

Daryl Stultz
Hello,

I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic case: getting the parser to recognize quoted content.

I'm using Commons CSV 1.7 and OpenCSV 4.6.

Here is a code snippet with OpenCSV and Commons CSV:

import com.opencsv.CSVReader;
import java.io.StringReader;
import java.util.Arrays;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

public class QuotedContent {
  public static void main(String[] args) throws Exception {
    // One record whose third field is quoted and contains a comma;
    // every delimiter is followed by a space.
    String input = "1, 2, \"A, B\", 4";

    System.out.println("OpenCSV");
    CSVReader reader = new CSVReader(new StringReader(input));
    reader.iterator().forEachRemaining((sa) -> {
      Arrays.stream(sa).forEach((s) -> System.out.println(s.trim()));
    });

    System.out.println("Commons CSV");
    CSVParser parser = CSVFormat.DEFAULT.withTrim().parse(new StringReader(input));
    parser.iterator().next().iterator().forEachRemaining(System.out::println);
  }
}

The output is:

OpenCSV
1
2
A, B
4
Commons CSV
1
2
"A
B"
4


My expectation is that OpenCSV and Commons CSV would yield the same results (which would also agree with the library I'm yanking out).


I've tried fiddling with settings and with different CSVFormat instances with no change in behavior.


Any help appreciated.


--

Daryl Stultz
Principal Software Developer
_____________________________________
OpenTempo, Inc
http://www.opentempo.com
[hidden email]


Re: Quoted content

Daryl Stultz
I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic idea of the parser recognizing content that is quoted.

I've discovered this bug here:
https://issues.apache.org/jira/browse/CSV-228

The issue refers to the parsing of the header, but it doesn't seem to matter which row the quoted comma is on.

There's no way I can use this product with this defect. That's unfortunate: I like the API, and OpenCSV quotes every bit of content when printing, which I don't like.


Re: Quoted content

sebb-2-2
On Fri, 14 Jun 2019 at 13:34, Daryl Stultz <[hidden email]> wrote:
>
> I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic idea of the parser recognizing content that is quoted.
>
> I've discovered this bug here:
> https://issues.apache.org/jira/browse/CSV-228
>
> The issue refers to the parsing of the header, but it doesn't seem to matter what row the comma-quoting is on.

According to my reading of RFC4180 [1], the fields between delimiters
are either escaped or non-escaped.
Non-escaped fields can include spaces, but not commas.
Escaped fields must start with the double quote; leading spaces are
not permitted.

[1] https://tools.ietf.org/html/rfc4180
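A minimal sketch of the rule above, assuming Commons CSV 1.7 on the classpath (the class and helper names are illustrative only): with CSVFormat.DEFAULT, a field whose first character is the quote is read as one escaped field, while a field preceded by a space is read as a non-escaped field, so the quote is ordinary data and the embedded comma acts as a delimiter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

public class Rfc4180Fields {

  // Parses the input with the given format and returns the fields of its first record.
  static List<String> firstRecord(CSVFormat format, String input) {
    try (CSVParser parser = format.parse(new StringReader(input))) {
      List<String> fields = new ArrayList<>();
      for (String field : parser.iterator().next()) {
        fields.add(field);
      }
      return fields;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Escaped field: the quote is the first character after the delimiter.
    System.out.println(firstRecord(CSVFormat.DEFAULT, "1,2,\"A, B\",4"));
    // Non-escaped fields: a space precedes the quote, so the quote is ordinary
    // data and the comma inside it is treated as a delimiter.
    System.out.println(firstRecord(CSVFormat.DEFAULT, "1, 2, \"A, B\", 4"));
  }
}
```

The first call should yield four fields with "A, B" intact; the second yields five fields, each keeping its leading space.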

> There's no way I can use this product with this defect. That's unfortunate, I like the API and OpenCSV quotes every bit of content when printing which I don't like.

That is how the DEFAULT and RFC4180 formats behave.

I've not looked into the withTrim() option.
If that is supposed to trim before handling quoted fields, then I
agree that there seems to be a bug here.
But if the trim is only supposed to apply to the un-quoted field, then
the current behaviour seems OK, even if it's not what you expect.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Quoted content

sebb-2-2
Try using:

withIgnoreSurroundingSpaces()
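A sketch comparing the two options on the input from the original message, assuming Commons CSV 1.7 (the class and helper names are illustrative). With withTrim() the line appears to be split before the values are trimmed, which matches the output reported in the first message; withIgnoreSurroundingSpaces() skips the spaces before the parser looks for an opening quote, so "A, B" should come back as a single field.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

public class SurroundingSpaces {

  // Parses the input with the given format and returns the fields of its first record.
  static List<String> firstRecord(CSVFormat format, String input) {
    try (CSVParser parser = format.parse(new StringReader(input))) {
      List<String> fields = new ArrayList<>();
      for (String field : parser.iterator().next()) {
        fields.add(field);
      }
      return fields;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String input = "1, 2, \"A, B\", 4";
    // withTrim(): each field is trimmed only after the line has been split,
    // so the quoted value is broken into two fields.
    System.out.println(firstRecord(CSVFormat.DEFAULT.withTrim(), input));
    // withIgnoreSurroundingSpaces(): leading spaces are skipped before the quote
    // is examined, so "A, B" is read as one escaped field.
    System.out.println(firstRecord(CSVFormat.DEFAULT.withIgnoreSurroundingSpaces(), input));
  }
}
```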

On Fri, 14 Jun 2019 at 14:03, sebb <[hidden email]> wrote:

> I've not looked into the withTrim() option.
> If that is supposed to trim before handling quoted fields, then I
> agree that there seems to be a bug here.
> But if the trim is only supposed to apply to the un-quoted field, then
> the current behaviour seems OK, even if it's not what you expect.


Re: Quoted content

sebb-2-2
I should have added:

withIgnoreSurroundingSpaces() affects parsing
withTrim() affects printing.
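A small sketch of the printing side, assuming Commons CSV 1.7 (the class name and sample values are illustrative): CSVFormat.format(Object...) prints one record to a String, and with withTrim() each value should be trimmed before it is written. Judging by the output in the first message, withTrim() also trims parsed values, but only after the line has been split, which is why it does not help with a space before an opening quote.

```java
import org.apache.commons.csv.CSVFormat;

public class TrimOnPrint {

  public static void main(String[] args) {
    // format() prints a single record to a String; with withTrim() each value
    // is trimmed before it is written.
    String trimmed = CSVFormat.DEFAULT.withTrim().format("x", " a ", "y");
    // Without withTrim() the spaces around "a" are kept (whether that value
    // is also quoted may depend on the Commons CSV version).
    String untrimmed = CSVFormat.DEFAULT.format("x", " a ", "y");
    System.out.println(trimmed);
    System.out.println(untrimmed);
  }
}
```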

On Fri, 14 Jun 2019 at 15:05, sebb <[hidden email]> wrote:

>
> Try using:
>
> withIgnoreSurroundingSpaces()


Re: Quoted content

Daryl Stultz


> withIgnoreSurroundingSpaces() affects parsing
> withTrim() affects printing.

Ah, that is exactly what I needed: withIgnoreSurroundingSpaces() solves my problem. (It's definitely hard to tell which options apply to parsing and which apply to printing!)

Thank you so much.

/Daryl



Re: Quoted content

garydgregory
I've never liked mashing formatting and parsing options together. Should we
have CSVFormat subclasses called CSVPrintingFormat and CSVParsingFormat?

Gary
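One way to picture this split, sketched as a hypothetical wrapper rather than a real subclass: CsvParsingOptions and its methods are invented for illustration and are not part of Commons CSV. It exposes only a parse-time setting and builds an ordinary CSVFormat underneath, so options that only matter when printing are simply not offered.

```java
import java.io.IOException;
import java.io.Reader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical parse-only configuration (not part of Commons CSV).
// Immutable: each option method returns a new instance.
public final class CsvParsingOptions {
  private final boolean ignoreSurroundingSpaces;

  private CsvParsingOptions(boolean ignoreSurroundingSpaces) {
    this.ignoreSurroundingSpaces = ignoreSurroundingSpaces;
  }

  // Starts from the same behaviour as CSVFormat.DEFAULT.
  public static CsvParsingOptions defaults() {
    return new CsvParsingOptions(false);
  }

  // Skip spaces around fields, including spaces before an opening quote.
  public CsvParsingOptions ignoringSurroundingSpaces() {
    return new CsvParsingOptions(true);
  }

  // Translates the parse-time settings into an ordinary CSVFormat.
  public CSVFormat toFormat() {
    return CSVFormat.DEFAULT.withIgnoreSurroundingSpaces(ignoreSurroundingSpaces);
  }

  // Parsing is the only operation offered; there is deliberately no print method.
  public CSVParser parse(Reader in) throws IOException {
    return toFormat().parse(in);
  }
}
```

A matching hypothetical CsvPrintingOptions would expose only print-time settings in the same way.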

On Fri, Jun 14, 2019 at 10:24 AM Daryl Stultz <[hidden email]> wrote:

>
>
> > withIgnoreSurroundingSpaces() affects parsing
> > withTrim() affects printing.
>
> Ah, that is exactly what I needed, withIgnoreSurroundingSpaces() solves my
> problem. (Definitely hard to understand that which applies to parsing and
> that which applies to printing!)

Re: Quoted content

Remko Popma-2
Also check out https://github.com/osiegmar/FastCSV

(Shameless plug) Every java main() method deserves http://picocli.info

> On Jun 14, 2019, at 23:53, Gary Gregory <[hidden email]> wrote:
>
> I've never like mashing formatting and parsing options together. Should we
> have CSVFormat subclasses called CSVPrintingFormat and CSVParsingFormat?
>
> Gary
>>