[csv] CSVFormat API names


[csv] CSVFormat API names

garydgregory
Hi All:

The format object can configure various aspects of input and output
formatting.

With my recent addition of the Quote enum for [CSV-53], there are now two
aspects of quoting to configure: the quote character and the quote policy
(minimal, all, non-numeric, and none.) FYI, 'none' is currently not
implemented.

First, I changed (without consulting this list, and please accept my
apologies for this) the - IMO - cryptic and burdensome terminology of
"encapsulator" to "quote char", and added "quote policy":

- withQuoteChar(char)
- withQuotePolicy(Quote)

My intention here is that all Quote APIs start with "withQuote" followed by
what aspect of quoting is being configured.
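
As a rough illustration of the "withQuote" + aspect naming, here is a minimal sketch. It assumes each with* call returns a new immutable format; FormatSketch, defaults() and the MINIMAL default are hypothetical and are not taken from the actual Commons CSV source.

```java
// Hypothetical sketch: FormatSketch is NOT the real CSVFormat class.
// The enum name Quote and the with* method names follow this thread.
enum Quote { MINIMAL, ALL, NON_NUMERIC, NONE }

final class FormatSketch {
    private final char quoteChar;
    private final Quote quotePolicy;

    private FormatSketch(char quoteChar, Quote quotePolicy) {
        this.quoteChar = quoteChar;
        this.quotePolicy = quotePolicy;
    }

    // Assumed defaults: double quote, minimal quoting.
    static FormatSketch defaults() {
        return new FormatSketch('"', Quote.MINIMAL);
    }

    // Each method configures exactly one aspect of quoting and
    // returns a new instance, leaving the receiver unchanged.
    FormatSketch withQuoteChar(char newQuoteChar) {
        return new FormatSketch(newQuoteChar, quotePolicy);
    }

    FormatSketch withQuotePolicy(Quote newPolicy) {
        return new FormatSketch(quoteChar, newPolicy);
    }

    char getQuoteChar() { return quoteChar; }

    Quote getQuotePolicy() { return quotePolicy; }
}
```

A call chain would then read FormatSketch.defaults().withQuoteChar('\'').withQuotePolicy(Quote.ALL), with the method name saying which aspect of quoting is being set.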

Alternatively, we could have:

- withQuote(char)
- withQuotePolicy(Quote)

Which makes the API more consistent with the other char/Character based
properties:

- withEscape
- withDelimiter
- withLineSeparator
- withCommentStart

None of the above is suffixed with "Char".

As far as reading goes, the "-r" names are OK for me because they are nouns
(things): "a delimiter", "a line separator". But I do not talk about "an
escape", because that would be an act (think Alcatraz) as opposed to what we
have here: a character used to /perform/ escapes.

So I propose to change "escape" to "escape char" because "escaper" is not a
word.

The name "comment start" is not great either, because it implies (to me) that
a matching "comment end" is missing. So plain "comment" or "comment char"
would be better.

Circling back to "quote char": I have it the way it is now for the same
reason as for the "escape" property.

In summary, using *Char names is better IMO.

Discuss! :)

Gary

[CSV-53] https://issues.apache.org/jira/browse/CSV-53
--
E-Mail: [hidden email] | [hidden email]
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [csv] CSVFormat API names

Jörg Schaible-3
Hi Gary,

Gary Gregory wrote:

> Hi All:
>
> The format object can configure various aspects of input and output
> formatting.
>
> With my recent addition of the Quote enum for [CSV-53], there are now two
> aspects of quoting to configure: the quote character and the quote policy
> (minimal, all, non-numeric, and none.) FYI, 'none' is currently not
> implemented.
>
> First, I changed (without consulting this list, and please accept my
> apologies for this) the - IMO - cryptic and burdensome terminology of
> "encapsulator" to "quote char", and added "quote policy":
>
> - withQuoteChar(char)
> - withQuotePolicy(Quote)
>
> My intention here is that all Quote APIs start with "withQuote" followed
> by what aspect of quoting is being configured.
>
> Alternatively, we could have:
>
> - withQuote(char)
> - withQuotePolicy(Quote)

or

- withQuote(char)
- withQuote(Quote)

;-)

> Which makes the API more consistent with the other char/Character based
> properties:
>
> - withEscape
> - withDelimiter
> - withLineSeparator
> - withCommentStart
>
> none of the above are post-fixed with a "Char" in the name.
>
> As far as reading, for me, the "-r" names are OK because the they are
> nouns (things): "a delimiter", "a line separator." But I do not talk about
> "an escape" because that would be an act (think Alcatraz) as opposed to
> what we have here: a character used to /perform/ escapes.
>
> So I propose to change "escape" to "escape char" because "escaper" is not
> a word.
>
> The name "comment start" is not great also because it implies (to me) that
> there is a "comment end" missing. So plain "comment" or "comment char"
> would be better.

Who said it has to be a single char?

.withEOLComment("//")


Same applies to the line separator:

.withLineSeparator("\n\r")

> Circling back to "quote char" which I have the way it is now for the same
> reason as for the "escape" property.
>
> In summary, using *Char names is better IMO.

Only if it can be a single char only. If it can either be a single char or a
String, I normally tend to use overloaded methods:

- withEOLComment(char)
- withEOLComment(CharSequence)
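
The overloading idea can be sketched as follows. This is only an illustration under stated assumptions: CommentFormatSketch, noComments() and stripComment() are hypothetical names, and multi-character comment markers are not something the current implementation supports according to this thread.

```java
// Hypothetical sketch of the overload pair suggested above.
// Not part of Commons CSV; stripComment() ignores quoting for brevity.
final class CommentFormatSketch {
    private final String commentMarker; // null means comments are disabled

    private CommentFormatSketch(String commentMarker) {
        this.commentMarker = commentMarker;
    }

    static CommentFormatSketch noComments() {
        return new CommentFormatSketch(null);
    }

    // Convenience overload: a single char is just a one-character marker.
    CommentFormatSketch withEOLComment(char marker) {
        return withEOLComment(String.valueOf(marker));
    }

    // General overload: any non-empty character sequence, e.g. "//".
    CommentFormatSketch withEOLComment(CharSequence marker) {
        if (marker == null || marker.length() == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        return new CommentFormatSketch(marker.toString());
    }

    // Returns the part of the line before the marker, or the whole line
    // when there is no marker configured or none is present.
    String stripComment(String line) {
        if (commentMarker == null) {
            return line;
        }
        int at = line.indexOf(commentMarker);
        return at < 0 ? line : line.substring(0, at);
    }
}
```

Both overloads configure the same aspect, so the char version only converts its argument and delegates. That is the case where sharing one method name does not mislead the reader.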

> Discuss! :)

Can of worms opened :))

- Jörg


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [csv] CSVFormat API names

Simone Tripodi-2
+1 to Jörg, that would be my recommendation as well!

my 0.02 cents,
-Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/



Re: [csv] CSVFormat API names

garydgregory
In reply to this post by Jörg Schaible-3
On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
<[hidden email]> wrote:

> Hi Gary,
>
> Gary Gregory wrote:
>
> > Hi All:
> >
> > The format object can configure various aspects of input and output
> > formatting.
> >
> > With my recent addition of the Quote enum for [CSV-53], there are now two
> > aspects of quoting to configure: the quote character and the quote policy
> > (minimal, all, non-numeric, and none.) FYI, 'none' is currently not
> > implemented.
> >
> > First, I changed (without consulting this list, and please accept my
> > apologies for this) the - IMO - cryptic and burdensome terminology of
> > "encapsulator" to "quote char", and added "quote policy":
> >
> > - withQuoteChar(char)
> > - withQuotePolicy(Quote)
> >
> > My intention here is that all Quote APIs start with "withQuote" followed
> > by what aspect of quoting is being configured.
> >
> > Alternatively, we could have:
> >
> > - withQuote(char)
> > - withQuotePolicy(Quote)
>
> or
>
> - withQuote(char)
> - withQuote(Quote)
>
> ;-)
>

Darn, I wish I knew you better to know if you were joking! :)

This would not be good IMO because you are configuring two different
aspects of the behavior. When I see the same API name with different
parameters, I think that they are the same and that the API just does
conversions.

We could consider making Quote a class instead of an enum and have it carry
a char and an enum, such that one object defines all quoting aspects. This
might be too normalized a design for something so simple though.
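
For comparison, the "one object defines all quoting aspects" alternative might look like the sketch below. QuoteSettings and QuotePolicy are hypothetical names chosen for illustration; the thread only proposes the idea and does not define such a class.

```java
// Hypothetical sketch: a single value object that carries both
// the quote character and the quote policy.
enum QuotePolicy { MINIMAL, ALL, NON_NUMERIC, NONE }

final class QuoteSettings {
    private final char character;
    private final QuotePolicy policy;

    QuoteSettings(char character, QuotePolicy policy) {
        if (policy == null) {
            throw new IllegalArgumentException("policy must not be null");
        }
        this.character = character;
        this.policy = policy;
    }

    char getCharacter() { return character; }

    QuotePolicy getPolicy() { return policy; }
}
```

A format would then need only one method taking a QuoteSettings argument, at the cost of an extra type for what is otherwise two simple properties.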


>
> > Which makes the API more consistent with the other char/Character based
> > properties:
> >
> > - withEscape
> > - withDelimiter
> > - withLineSeparator
> > - withCommentStart
> >
> > none of the above are post-fixed with a "Char" in the name.
> >
> > As far as reading, for me, the "-r" names are OK because the they are
> > nouns (things): "a delimiter", "a line separator." But I do not talk
> about
> > "an escape" because that would be an act (think Alcatraz) as opposed to
> > what we have here: a character used to /perform/ escapes.
> >
> > So I propose to change "escape" to "escape char" because "escaper" is not
> > a word.
> >
> > The name "comment start" is not great also because it implies (to me)
> that
> > there is a "comment end" missing. So plain "comment" or "comment char"
> > would be better.
>
> Who said it has to be a single char?
>

The current implementation does. ;)

Are comments even in any RFC?


>
> .withEOLComment("//")
>
>
> Same applies to the line separator:
>
> .withLineSeparator("\n\r")
>
> > Circling back to "quote char" which I have the way it is now for the same
> > reason as for the "escape" property.
> >
> > In summary, using *Char names is better IMO.
>
> Only if it can be a single char only. If it can either be a single char or
> a
> String, I normally tend to use overloaded methods:
>
> - withEOLComment(char)
> - withEOLComment(CharSequence)
>

If you want to add // to the mix, please start a different thread. I'm not
sure this is really needed. Do you have a real life use case?

Merci!
Gary



Re: [csv] CSVFormat API names

garydgregory
In reply to this post by Jörg Schaible-3
On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
<[hidden email]> wrote:

> Who said it has to be a single char?
>
> .withEOLComment("//")
>
>
> Same applies to the line separator:
>
> .withLineSeparator("\n\r")
>

My mistake there, I should not have mentioned this API. LineSeparator is
nice because it matches the line.separator system property name.

Gary



Re: [csv] CSVFormat API names

Jörg Schaible-3
In reply to this post by garydgregory
Gary Gregory wrote:

> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
> <[hidden email]>wrote:
>
>> Hi Gary,
>>
>> Gary Gregory wrote:
>>
>> > Hi All:
>> >
>> > The format object can configure various aspects of input and output
>> > formatting.
>> >
>> > With my recent addition of the Quote enum for [CSV-53], there are now
>> > two aspects of quoting to configure: the quote character and the quote
>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is currently
>> > not implemented.
>> >
>> > First, I changed (without consulting this list, and please accept my
>> > apologies for this) the - IMO - cryptic and burdensome terminology of
>> > "encapsulator" to "quote char", and added "quote policy":
>> >
>> > - withQuoteChar(char)
>> > - withQuotePolicy(Quote)
>> >
>> > My intention here is that all Quote APIs start with "withQuote"
>> > followed by what aspect of quoting is being configured.
>> >
>> > Alternatively, we could have:
>> >
>> > - withQuote(char)
>> > - withQuotePolicy(Quote)
>>
>> or
>>
>> - withQuote(char)
>> - withQuote(Quote)
>>
>> ;-)
>>
>
> Darn, I wish I knew you better to know if you were joking! :)
>
> This would not be good IMO because you are configuring two different
> aspects of the behavior. When I see the same API name with different
> parameters, I think that they are the same and that the API just does
> conversions.
>
> We could consider making Quote a class instead of an enum and have it
> carry a char and an enum, such that one object defines all quoting
> aspects. This might be too normalized a design for something so simple
> though.

Actually, I had not taken a closer look at the API. You're definitely right to
use different names for different aspects. It does not make sense to
overload just for fun.

>
>
>>
>> > Which makes the API more consistent with the other char/Character based
>> > properties:
>> >
>> > - withEscape
>> > - withDelimiter
>> > - withLineSeparator
>> > - withCommentStart
>> >
>> > none of the above are post-fixed with a "Char" in the name.
>> >
>> > As far as reading, for me, the "-r" names are OK because the they are
>> > nouns (things): "a delimiter", "a line separator." But I do not talk
>> about
>> > "an escape" because that would be an act (think Alcatraz) as opposed to
>> > what we have here: a character used to /perform/ escapes.
>> >
>> > So I propose to change "escape" to "escape char" because "escaper" is
>> > not a word.
>> >
>> > The name "comment start" is not great also because it implies (to me)
>> that
>> > there is a "comment end" missing. So plain "comment" or "comment char"
>> > would be better.
>>
>> Who said it has to be a single char?
>>
>
> The current implementation does. ;)
>
> Are comments even in any RFC?

Not that I am aware of.

>> .withEOLComment("//")
>>
>>
>> Same applies to the line separator:
>>
>> .withLineSeparator("\n\r")
>>
>> > Circling back to "quote char" which I have the way it is now for the
>> > same reason as for the "escape" property.
>> >
>> > In summary, using *Char names is better IMO.
>>
>> Only if it can be a single char only. If it can either be a single char
>> or a
>> String, I normally tend to use overloaded methods:
>>
>> - withEOLComment(char)
>> - withEOLComment(CharSequence)
>>
>
> If you want to add // to the mix, please start a different thread. I'm not
> sure this is really needed. Do you have a real life use case?

People come up with all kinds of "solutions" they are used to. CSV is brittle
anyway, simply because there is no "real" standard.

Cheers,
Jörg



Re: [csv] CSVFormat API names

Matt Benson-2
Random thoughts--no real context here, so no way to inline:

- "line separator" concept, while harmonizing with the line.separator
system property, might be better represented as "row separator" so as
not to imply that the parameter should be in any way limited to \r or
\n .  I would think the default for this would be the line.separator
property, however, and thus should take a String or CharSequence
(perhaps it already does, but there's been so much talk about char
parameters...).

- with* methods:  just something to think about here, but while we're
creating a fluent API, would e.g. #delimitedBy('\t') read more
fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
#withEscape('\\') ?
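
To make the two naming styles easier to compare side by side, here is a small sketch. FluentFormatSketch, csv() and the backslash escape default are hypothetical; the String record separator defaulting to the platform line separator follows the first bullet above.

```java
// Hypothetical sketch: both naming styles on one class, for comparison only.
final class FluentFormatSketch {
    private final char delimiter;
    private final char escape;
    private final String recordSeparator;

    private FluentFormatSketch(char delimiter, char escape, String recordSeparator) {
        this.delimiter = delimiter;
        this.escape = escape;
        this.recordSeparator = recordSeparator;
    }

    // Assumed defaults: comma, backslash, platform line separator.
    static FluentFormatSketch csv() {
        return new FluentFormatSketch(',', '\\', System.lineSeparator());
    }

    // "with" + noun style.
    FluentFormatSketch withDelimiter(char c) {
        return new FluentFormatSketch(c, escape, recordSeparator);
    }

    FluentFormatSketch withEscape(char c) {
        return new FluentFormatSketch(delimiter, c, recordSeparator);
    }

    // Participle style; same behavior, different reading.
    FluentFormatSketch delimitedBy(char c) {
        return withDelimiter(c);
    }

    FluentFormatSketch escapingWith(char c) {
        return withEscape(c);
    }

    char getDelimiter() { return delimiter; }

    char getEscape() { return escape; }

    String getRecordSeparator() { return recordSeparator; }
}
```

With this in place, csv().delimitedBy('\t').escapingWith('\\') and csv().withDelimiter('\t').withEscape('\\') build the same configuration, so the choice is purely about how the call chain reads.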

$0.02,
Matt


Re: [csv] CSVFormat API names

garydgregory
On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <[hidden email]> wrote:

> Random thoughts--no real context here, so no way to inline:
>
> - "line separator" concept, while harmonizing with the line.separator
> system property, might be better represented as "row separator" so as
> not to imply that the parameter should be in any way limited to \r or
> \n .  I would think the default for this would be the line.separator
> property, however, and thus should take a String or CharSequence
> (perhaps it already does, but there's been so much talk about char
> parameters...).
>

Now that you mention it, this should have been obvious as soon as we wrote
the test cases where a record is split over more than one line.

There is a difference between line number and record number which the API
tracks.

I propose to change "line separator" to "record separator". The default can
be line.separator.
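
The difference between line number and record number can be shown with a small counting sketch. RecordCounter is a hypothetical helper written for this illustration, not the parser's actual bookkeeping; it assumes '"' as the quote character, treats only \n as a line break, and counts empty lines as records.

```java
// Hypothetical sketch: counts physical lines and logical records.
// A line break inside a quoted field ends a line but not a record.
final class RecordCounter {
    private RecordCounter() { }

    // Returns a two-element array: { lines, records }.
    static int[] countLinesAndRecords(String text) {
        int lines = 0;
        int records = 0;
        boolean inQuotes = false;
        boolean lineHasContent = false;
        boolean recordHasContent = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                lines++;
                lineHasContent = false;
                if (!inQuotes) {
                    records++;
                    recordHasContent = false;
                }
            } else {
                if (c == '"') {
                    // A doubled quote toggles twice, so the state is unchanged.
                    inQuotes = !inQuotes;
                }
                lineHasContent = true;
                recordHasContent = true;
            }
        }
        // Account for a final line or record without a trailing line break.
        if (lineHasContent) {
            lines++;
        }
        if (recordHasContent) {
            records++;
        }
        return new int[] { lines, records };
    }
}
```

For the input "a,\"multi\nline\",c\nx,y,z\n" this reports three lines but only two records, which is why "record separator" describes the property more accurately than "line separator".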

Gary



Re: [csv] CSVFormat API names

sebb-2-2
On 16 October 2012 16:29, Gary Gregory <[hidden email]> wrote:

> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <[hidden email]> wrote:
>
>> Random thoughts--no real context here, so no way to inline:
>>
>> - "line separator" concept, while harmonizing with the line.separator
>> system property, might be better represented as "row separator" so as
>> not to imply that the parameter should be in any way limited to \r or
>> \n .  I would think the default for this would be the line.separator
>> property, however, and thus should take a String or CharSequence
>> (perhaps it already does, but there's been so much talk about char
>> parameters...).
>>
>
> Now that you mention it, this should have been obvious as soon as we wrote
> the test cases where a record is split over more than one line.
>
> There is a difference between line number and record number which the API
> tracks.
>
> I propose to change "line separator" to "record separator". The default can
> be line.separator.

OK.
I prefer record to row.

Re: [csv] CSVFormat API names

garydgregory
In reply to this post by garydgregory
On Tue, Oct 16, 2012 at 11:29 AM, Gary Gregory <[hidden email]> wrote:

> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <[hidden email]> wrote:
>
>> Random thoughts--no real context here, so no way to inline:
>>
>> - "line separator" concept, while harmonizing with the line.separator
>> system property, might be better represented as "row separator" so as
>> not to imply that the parameter should be in any way limited to \r or
>> \n .  I would think the default for this would be the line.separator
>> property, however, and thus should take a String or CharSequence
>> (perhaps it already does, but there's been so much talk about char
>> parameters...).
>>
>
> Now that you mention it, this should have been obvious as soon as we wrote
> the test cases where a record is split over more than one line.
>
> There is a difference between line number and record number which the API
> tracks.
>
> I propose to change "line separator" to "record separator". The default
> can be line.separator.
>
> Gary
>
>
>>
>> - with* methods:  just something to think about here, but while we're
>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>> #withEscape('\\') ?
>>
>
I find the combination of the fluent API style AND immutability of the
format class ugly because of the PRISTINE & DISABLED internal crud.

Why not just have DEFAULT and dump PRISTINE? Other formats should be based
on DEFAULT.

With PRISTINE, the door is open for a future format to not override
DISABLED and create a bug, as unlikely as it is.

Gary

Re: [csv] CSVFormat API names

sebb-2-2
On 16 October 2012 16:34, Gary Gregory <[hidden email]> wrote:

> On Tue, Oct 16, 2012 at 11:29 AM, Gary Gregory <[hidden email]> wrote:
>
>> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <[hidden email]> wrote:
>>
>>> Random thoughts--no real context here, so no way to inline:
>>>
>>> - "line separator" concept, while harmonizing with the line.separator
>>> system property, might be better represented as "row separator" so as
>>> not to imply that the parameter should be in any way limited to \r or
>>> \n .  I would think the default for this would be the line.separator
>>> property, however, and thus should take a String or CharSequence
>>> (perhaps it already does, but there's been so much talk about char
>>> parameters...).
>>>
>>
>> Now that you mention it, this should have been obvious as soon as we wrote
>> the test cases where a record is split over more than one line.
>>
>> There is a difference between line number and record number which the API
>> tracks.
>>
>> I propose to change "line separator" to "record separator". The default
>> can be line.separator.
>>
>> Gary
>>
>>
>>>
>>> - with* methods:  just something to think about here, but while we're
>>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>>> #withEscape('\\') ?
>>>
>>
> I find that the combination of the fluent API style AND immutability of the
> format class ugly because of the PRISTINE & DISABLED internal crud.
>
> Why not just have DEFAULT and dump PRISTINE? Other formats should be based
> on DEFAULT.

No, because DEFAULT includes several settings that may not be required.

> With PRISTINE, the door is open for a future format to not override
> DISABLED and create a bug, as unlikely as it is.

With DEFAULT, the door is *already* open for bugs due to failure to
reset the unwanted settings.

It's not possible currently to create an instance without overriding
the DISABLED delimiter.
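
For readers without the source at hand, the trade-off can be sketched roughly as follows. This is not the Commons CSV code: `SketchFormat`, its fields, its `validate()` rule, and the sentinel value are all assumptions, chosen only to mirror the PRISTINE, DISABLED and DEFAULT names used in this thread.

```java
final class SketchFormat {

    /** Sentinel meaning "not set" (assumed value; any char that cannot occur in data would do). */
    static final char DISABLED = '\ufffe';

    /** Base format with every setting disabled; unusable until a delimiter is set. */
    static final SketchFormat PRISTINE = new SketchFormat(DISABLED, DISABLED, DISABLED);

    /** Convenience format derived from PRISTINE: comma delimiter, double-quote quoting. */
    static final SketchFormat DEFAULT = PRISTINE.withDelimiter(',').withQuoteChar('"');

    private final char delimiter;
    private final char quoteChar;
    private final char commentStart;

    private SketchFormat(char delimiter, char quoteChar, char commentStart) {
        this.delimiter = delimiter;
        this.quoteChar = quoteChar;
        this.commentStart = commentStart;
    }

    // Each with* method returns a new instance; the receiver is never modified.
    SketchFormat withDelimiter(char c) {
        return new SketchFormat(c, quoteChar, commentStart);
    }

    SketchFormat withQuoteChar(char c) {
        return new SketchFormat(delimiter, c, commentStart);
    }

    SketchFormat withCommentStart(char c) {
        return new SketchFormat(delimiter, quoteChar, c);
    }

    boolean isQuotingEnabled() {
        return quoteChar != DISABLED;
    }

    boolean isCommentingEnabled() {
        return commentStart != DISABLED;
    }

    /** Rejects a format whose delimiter was never overridden. */
    void validate() {
        if (delimiter == DISABLED) {
            throw new IllegalStateException("The delimiter must be set");
        }
    }
}
```

In this sketch, `PRISTINE.withDelimiter('\t')` yields a format with quoting still disabled, whereas `DEFAULT.withDelimiter('\t')` inherits the quote char and would need an explicit reset if quoting is unwanted.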

Re: [csv] CSVFormat API names

Jörg Schaible
In reply to this post by Matt Benson-2
Matt Benson wrote:

> Random thoughts--no real context here, so no way to inline:
>
> - "line separator" concept, while harmonizing with the line.separator
> system property, might be better represented as "row separator" so as
> not to imply that the parameter should be in any way limited to \r or
> \n .  I would think the default for this would be the line.separator
> property, however, and thus should take a String or CharSequence
> (perhaps it already does, but there's been so much talk about char
> parameters...).
>
> - with* methods:  just something to think about here, but while we're
> creating a fluent API, would e.g. #delimitedBy('\t') read more
> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
> #withEscape('\\') ?

+1, good idea!

- Jörg


Re: [csv] CSVFormat API names

sebb-2-2
On 16 October 2012 17:08, Jörg Schaible <[hidden email]> wrote:

> Matt Benson wrote:
>
>> Random thoughts--no real context here, so no way to inline:
>>
>> - "line separator" concept, while harmonizing with the line.separator
>> system property, might be better represented as "row separator" so as
>> not to imply that the parameter should be in any way limited to \r or
>> \n .  I would think the default for this would be the line.separator
>> property, however, and thus should take a String or CharSequence
>> (perhaps it already does, but there's been so much talk about char
>> parameters...).
>>
>> - with* methods:  just something to think about here, but while we're
>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>> #withEscape('\\') ?
>
> +1, good idea!

Not sure I agree.
The advantage of a common prefix is that they work well with IDEs.

Also I think it's confusing to have xxxBy and yyyWith.

Re: [csv] CSVFormat API names

Matt Benson-2
On Tue, Oct 16, 2012 at 11:27 AM, sebb <[hidden email]> wrote:

> On 16 October 2012 17:08, Jörg Schaible <[hidden email]> wrote:
>> Matt Benson wrote:
>>
>>> Random thoughts--no real context here, so no way to inline:
>>>
>>> - "line separator" concept, while harmonizing with the line.separator
>>> system property, might be better represented as "row separator" so as
>>> not to imply that the parameter should be in any way limited to \r or
>>> \n .  I would think the default for this would be the line.separator
>>> property, however, and thus should take a String or CharSequence
>>> (perhaps it already does, but there's been so much talk about char
>>> parameters...).
>>>
>>> - with* methods:  just something to think about here, but while we're
>>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>>> #withEscape('\\') ?
>>
>> +1, good idea!
>
> Not sure I agree.
> The advantage of a common prefix is that they work well with IDEs.

I can appreciate that if you began to type "with"... the IDE could
show ten different things you could be trying to use, but I don't know
that I'd go so far as to call this "working well with" the IDE.

>
> Also I think it's confusing to have xxxBy and yyyWith.

Are these specific examples not the words you would actually use were
you having a discussion on the subject in English?  :P

Matt

Re: [csv] CSVFormat API names

James Carman
On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <[hidden email]> wrote:
>
> Are these specific examples not the words you would actually use were
> you having a discussion on the subject in English?  :P
>

Why not just support both?  The "with*" methods would just be aliases
for the more "natural language" method names.
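
A rough sketch of what supporting both spellings could look like, assuming an immutable format class. `AliasSketch` is a hypothetical stand-in, not the actual CSVFormat, and the direction of delegation shown here is arbitrary; it could just as easily be reversed.

```java
final class AliasSketch {

    private final char delimiter;
    private final char escape;

    AliasSketch(char delimiter, char escape) {
        this.delimiter = delimiter;
        this.escape = escape;
    }

    // Prefix-style methods: easy to discover by typing "with" in an IDE.
    AliasSketch withDelimiter(char c) {
        return new AliasSketch(c, escape);
    }

    AliasSketch withEscape(char c) {
        return new AliasSketch(delimiter, c);
    }

    // "Natural language" aliases: pure delegation, no behaviour of their own.
    AliasSketch delimitedBy(char c) {
        return withDelimiter(c);
    }

    AliasSketch escapingWith(char c) {
        return withEscape(c);
    }

    char getDelimiter() {
        return delimiter;
    }

    char getEscape() {
        return escape;
    }
}
```

The cost of this approach is a doubled method count on the class, with two names for every setting to document and keep in sync.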

Re: [csv] CSVFormat API names

Matt Benson-2
On Tue, Oct 16, 2012 at 11:42 AM, James Carman
<[hidden email]> wrote:
> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <[hidden email]> wrote:
>>
>> Are these specific examples not the words you would actually use were
>> you having a discussion on the subject in English?  :P
>>
>
> Why not just support both?  The "with*" methods would just be aliases
> for the more "natural language" method names.

Or vice versa, sounds reasonable.  :)

Matt

Re: [csv] CSVFormat API names

jodastephen
On 16 October 2012 17:44, Matt Benson <[hidden email]> wrote:

> On Tue, Oct 16, 2012 at 11:42 AM, James Carman
> <[hidden email]> wrote:
>> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <[hidden email]> wrote:
>>>
>>> Are these specific examples not the words you would actually use were
>>> you having a discussion on the subject in English?  :P
>>>
>>
>> Why not just support both?  The "with*" methods would just be aliases
>> for the more "natural language" method names.

I would first split these into two categories:
- mutable builders producing immutable objects
- immutable objects

The former should generally have short methods without prefixes, the
latter is more complex.

For the latter, as a general rule, I use
withXxx()/plusXxx()/minusXxx() for items that affect the state and
past participle for other methods that manipulate the object in other
ways:

// affects state (year/month/day)
 date = date.withYear(2012);
 date = date.plusYears(6);
// affects multiple pieces of state, so past participle
 period = period.multipliedBy(6);
 period = period.negated();

This is simply an extension of when you might use setXxx() on a bean,
and when you might use a named method.

Stephen
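
The same convention can be tried out against `java.time`, which ships with Java 8 and later (so it was not available when this thread was written). The sketch below assumes that API; `NamingDemo` and its helper methods are hypothetical names.

```java
import java.time.LocalDate;
import java.time.Period;

final class NamingDemo {

    /** withXxx/plusXxx: each call replaces or adjusts one piece of state. */
    static LocalDate shifted(LocalDate date) {
        return date.withYear(2012).plusYears(6);
    }

    /** Past participles: each call transforms the whole object. */
    static Period scaled(Period period) {
        return period.multipliedBy(6).negated();
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2010, 3, 15);
        System.out.println(shifted(start));              // a new date; start is unchanged
        System.out.println(scaled(Period.of(1, 2, 3)));  // every component scaled and negated
    }
}
```

Both types are immutable, so every call returns a new object and the result has to be assigned or chained, exactly as in the `date = date.withYear(2012)` style above.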

Re: [csv] CSVFormat API names

Benedikt Ritter-3
2012/10/16 Stephen Colebourne <[hidden email]>:

> On 16 October 2012 17:44, Matt Benson <[hidden email]> wrote:
>> On Tue, Oct 16, 2012 at 11:42 AM, James Carman
>> <[hidden email]> wrote:
>>> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <[hidden email]> wrote:
>>>>
>>>> Are these specific examples not the words you would actually use were
>>>> you having a discussion on the subject in English?  :P
>>>>
>>>
>>> Why not just support both?  The "with*" methods would just be aliases
>>> for the more "natural language" method names.
>
> I would categorise first in two
> - mutable builders producing immutable objects
> - immutable objects
>

Implementing a builder for CSVFormat was discussed a while ago [1]. I
think it's the best solution, because the validate method can then be
made private and no code outside the format has to worry about whether
a format is valid or not (right now CSV code calls validate on newly
created CSVFormat instances to make sure they are valid).
Anyway, there were voices against a builder because it would complicate
the API, so we never implemented something like that...

Benedikt

[1] http://markmail.org/thread/mmeoymd3cpq5jxfr
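
As a minimal sketch of the builder idea, assuming a mutable builder that produces an immutable format: `BuiltFormat`, its `Builder`, and the two validation rules below are hypothetical and not taken from the Commons CSV code or from the earlier discussion.

```java
final class BuiltFormat {

    private final char delimiter;
    private final char quoteChar;

    // Only the builder can create instances, so every instance has been validated.
    private BuiltFormat(Builder builder) {
        this.delimiter = builder.delimiter;
        this.quoteChar = builder.quoteChar;
    }

    static Builder builder() {
        return new Builder();
    }

    char getDelimiter() {
        return delimiter;
    }

    char getQuoteChar() {
        return quoteChar;
    }

    /** Mutable builder; short method names without a prefix. */
    static final class Builder {

        private char delimiter = ',';
        private char quoteChar = '"';

        Builder delimiter(char c) {
            this.delimiter = c;
            return this;
        }

        Builder quoteChar(char c) {
            this.quoteChar = c;
            return this;
        }

        /** The only place validation happens; callers never call validate themselves. */
        BuiltFormat build() {
            validate();
            return new BuiltFormat(this);
        }

        private void validate() {
            if (delimiter == quoteChar) {
                throw new IllegalStateException("Delimiter and quote char must differ");
            }
            if (delimiter == '\r' || delimiter == '\n') {
                throw new IllegalStateException("Delimiter cannot be a line break");
            }
        }
    }
}
```

With this shape, `BuiltFormat.builder().delimiter('\t').build()` either returns a valid format or throws, so parser and printer code would not need a validity check of their own.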

Re: [csv] CSVFormat API names

garydgregory
In reply to this post by sebb-2-2
On Tue, Oct 16, 2012 at 11:43 AM, sebb <[hidden email]> wrote:

> On 16 October 2012 16:34, Gary Gregory <[hidden email]> wrote:
> > On Tue, Oct 16, 2012 at 11:29 AM, Gary Gregory <[hidden email]> wrote:
> >
> >> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <[hidden email]> wrote:
> >>
> >>> Random thoughts--no real context here, so no way to inline:
> >>>
> >>> - "line separator" concept, while harmonizing with the line.separator
> >>> system property, might be better represented as "row separator" so as
> >>> not to imply that the parameter should be in any way limited to \r or
> >>> \n .  I would think the default for this would be the line.separator
> >>> property, however, and thus should take a String or CharSequence
> >>> (perhaps it already does, but there's been so much talk about char
> >>> parameters...).
> >>>
> >>
> >> Now that you mention it, this should have been obvious as soon as we
> >> wrote the test cases where a record is split over more than one line.
> >>
> >> There is a difference between line number and record number which the
> >> API tracks.
> >>
> >> I propose to change "line separator" to "record separator".


Folks seem to like this one, so it is now in SVN.


> >> The default can be line.separator.
>

I did not do this one, as it seems RFC 4180 defines CR+LF as the record
separator, as noted in the Javadoc for
org.apache.commons.csv.CSVFormat.DEFAULT.

Gary
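
The practical effect of a fixed CR+LF default, as opposed to the platform's line.separator, can be sketched like this. `RecordSeparatorSketch` is a hypothetical helper, not part of Commons CSV, and it deliberately ignores quoting and escaping.

```java
import java.util.List;

final class RecordSeparatorSketch {

    /** CR LF, the line break that RFC 4180 uses between records. */
    static final String CRLF = "\r\n";

    /** Joins the fields of each record with the delimiter and ends each record with the separator. */
    static String write(List<List<String>> records, char delimiter, String recordSeparator) {
        StringBuilder out = new StringBuilder();
        for (List<String> record : records) {
            out.append(String.join(String.valueOf(delimiter), record));
            out.append(recordSeparator);
        }
        return out.toString();
    }
}
```

Passing `CRLF` produces byte-identical output on every platform, while passing `System.lineSeparator()` would make the output depend on where the code runs.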


> >>
> >> Gary
> >>
> >>
> >>>
> >>> - with* methods:  just something to think about here, but while we're
> >>> creating a fluent API, would e.g. #delimitedBy('\t') read more
> >>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
> >>> #withEscape('\\') ?
> >>>
> >>
> > I find that the combination of the fluent API style AND immutability of
> the
> > format class ugly because of the PRISTINE & DISABLED internal crud.
> >
> > Why not just have DEFAULT and dump PRISTINE? Other formats should be
> based
> > on DEFAULT.
>
> No, because DEFAULT includes several settings that may not be required.
>
> > With PRISTINE, the door is open for a future format to not override
> > DISABLED and create a bug, as unlikely as it is.
>
> With DEFAULT, the door is *already* open for bugs due to failure to
> reset the unwanted settings.
>
> It's not possible currently to create an instance without overriding
> the DISABLED delimiter.
>
> > Gary
> >
> >
> >
> >
> >>> $0.02,
> >>> Matt
> >>>
> >>> On Tue, Oct 16, 2012 at 8:53 AM, Jörg Schaible
> >>> <[hidden email]> wrote:
> >>> > Gary Gregory wrote:
> >>> >
> >>> >> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
> >>> >> <[hidden email]>wrote:
> >>> >>
> >>> >>> Hi Gary,
> >>> >>>
> >>> >>> Gary Gregory wrote:
> >>> >>>
> >>> >>> > Hi All:
> >>> >>> >
> >>> >>> > The format object can configure various aspects of input and
> output
> >>> >>> > formatting.
> >>> >>> >
> >>> >>> > With my recent addition of the Quote enum for [CSV-53], there are
> >>> now
> >>> >>> > two aspects of quoting to configure: the quote character and the
> >>> quote
> >>> >>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is
> >>> currently
> >>> >>> > not implemented.
> >>> >>> >
> >>> >>> > First, I changed (without consulting this list, and please
> accept my
> >>> >>> > apologies for this) the - IMO - cryptic and burdensome
> terminology
> >>> of
> >>> >>> > "encapsulator" to "quote char", and added "quote policy":
> >>> >>> >
> >>> >>> > - withQuoteChar(char)
> >>> >>> > - withQuotePolicy(Quote)
> >>> >>> >
> >>> >>> > My intention here is that all Quote APIs start with "withQuote"
> >>> >>> > followed by what aspect of quoting is being configured.
> >>> >>> >
> >>> >>> > Alternatively, we could have:
> >>> >>> >
> >>> >>> > - withQuote(char)
> >>> >>> > - withQuotePolicy(Quote)
> >>> >>>
> >>> >>> or
> >>> >>>
> >>> >>> - withQuote(char)
> >>> >>> - withQuote(Quote)
> >>> >>>
> >>> >>> ;-)
> >>> >>>
> >>> >>
> >>> >> Darn, I wish I knew you better to know if you were joking! :)
> >>> >>
> >>> >> This would not be good IMO because you are configuring two different
> >>> >> aspects of the behavior. When I see the same API name with different
> >>> >> parameters, I think that they are the same and that the API just
> >>> >> does conversions.
> >>> >>
> >>> >> We could consider making Quote a class instead of an enum and have
> >>> >> it carry a char and an enum, such that one object defines all
> >>> >> quoting aspects. This might be too normalized a design for something
> >>> >> so simple though.
> >>> >
> >>> > Actually, I had not taken a closer look at the API. You're definitely
> >>> > right to use different names for different aspects. It does not make
> >>> > sense to overload just for fun.
> >>> >
> >>> >>
> >>> >>
> >>> >>>
> >>> >>> > Which makes the API more consistent with the other char/Character
> >>> >>> > based properties:
> >>> >>> >
> >>> >>> > - withEscape
> >>> >>> > - withDelimiter
> >>> >>> > - withLineSeparator
> >>> >>> > - withCommentStart
> >>> >>> >
> >>> >>> > none of the above are post-fixed with a "Char" in the name.
> >>> >>> >
> >>> >>> > As far as reading, for me, the "-r" names are OK because they are
> >>> >>> > nouns (things): "a delimiter", "a line separator." But I do not
> >>> >>> > talk about "an escape" because that would be an act (think
> >>> >>> > Alcatraz) as opposed to what we have here: a character used to
> >>> >>> > /perform/ escapes.
> >>> >>> >
> >>> >>> > So I propose to change "escape" to "escape char" because
> >>> >>> > "escaper" is not a word.
> >>> >>> >
> >>> >>> > The name "comment start" is not great either because it implies
> >>> >>> > (to me) that there is a "comment end" missing. So plain "comment"
> >>> >>> > or "comment char" would be better.
> >>> >>>
> >>> >>> Who said it has to be a single char?
> >>> >>>
> >>> >>
> >>> >> The current implementation does. ;)
> >>> >>
> >>> >> Are comments even in any RFC?
> >>> >
> >>> > Not that I am aware of.
> >>> >
> >>> >>> .withEOLComment("//")
> >>> >>>
> >>> >>>
> >>> >>> Same applies to the line separator:
> >>> >>>
> >>> >>> .withLineSeparator("\n\r")
> >>> >>>
> >>> >>> > Circling back to "quote char" which I have the way it is now for
> >>> >>> > the same reason as for the "escape" property.
> >>> >>> >
> >>> >>> > In summary, using *Char names is better IMO.
> >>> >>>
> >>> >>> Only if it can be a single char only. If it can either be a single
> >>> >>> char or a String, I normally tend to use overloaded methods:
> >>> >>>
> >>> >>> - withEOLComment(char)
> >>> >>> - withEOLComment(CharSequence)
> >>> >>>
> >>> >>
> >>> >> If you want to add // to the mix, please start a different thread.
> >>> >> I'm not sure this is really needed. Do you have a real life use case?
> >>> >
> >>> > People come up with all kinds of "solutions" they are used to. CSV is
> >>> > brittle anyway, simply because there is no "real" standard.
> >>> >
> >>> > Cheers,
> >>> > Jörg



Re: [csv] CSVFormat API names

James Carman
On Tue, Oct 16, 2012 at 2:25 PM, Gary Gregory <[hidden email]> wrote:
>
> I did not do this one, as it seems RFC 4180 defines CR+LF as the record
> separator, as noted in the Javadoc for
> org.apache.commons.csv.CSVFormat.DEFAULT.
>

That's where the name of this component gets confusing to me.  Since
it's called "CSV", it would make sense that we follow RFC 4180, which
defines the standard for comma-separated value files and thus the
default record separator would be CRLF.  However, we are allowing
users to define whatever format they want using properties of the
CSVFormat class (of course, if you use delimiter != ',', then it's not
really CSV).  So, what's the intent?  This is more of a
delimited-record format parser/writer component which supports CSV.
Thus, it is not really very well-named.
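
To make the distinction concrete, here is a minimal sketch of what I mean. It assumes a made-up DelimitedFormat class (this is not the Commons CSV API; every name in it is hypothetical) whose RFC 4180 preset uses a comma and CRLF, and whose "with" methods let the user drift away from that preset. It does no quoting or escaping at all.

```java
import java.util.StringJoiner;

/** Hypothetical immutable format holder for illustration; not the Commons CSV API. */
final class DelimitedFormat {

    /** RFC 4180 flavour: comma delimiter, CRLF record separator. */
    static final DelimitedFormat RFC4180 = new DelimitedFormat(',', "\r\n");

    private final char delimiter;
    private final String recordSeparator;

    private DelimitedFormat(char delimiter, String recordSeparator) {
        this.delimiter = delimiter;
        this.recordSeparator = recordSeparator;
    }

    /** Returns a copy with a different delimiter. */
    DelimitedFormat withDelimiter(char newDelimiter) {
        return new DelimitedFormat(newDelimiter, recordSeparator);
    }

    /** Returns a copy with a different record separator. */
    DelimitedFormat withRecordSeparator(String newSeparator) {
        return new DelimitedFormat(delimiter, newSeparator);
    }

    /** True only while the delimiter is still a comma. */
    boolean isCommaSeparated() {
        return delimiter == ',';
    }

    /** Joins the values with the delimiter and appends the record separator. */
    String formatRecord(String... values) {
        StringJoiner joiner =
                new StringJoiner(String.valueOf(delimiter), "", recordSeparator);
        for (String value : values) {
            joiner.add(value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        DelimitedFormat tabbed = RFC4180.withDelimiter('\t').withRecordSeparator("\n");
        System.out.print(RFC4180.formatRecord("a", "b"));
        System.out.print(tabbed.formatRecord("a", "b"));
        System.out.println(tabbed.isCommaSeparated());
    }
}
```

Once withDelimiter('\t') has been called, the object still does useful work, but nothing about it is "comma-separated" any more, which is why the component name reads oddly to me.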

