[CSV] Inconsistent record separator behavior

Benedikt Ritter-4
Hi,

we have this strange handling of record separator / line endings in CSV:

Users can use whatever character sequence they like as a record separator.
I could for example use the ! character to mark the end of a record.
Then we have CSVPrinter.printComment(String). This inserts comments into a
CSV output. It detects CR and LF and calls println() on the CSVFormat, which
in turn uses the record separator to indicate a new record...
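
To make the behavior concrete, here is a minimal, self-contained sketch of what is described above. It is a simplified model and not the actual CSVPrinter source: the class CommentSketch and its method signatures are made up for illustration, and println() is assumed to do nothing more than append the configured record separator.

```java
class CommentSketch {

    // Hypothetical stand-in for the printer's println():
    // it simply emits the configured record separator.
    static void println(StringBuilder out, String recordSeparator) {
        out.append(recordSeparator);
    }

    // Simplified model of the comment printing described above: CR, LF and
    // CRLF inside the comment text each trigger println(), which writes the
    // record separator and not necessarily a line break.
    static String printComment(String comment, char marker, String recordSeparator) {
        StringBuilder out = new StringBuilder();
        out.append(marker).append(' ');
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\r' || c == '\n') {
                // Treat CRLF as a single line break.
                if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                    i++;
                }
                println(out, recordSeparator);
                out.append(marker).append(' ');
            } else {
                out.append(c);
            }
        }
        println(out, recordSeparator);
        return out.toString();
    }

    public static void main(String[] args) {
        // Record separator is "!", comment text contains a CRLF.
        System.out.println(printComment("first\r\nsecond", '#', "!"));
    }
}
```

With ! as the record separator, the comment "first\r\nsecond" comes out as `# first!# second!`: the line break in the input has been replaced by the record separator in the output.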

So now I'm thinking: Does it make sense to use anything else but LF or CRLF
as record separator? Maybe we should deprecate
CSVFormat.recordSeparator(String) and introduce a LineEnding enum where
users can choose between LF and CRLF. This way we can make the behavior
between parsing and printing consistent.
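
For illustration, the proposed enum could look roughly like the sketch below. This is only one possible shape: the name LineEnding comes from the proposal, while getChars(), the LineEndingSketch class and its join() helper are hypothetical and not part of Commons CSV.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical enum restricting the record separator to LF or CRLF.
enum LineEnding {
    LF("\n"),
    CRLF("\r\n");

    private final String chars;

    LineEnding(String chars) {
        this.chars = chars;
    }

    String getChars() {
        return chars;
    }
}

class LineEndingSketch {

    // Terminates every record with the chosen line ending.
    static String join(List<String> records, LineEnding ending) {
        StringBuilder out = new StringBuilder();
        for (String record : records) {
            out.append(record).append(ending.getChars());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = join(Arrays.asList("a,b", "c,d"), LineEnding.CRLF);
        System.out.print(csv);
    }
}
```

Because callers can only pick one of the enum constants, a printer and a parser built on it would agree on what ends a record.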

Thoughts?
Benedikt

Re: [CSV] Inconsistent record separator behavior

Bruno P. Kinoshita-3
Hi,


I'll try to look at the code and give a better answer over the weekend. At the risk of asking a silly question: would that mean users cannot parse a CSV unless its rows are separated by LF or CRLF? I remember getting a CSV from a government website some time ago that was formatted in a very strange way. If I remember correctly it was a small file without any LF or CRLF; I think it used | to separate rows and , to separate columns.


A quick search turned up at least one other person with a similar issue: https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator


Not sure I understood the problem well, but in case it makes sense, my suggestion would be to check whether we could change CSVPrinter.printComment to accept other characters for line endings.


Thanks!

Bruno


________________________________
From: Benedikt Ritter <[hidden email]>
To: Commons Developers List <[hidden email]>
Sent: Tuesday, 21 August 2018 7:13 PM
Subject: [CSV] Inconsistent record separator behavior



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [CSV] Inconsistent record separator behavior

Benedikt Ritter-4
Hi Bruno,

On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita
<[hidden email]> wrote:

> Hi,
>
>
> Will try to look at the code and give a better answer during the weekend.
> But risking a silly question, would it mean that users are not able to
> parse a CSV unless each CSV row is separated by LF or CRLF?


Yes.


> I remember getting a CSV in a government website some time ago that was
> formatted in a very strange way, and if I remember well it was a small
> file, but without LF or CRLF. I think it was using | to separate the rows,
> and , for columns.
>

I didn't know that there are formats that don't use a new line as line
separator.


>
>
> Quick search returned at least another person with similar issue
> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
>
>
> Not sure if I understood the problem well, but in case it makes sense...
> my suggestion would be to perhaps confirm if we could change
> CSVPrinter.printComment to accept other characters for line ending?
>

The inconsistency I'm seeing is that, on the one hand, we accept any
character sequence as a record separator. Comments are in a way like special
records to me. But our implementation seems to put them on a new "line"
using the println() method. The println() method in turn uses the record
separator to start a new record, so it's not necessarily a new line.
Nevertheless, while processing a comment, we look out for CR and LF and then
we call println() again. Maybe I'm just not getting it, but it feels pretty
messed up :-)

Regards,
Benedikt


Re: [CSV] Inconsistent record separator behavior

Bruno P. Kinoshita-3

>Maybe I'm just not getting it, but it feels pretty messed up :-)


Mutual feeling, and +1 for consistency. From what I understood, users should be able to parse these crazy CSVs, but if they tried to re-create them with comments, they wouldn't be able to avoid the println/newline (so the output wouldn't be parseable later with the same reader).


We probably need a ticket to aggregate the discussion and maybe a possible solution.

Cheers

________________________________
From: Benedikt Ritter <[hidden email]>
To: Commons Developers List <[hidden email]>; [hidden email]
Sent: Thursday, 23 August 2018 7:10 AM
Subject: Re: [CSV] Inconsistent record separator behavior



Re: [CSV] Inconsistent record separator behavior

sebb-2-2
On 23 August 2018 at 00:01, Bruno P. Kinoshita
<[hidden email]> wrote:
>
>>Maybe I'm just not getting it, but it feels pretty messed up :-)
>
>
> Mutual feeling, and +1 for consistency. From what I understood, users should be able to parse these crazy CSVs, but if they tried to re-create them, with comments, then they wouldn't be able to avoid the println/newline (so it wouldn't be parseable later with the same reader).
>
>
> We probably need a ticket for it to aggregate the discussion and maybe a possible solution.

I'm wondering whether we need to be as flexible when *creating* the CSV files.

"Be liberal in what you accept, and conservative in what you send" (Jon Postel)

In this case send == create, as it might be sent to other less liberal readers.

I don't have a problem with the output being less flexible, so long as
it is sufficiently flexible (which I think it likely is already).

I don't think consistency is necessary - or even desirable - here.

Re: [CSV] Inconsistent record separator behavior

Bruno P. Kinoshita-3
Very good arguments (as always), Sebb. I'd also be OK with leaving it as is until we have a user with a good reason for changing the send/create side.


And thanks for including the author of the quote. Going through his Wikipedia page, lots of things to read later.

Bruno


________________________________
From: sebb <[hidden email]>
To: Commons Developers List <[hidden email]>; Bruno P. Kinoshita <[hidden email]>
Sent: Thursday, 23 August 2018 11:23 AM
Subject: Re: [CSV] Inconsistent record separator behavior



Re: [CSV] Inconsistent record separator behavior

Benedikt Ritter-4
Hey sebb,

On Thu, 23 Aug 2018 at 01:23, sebb <[hidden email]> wrote:

> On 23 August 2018 at 00:01, Bruno P. Kinoshita
> <[hidden email]> wrote:
> >
> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
> >
> >
> > Mutual feeling, and +1 for consistency. From what I understood, users
> > should be able to parse these crazy CSVs, but if they tried to re-create
> > them, with comments, then they wouldn't be able to avoid the
> > println/newline (so it wouldn't be parseable later with the same reader).
> >
> >
> > We probably need a ticket for it to aggregate the discussion and maybe a
> > possible solution.
>
> I'm wondering whether we need to be as flexible when *creating* the CSV
> files.
>
> "Be liberal in what you accept, and conservative in what you send" (Jon
> Postel)
>
> In this case send == create, as it might be sent to other less liberal
> readers.
>
> I don't have a problem with the output being less flexible, so long as
> it is sufficiently flexible (which I think it likely is already).
>
> I don't think consistency is necessary - or even desirable - here.
>

Okay, but wouldn't you expect to be able to use a CSVFormat instance to read
a file that you created with it? This is currently not the case.
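
A small sketch of the round-trip problem, under the assumption (not verified against the parser source here) that the reading side only treats CR, LF and CRLF as record boundaries while the writing side uses whatever record separator was configured. RoundTripSketch and its methods are hypothetical stand-ins, not Commons CSV classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RoundTripSketch {

    // Writer side: terminates each record with the configured separator.
    static String write(List<String> records, String recordSeparator) {
        StringBuilder out = new StringBuilder();
        for (String record : records) {
            out.append(record).append(recordSeparator);
        }
        return out.toString();
    }

    // Reader side (assumption): only CRLF, CR and LF end a record.
    static List<String> read(String text) {
        List<String> records = new ArrayList<>();
        for (String line : text.split("\r\n|\r|\n")) {
            if (!line.isEmpty()) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a,b", "c,d");
        System.out.println(read(write(input, "\n")).equals(input)); // round trip with LF
        System.out.println(read(write(input, "!")).equals(input));  // round trip with "!"
    }
}
```

With LF the two records survive the round trip. With ! the reader sees a single record, `a,b!c,d!`, because it never learns about the custom separator.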

Regards,
Benedikt


Re: [CSV] Inconsistent record separator behavior

sebb-2-2
On 23 August 2018 at 07:10, Benedikt Ritter <[hidden email]> wrote:

> okay, but wouldn't you expect that you can use a CSVFormat instance to read
> a file that you created with it? This is currently not the case.

Sorry, I misread the problem.

Yes, it should be able to read what it writes.

So the issue remains: should the reader be able to parse the unusual
format, or should the writer not be able to create it?

I don't have a particular view on that, except that allowing only LF and
CRLF seems too restrictive.
We should allow at least CR alone. I don't know whether there are any
other reasonable separators.

Perhaps we could just document the method to warn that using anything
other than CR, LF or CRLF will produce an output file that is not
parseable?
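
The rule such a documentation note would describe is easy to state in code. The helper below is hypothetical (it is not part of CSVFormat) and simply reports whether a record separator is one of the three line breaks mentioned above.

```java
class SeparatorCheck {

    // Returns true only for CR, LF or CRLF. Per the suggestion above, any
    // other record separator would be documented as producing output that
    // cannot be parsed back.
    static boolean isStandardLineBreak(String recordSeparator) {
        return "\r".equals(recordSeparator)
                || "\n".equals(recordSeparator)
                || "\r\n".equals(recordSeparator);
    }

    public static void main(String[] args) {
        System.out.println(isStandardLineBreak("\r\n"));
        System.out.println(isStandardLineBreak("!"));
    }
}
```

A check like this could back the wording of the documentation, or be used by callers who want to verify their own configuration before printing.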

Re: [CSV] Inconsistent record separator behavior

Benedikt Ritter-4
Hi,

On Thu, 23 Aug 2018 at 12:11, sebb <[hidden email]> wrote:

> On 23 August 2018 at 07:10, Benedikt Ritter <[hidden email]> wrote:
> > Hey sebb,
> >
> >
> > okay, but wouldn't you expect that you can use a CSVFormat instance to read
> > a file that you created with it? This is currently not the case.
>
> Sorry, I misread the problem.
>
> Yes, it should be able to read what it writes.
>
> So the issue remains: should the reader be able to parse the unusual
> format, or should the writer not be able to create it?
>
> I don't have a particular view on that, except that allowing LF and
> CRLF only seems too restricting.
> We should allow at least CR alone. I don't know whether there are any
> other reasonable separators.
>

As Bruno pointed out, there seem to be formats whose record separators are
not newlines. So maybe CSVPrinter.printComment(String) should scan not for
CR and LF but for the record separator.
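
To make the idea concrete, here is a minimal, self-contained sketch. It does
not use Commons CSV; the class and method names are hypothetical and only
model the behaviour described in this thread. The first method looks for CR,
LF and CRLF inside the comment and writes the record separator in their
place; the second looks for the record separator itself.

```java
// Hypothetical model of the two scanning strategies; not Commons CSV code.
final class CommentSeparatorSketch {

    // Behaviour as described in the thread: breaks inside the comment are
    // detected as CR, LF or CRLF, but written out as the record separator.
    static String printCommentScanningLineBreaks(String comment, char marker, String recordSeparator) {
        StringBuilder out = new StringBuilder();
        out.append(marker).append(' ');
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as a single break
                }
                // Start a new comment "line" using the record separator.
                out.append(recordSeparator).append(marker).append(' ');
            } else {
                out.append(c);
            }
        }
        return out.append(recordSeparator).toString();
    }

    // Suggested alternative: scan the comment for the record separator itself.
    static String printCommentScanningSeparator(String comment, char marker, String recordSeparator) {
        if (recordSeparator.isEmpty()) {
            throw new IllegalArgumentException("record separator must not be empty");
        }
        StringBuilder out = new StringBuilder();
        out.append(marker).append(' ');
        int from = 0;
        int at;
        while ((at = comment.indexOf(recordSeparator, from)) >= 0) {
            // Copy the text before the separator, then begin a new comment line.
            out.append(comment, from, at).append(recordSeparator).append(marker).append(' ');
            from = at + recordSeparator.length();
        }
        return out.append(comment, from, comment.length()).append(recordSeparator).toString();
    }

    public static void main(String[] args) {
        // '!' is the record separator, as in the example from the first mail.
        System.out.println(printCommentScanningLineBreaks("a!b", '#', "!")); // # a!b!
        System.out.println(printCommentScanningSeparator("a!b", '#', "!"));  // # a!# b!
    }
}
```

With ! as the record separator, the first variant leaves the ! inside the
comment untouched, so a reader that splits on ! would see b as a record of
its own. The second variant starts a fresh comment line after it.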


>
> Perhaps we could just document the method to warn that using anything
> other than CR, LF or CRLF will produce an output file that is not
> parseable?
>

That sounds like a good approach. But how would you implement that? You
probably don't want to introduce a dependency on a logging framework just
for that, do you?

Regards,
Benedikt



Re: [CSV] Inconsistent record separator behavior

sebb-2-2
On 23 August 2018 at 17:31, Benedikt Ritter <[hidden email]> wrote:

> As Bruno pointed out, there seem to be formats whose record separators
> are not newlines. So maybe CSVPrinter.printComment(String) should scan
> not for CR and LF but for the record separator.
>

Makes sense.

>>
>> Perhaps we could just document the method to warn that using anything
>> other than CR, LF or CRLF will produce an output file that is not
>> parseable?
>>
>
> That sounds like a good approach. But how would you implement that? You
> probably don't want to introduce a dependency on a logging framework just
> for that, do you?

I meant: add a warning to the documentation.
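
As a rough illustration of what such a warning might look like, here is a
sketch on a hypothetical stand-in class. The class, the helper method and the
wording are assumptions made for this example; none of it is taken from the
actual CSVPrinter source.

```java
/**
 * Hypothetical stand-in for a printer class, used only to show where a
 * documentation warning could live. It is not the Commons CSV CSVPrinter.
 *
 * <p><strong>Warning:</strong> comment printing only recognises CR, LF and
 * CRLF as line breaks. With any other record separator, output that contains
 * comments may not be parseable again with the same format.</p>
 */
final class DocumentedPrinterSketch {

    /**
     * Returns true if the given record separator is CR, LF or CRLF, that is,
     * one of the separators the warning above treats as safe.
     */
    static boolean isLineBreakSeparator(String recordSeparator) {
        return "\r".equals(recordSeparator)
                || "\n".equals(recordSeparator)
                || "\r\n".equals(recordSeparator);
    }

    public static void main(String[] args) {
        System.out.println(isLineBreakSeparator("\r\n")); // true
        System.out.println(isLineBreakSeparator("!"));    // false
    }
}
```

A documentation-only warning like this needs no logging dependency; the
helper is just there so callers could check a separator themselves.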


Re: [CSV] Inconsistent record separator behavior

Benedikt Ritter-4
On Thu, 23 Aug 2018 at 20:17, sebb <[hidden email]> wrote:

> On 23 August 2018 at 17:31, Benedikt Ritter <[hidden email]> wrote:
> >>
> >> Perhaps we could just document the method to warn that using anything
> >> other than CR, LF or CRLF will produce an output file that is not
> >> parseable?
> >>
> >
> > That sounds like a good approach. But how would you implement that? You
> > probably don't want to introduce a dependency on a logging framework just
> > for that, do you?
>
> I meant: add a warning to the documentation.
>

+1 for that! CSVPrinter has almost no class level documentation, so I
wanted to improve that anyway.

Benedikt

