[jira] Created: (SANDBOX-263) Excel Strategy using wrong seperator

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (SANDBOX-263) Excel Strategy using wrong seperator

Richard (Jira)
Excel Strategy using wrong seperator
------------------------------------

                 Key: SANDBOX-263
                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
             Project: Commons Sandbox
          Issue Type: Bug
          Components: CSV
            Reporter: Gunnar Wagenknecht


The Excel strategy is defined as follows.
{code}
    public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
                                                                 false, false, false);
{code}

However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.

{code}
String[][] data =
   (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
{code}

Simple test to reproduce:
{code}
import java.io.IOException;
import java.io.StringReader;

import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVStrategy;

public class CSVExcelStrategyBug {

        public static void main(final String[] args) {
                try {
                        System.out.println("Using ;");
                        parse("a;b\nc;d");
                        System.out.println();
                        System.out.println("Using ,");
                        parse("a,b\nc,d");
                } catch (final IOException e) {
                        e.printStackTrace();
                }

        }

        private static void parse(final String input) throws IOException {
                final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
                for (final String[] row : data) {
                        System.out.print("[");
                        for (final String cell : row) {
                                System.out.print("(" + cell + ")");
                        }
                        System.out.println("]");
                }
        }

}
{code}

Actual output:
{noformat}
Using ;
[(a;b)]
[(c;d)]

Using ,
[(a)(b)]
[(c)(d)]
{noformat}

Expected output:
{noformat}
Using ;
[(a)(b)]
[(c)(d)]

Using ,
[(a,b)]
[(c,d)]
{noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SANDBOX-263) Excel sategy uses wrong seperator

Richard (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunnar Wagenknecht updated SANDBOX-263:
---------------------------------------

    Summary: Excel sategy uses wrong seperator  (was: Excel Strategy using wrong seperator)

> Excel sategy uses wrong seperator
> ---------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SANDBOX-263) Excel strategy uses wrong seperator

Richard (Jira)
In reply to this post by Richard (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated SANDBOX-263:
-------------------------

    Summary: Excel strategy uses wrong seperator  (was: Excel sategy uses wrong seperator)

> Excel strategy uses wrong seperator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated SANDBOX-263:
-------------------------

    Summary: Excel strategy uses wrong separator  (was: Excel strategy uses wrong seperator)

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647408#action_12647408 ]

Henri Yandell commented on SANDBOX-263:
---------------------------------------

Excel on the Mac puts a comma in. I don't have it on Windows, but my suspicion is that this might be a locale specific issue rather than Windows on the Mac putting a semi-colin in.

There are also DOS CSV and Windows CSV options in the Save As - the only difference appears to be newlines with the default CSV being the same as the DOS CSV.

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647443#action_12647443 ]

Niall Pemberton commented on SANDBOX-263:
-----------------------------------------

I did a saveAs csv using Excel 2007 on Windows and it used a comma as the separator.

Apparently (according to the Excel Help) - Excel uses the "List Separtor" configured for windows - so I think Henri is probably right. You can check the separator your system is configured with by going to the Windows "Control Panel", selecting "Regional and Language Options" and clicking on the "Customize" button in the "Regional Options" tab

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647538#action_12647538 ]

Gunnar Wagenknecht commented on SANDBOX-263:
--------------------------------------------

That's interesting. So any Excel strategy needs to be prepared for different separators. I wonder if the Excel strategy should also use the regional settings or if there should be different Excel strategies. The current way confuses users. At least the JavaDoc needs a  hint to the regional settings depended behavior. But IHMO forcing users to change their regional settings just to have some software reading CSV properly is not an option, is it?



> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/19/10 4:22 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         if (decimalSeparator == listSeparator) {
             // If the floating point separator is a comma, use semi-colon to minimize encapsulation

             listSeparator = ';';
         }

CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         if (decimalSeparator == listSeparator) {
             // If the floating point separator is a comma, use semi-colon to minimize encapsulation
             listSeparator = ';';
         }

CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek commented on SANDBOX-263:
--------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         if (decimalSeparator == listSeparator) {
             // If the floating point separator is a comma, use semi-colon to minimize encapsulation
             listSeparator = ';';
         }

CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/19/10 4:23 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         // If the floating point separator is a comma, use semi-colon to minimize encapsulation
         if (decimalSeparator == listSeparator) {
             listSeparator = ';';
         }

CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         if (decimalSeparator == listSeparator) {
             // If the floating point separator is a comma, use semi-colon to minimize encapsulation

             listSeparator = ';';
         }

CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/19/10 4:25 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         // If the floating point separator is a comma, use semi-colon to minimize encapsulation
         if (decimalSeparator == listSeparator) {
             listSeparator = ';';
         }
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         // If the floating point separator is a comma, use semi-colon to minimize encapsulation
         if (decimalSeparator == listSeparator) {
             listSeparator = ';';
         }

CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/19/10 4:27 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}// The following idea is based on a comment from
// http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         // If the floating point separator is a comma, use semi-colon to minimize encapsulation
         if (decimalSeparator == listSeparator) {
             listSeparator = ';';
         }
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 2:24 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On www.experts-exchange.com/Programming/Languages/Q_24113673.html
we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
Here (support.microsoft.com/kb/94825/EN-US) we can read: "For most international versions, the default list separator is a semicolon (;). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}// The following idea is based on a comment from
// http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 2:29 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|www.experts-exchange.com/Programming/Languages/Q_24113673.html]
we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon (\;). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On www.experts-exchange.com/Programming/Languages/Q_24113673.html
we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
Here (support.microsoft.com/kb/94825/EN-US) we can read: "For most international versions, the default list separator is a semicolon (;). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 2:32 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html]
we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|www.experts-exchange.com/Programming/Languages/Q_24113673.html]
we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon (\;). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 2:37 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html] we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ';';
// If the floating point separator is really a point, use comma
if (decimalSeparator == listSeparator) {
    listSeparator = ',';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html]
we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the floating point separator is a comma, use semi-colon to minimize encapsulation
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 2:38 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html] we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ';';
// If the floating point separator is really a point, use comma
if (decimalSeparator == '.') {
    listSeparator = ',';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html] we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ';';
// If the floating point separator is really a point, use comma
if (decimalSeparator == listSeparator) {
    listSeparator = ',';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 3:20 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html] we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ';';
// If the floating point separator is really a point, use comma
if (decimalSeparator == '.') {
    listSeparator = ',';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
EXCEL_STRATEGY can't be created on application scope, it's session scope or even request scope.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html] we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ';';
// If the floating point separator is really a point, use comma
if (decimalSeparator == '.') {
    listSeparator = ',';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV based formatting works with every arbitrary separator character.
I expect a CSV framework to fully support the standard and to give me the possibility to configure individual solutions.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator

Richard (Jira)
In reply to this post by Richard (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 3:25 PM:
---------------------------------------------------------------

RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html] we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ';';
// If the floating point separator is really a point, use comma
if (decimalSeparator == '.') {
    listSeparator = ',';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
Using encapsulation with ", it is possible to have the same field and decimal separator.
An 'EXCEL_STRATEGY' can't be created statically on application scope, it's session scope or even request scope.

      was (Author: peko):
    RFC 4180 defines commas to be field separators.
The Excel strategy uses the local configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html] we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

The following approach can help to predict a field separator.
On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the floating point separator like this:
{code:java}
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ';';
// If the floating point separator is really a point, use comma
if (decimalSeparator == '.') {
    listSeparator = ',';
}
{code}
CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
EXCEL_STRATEGY can't be created on application scope, it's session scope or even request scope.
 

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false,
>                                                                  false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> public static void main(final String[] args) {
> try {
> System.out.println("Using ;");
> parse("a;b\nc;d");
> System.out.println();
> System.out.println("Using ,");
> parse("a,b\nc,d");
> } catch (final IOException e) {
> e.printStackTrace();
> }
> }
> private static void parse(final String input) throws IOException {
> final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> for (final String[] row : data) {
> System.out.print("[");
> for (final String cell : row) {
> System.out.print("(" + cell + ")");
> }
> System.out.println("]");
> }
> }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.