You are getting values like 2.5 because of the default ties strategy, TiesStrategy.AVERAGE, which gives tied values the average of the ranks they span. If you do not want to use that method, create an instance of RankingAlgorithm (such as NaturalRanking) with a different ties strategy and pass it to the SpearmansCorrelation constructor. This approach also gives you control over how NaNs are handled. Something like:

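// assuming Commons Math 3.x package names
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;
import org.apache.commons.math3.stat.ranking.TiesStrategy;

// create data matrix: one row per observation, one column per variable
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};
Array2DRowRealMatrix mydata = new Array2DRowRealMatrix(column1.length, 2);
for (int i = 0; i < column1.length; i++) {
    mydata.setEntry(i, 0, column1[i]);
    mydata.setEntry(i, 1, column2[i]);
}

// compute correlation with an explicit NaN strategy and ties strategy
NaturalRanking ranking = new NaturalRanking(NaNStrategy.FIXED, TiesStrategy.RANDOM);
SpearmansCorrelation spearman = new SpearmansCorrelation(mydata, ranking);
double rho = spearman.getCorrelationMatrix().getEntry(0, 1);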

Try that.
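For reference, the 2.5 comes from averaging tied ranks. The following plain-Java sketch ranks a column under three tie rules that mirror TiesStrategy.AVERAGE, MINIMUM and MAXIMUM. It is an illustration only, not the Commons Math NaturalRanking code; TieRankingSketch, Ties and rank are hypothetical names, and the sketch assumes the input contains no NaNs.

```java
import java.util.Arrays;

// Plain-Java sketch of natural ranking with different tie rules.
// Hypothetical names; not the Commons Math implementation. Assumes no NaNs.
class TieRankingSketch {

    enum Ties { AVERAGE, MINIMUM, MAXIMUM }

    // Returns 1-based ranks; values that tie share one rank chosen by 'ties'.
    static double[] rank(double[] values, Ties ties) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices by value so equal values become adjacent.
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // Extend [start, end) over the run of equal values.
            int end = start + 1;
            while (end < n && values[order[end]] == values[order[start]]) {
                end++;
            }
            int lowest = start + 1; // first rank position covered by the run
            int highest = end;      // last rank position covered by the run
            double shared;
            switch (ties) {
                case MINIMUM:
                    shared = lowest;
                    break;
                case MAXIMUM:
                    shared = highest;
                    break;
                default: // AVERAGE
                    shared = (lowest + highest) / 2.0;
                    break;
            }
            for (int k = start; k < end; k++) {
                ranks[order[k]] = shared;
            }
            start = end;
        }
        return ranks;
    }

    public static void main(String[] args) {
        double[] column2 = {10, 2, 10};
        for (Ties t : Ties.values()) {
            System.out.println(t + " " + Arrays.toString(rank(column2, t)));
        }
    }
}
```

For column2 = {10, 2, 10}, the two 10s occupy rank positions 2 and 3, so AVERAGE gives [2.5, 1.0, 2.5] (the ranking printed in the original message), MINIMUM gives [2.0, 1.0, 2.0] and MAXIMUM gives [3.0, 1.0, 3.0]. TiesStrategy.RANDOM, as suggested above, picks a random rank from the tied range instead, so the resulting correlation may change from run to run.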

-----Original Message-----
From: Martin Rosellen [mailto:[hidden email]]
Sent: Wednesday, November 07, 2012 6:10 AM
To: Commons Users List
Subject: [math] correlation analysis with NaNs

Dear all,

I am having difficulty using the Spearman correlation analysis with double arrays that may contain NaN entries. As you can see in my example, I want to analyse the columns with entries {Double.NaN, 1, 2} and {10, 2, 10}. The output of executing the code below is:

Ranking [1.0, 2.0]
Ranking [2.5, 1.0, 2.5]
correlations 0.8660254037844386

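{code}
// assuming Commons Math 3.x package names
import java.util.Arrays;
import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;

double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// rank each column separately, removing NaNs
NaturalRanking rank = new NaturalRanking(NaNStrategy.REMOVED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);
System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// note: this uses SpearmansCorrelation's own default ranking, not 'rank' above
SpearmansCorrelation s_corrs = new SpearmansCorrelation();
double correlations = s_corrs.correlation(column1, column2);
System.out.println("correlations " + correlations);
{code}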
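The printed 0.866 is consistent with the default ranking treating the NaN in column1 as the largest value (the behaviour of NaNStrategy.MAXIMAL; that this was the default here is an assumption) and giving the tied 10s in column2 the average rank 2.5. The sketch below writes those rank vectors out by hand and computes Spearman's rho as Pearson's r on them; SpearmanFromRanks and pearson are hypothetical names in plain Java.

```java
// Sketch: Spearman's rho as Pearson's r computed on rank vectors.
// The rank vectors are written by hand under the assumptions in the comments.
class SpearmanFromRanks {

    // Pearson correlation of two equal-length arrays (no NaN handling).
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // column1 = {NaN, 1, 2}, assuming NaN is ranked as the largest value
        double[] ranks1 = {3, 1, 2};
        // column2 = {10, 2, 10}, tied 10s share the average rank 2.5
        double[] ranks2 = {2.5, 1, 2.5};
        // Prints a value of about 0.866 (= 1.5 / sqrt(3)).
        System.out.println("correlations " + pearson(ranks1, ranks2));
    }
}
```

By hand: both rank vectors have mean 2, the sum of cross-products of deviations is 1.5, and the sums of squared deviations are 2 and 1.5, so r = 1.5 / sqrt(3), about 0.866.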

As I understand Spearman's method, the correlation should be 1, because tuples that contain NaNs should be ignored in both the ranking and the correlation analysis. What I don't understand is why there are ranks like 2.5.
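The behaviour described here, dropping every tuple that has a NaN in either column and then ranking and correlating what remains, could be sketched in plain Java as follows. This illustrates that interpretation only and is not how SpearmansCorrelation is implemented; PairwiseSpearman, averageRanks, pearson and correlation are hypothetical names. Ties still receive average ranks, but because rows are removed before ranking, the surviving values are ranked only among themselves.

```java
import java.util.Arrays;

// Sketch: Spearman correlation that ignores tuples containing NaN.
// Hypothetical helper names; not the Commons Math implementation.
class PairwiseSpearman {

    // 1-based ranks; tied values get the average of the ranks they span.
    static double[] averageRanks(double[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(v[a], v[b]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start + 1;
            while (end < n && v[order[end]] == v[order[start]]) {
                end++;
            }
            // mean of the rank positions start+1 .. end
            double avg = (start + 1 + end) / 2.0;
            for (int k = start; k < end; k++) {
                ranks[order[k]] = avg;
            }
            start = end;
        }
        return ranks;
    }

    // Pearson correlation of two equal-length arrays without NaNs.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Drops every row where either column is NaN, then correlates the ranks.
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns differ in length");
        }
        double[] cx = new double[x.length];
        double[] cy = new double[y.length];
        int m = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                cx[m] = x[i];
                cy[m] = y[i];
                m++;
            }
        }
        return pearson(averageRanks(Arrays.copyOf(cx, m)),
                       averageRanks(Arrays.copyOf(cy, m)));
    }

    public static void main(String[] args) {
        double[] column1 = {Double.NaN, 1, 2};
        double[] column2 = {10, 2, 10};
        System.out.println("correlations " + correlation(column1, column2)); // prints "correlations 1.0"
    }
}
```

For the example columns only rows 1 and 2 survive ({1, 2} against {2, 10}), giving ranks {1, 2} in both columns and a correlation of 1.0.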

My workaround works as follows:
- use NaNStrategy.FIXED, so that the NaNs stay in place
- execute the ranking
- round down ranks like 2.5 if they are not NaN (casting a NaN to int would turn it into 0.0)
- run a custom Pearson correlation that ignores tuples containing NaNs on the ranked arrays
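The Pearson step relies on a custom method whose body is not shown; PearsonsCorrelation in Commons Math has no correlationNaNs method. A minimal plain-Java stand-in, assuming the method simply skips every tuple in which either value is NaN and computes Pearson's r over the rest (NaNSkippingPearson and correlationNaNs below are hypothetical):

```java
// Hypothetical stand-in for a Pearson correlation that ignores NaN tuples.
class NaNSkippingPearson {

    static double correlationNaNs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays differ in length");
        }
        // First pass: means over the tuples where both values are present.
        int n = 0;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            sumX += x[i];
            sumY += y[i];
            n++;
        }
        if (n < 2) {
            return Double.NaN; // not enough complete tuples
        }
        double meanX = sumX / n, meanY = sumY / n;
        // Second pass: co-moment and squared deviations over the same tuples.
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Rank arrays the steps above are expected to give for the example columns.
        double[] ranking1 = {Double.NaN, 1, 2};
        double[] ranking2 = {2, 1, 2};
        System.out.println("correlations " + correlationNaNs(ranking1, ranking2)); // prints "correlations 1.0"
    }
}
```

Note that with NaNStrategy.FIXED, column2 is ranked before the NaN tuple is skipped, so its ranks still count the dropped value. Here, rounding 2.5 down happens to leave ranks 1 and 2 for the surviving tuples, but with other data the ranks of the surviving values can differ from those obtained by removing NaN tuples first.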

Here is the code:

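{code}
// assuming Commons Math 3.x package names
import java.util.Arrays;
import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;
import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;

double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// FIXED leaves NaNs in place and ranks the remaining entries
NaturalRanking rank = new NaturalRanking(NaNStrategy.FIXED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);

// round tied ranks such as 2.5 down; skip NaNs, since (int) NaN would be 0
for (int i = 0; i < ranking1.length; i++) {
    if (!Double.isNaN(ranking1[i])) {
        ranking1[i] = (int) ranking1[i];
    }
    if (!Double.isNaN(ranking2[i])) {
        ranking2[i] = (int) ranking2[i];
    }
}
System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// correlationNaNs is a custom method (not part of Commons Math) that skips
// tuples containing NaN; it is applied to the ranked arrays
PearsonsCorrelation p_corrs = new PearsonsCorrelation();
double correlations = p_corrs.correlationNaNs(ranking1, ranking2);
System.out.println("correlations " + correlations);
{code}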

I hope that my solution for dealing with NaNs isn't missing anything. Perhaps you can comment on this.

Kind regards
Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]