# [math] correlation analysis with NaNs

## [math] correlation analysis with NaNs

Dear all,

I am having difficulty using the Spearman correlation analysis with double arrays that may contain NaN entries. In my example I want to analyse the columns {Double.NaN, 1, 2} and {10, 2, 10}. The code below prints:

Ranking [1.0, 2.0]
Ranking [2.5, 1.0, 2.5]
correlations 0.8660254037844386

{code}
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// rank each column on its own, dropping NaN entries
NaturalRanking rank = new NaturalRanking(NaNStrategy.REMOVED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);
System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// Spearman correlation with the default ranking algorithm
SpearmansCorrelation s_corrs = new SpearmansCorrelation();
double correlations = s_corrs.correlation(column1, column2);
System.out.println("correlations " + correlations);
{code}

As I understand Spearman's correlation, the result should be 1, because tuples that contain NaNs should be ignored both in the ranking and in the correlation analysis. What I don't understand is why there are ranks like 2.5.

My workaround works as follows:

- use NaNStrategy.FIXED, so that the NaNs stay in place
- execute the ranking
- round down ranks like 2.5 if they are not NaN (NaNs are cast to 0.0)
- execute a custom Pearson correlation, which ignores tuples with NaNs, on the ranked arrays

Here is the code:

{code}
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// FIXED keeps each NaN at its original index
NaturalRanking rank = new NaturalRanking(NaNStrategy.FIXED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);

// truncate fractional ranks such as 2.5; NaNs are skipped because (int) NaN is 0
for (int i = 0; i < ranking1.length; i++) {
    if (!Double.isNaN(ranking1[i])) {
        ranking1[i] = (int) ranking1[i];
    }
    if (!Double.isNaN(ranking2[i])) {
        ranking2[i] = (int) ranking2[i];
    }
}
System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// correlationNaNs is the custom method from the last step: it skips tuples
// that contain a NaN and is applied to the ranked arrays
PearsonsCorrelation p_corrs = new PearsonsCorrelation();
double correlations = p_corrs.correlationNaNs(ranking1, ranking2);
System.out.println("correlations " + correlations);
{code}

I hope that my solution for dealing with NaNs isn't missing anything. Perhaps you can comment on this.

Kind regards

Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
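The custom Pearson step of the workaround is not shown in the post. A minimal sketch of what such a helper might look like follows, in plain Java with no library dependency. The class and method names (`NaNAwarePearson`, `pearsonIgnoringNaNs`) are hypothetical and are not part of Commons Math; the only assumption is that a tuple is skipped whenever either of its values is NaN.

```java
final class NaNAwarePearson {

    /**
     * Pearson correlation over the index pairs where neither value is NaN.
     * Hypothetical helper for illustration; not a Commons Math method.
     */
    static double pearsonIgnoringNaNs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        // first pass: count the complete pairs and accumulate their sums
        int n = 0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            n++;
            sumX += x[i];
            sumY += y[i];
        }
        if (n < 2) {
            return Double.NaN; // correlation is undefined for fewer than two pairs
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        // second pass: centered cross product and sums of squares
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // NaN if either side is constant
    }

    public static void main(String[] args) {
        // ranks as they look after the FIXED ranking and the truncation step
        double[] ranks1 = {Double.NaN, 1, 2};
        double[] ranks2 = {2, 1, 2};
        System.out.println(pearsonIgnoringNaNs(ranks1, ranks2)); // prints 1.0
    }
}
```

With the first tuple skipped, the remaining rank pairs are (1, 1) and (2, 2), which gives the correlation of 1 that the post expects.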

## RE: [math] correlation analysis with NaNs

You are getting values like 2.5 because of the default ties strategy. If you do not want to use that method, create an instance of RankingAlgorithm with a different ties strategy and pass it to the constructor for the SpearmansCorrelation. This approach also gives you control over the method for dealing with NaNs. Something like,

//create data matrix
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};
Array2DRowRealMatrix mydata = new Array2DRowRealMatrix();
For(int i=0;i
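To illustrate where a rank such as 2.5 comes from, here is a minimal sketch of average-rank tie handling in plain Java. It is not the Commons Math implementation; the class name `AverageRanks` is hypothetical, and the sketch assumes the input contains no NaN values.

```java
import java.util.Arrays;

final class AverageRanks {

    /**
     * Ranks the values from 1 to n; tied values all receive the mean of the
     * ranks they occupy. Assumes the input contains no NaN values.
     */
    static double[] rank(double[] values) {
        int n = values.length;
        // sort the indices by the value they point to
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // extend the run while the next sorted value equals the current one
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            // sorted positions start..end hold ranks start+1..end+1; use their mean
            double average = (start + 1 + end + 1) / 2.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // prints [2.5, 1.0, 2.5]
        System.out.println(Arrays.toString(rank(new double[]{10, 2, 10})));
    }
}
```

In {10, 2, 10} the value 2 takes rank 1 and the two tied 10s share ranks 2 and 3, so each of them is assigned (2 + 3) / 2 = 2.5.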

## Re: [math] correlation analysis with NaNs

On 11/07/2012 01:38 PM, Patrick Meyer wrote:
> You are getting values like 2.5 because of the default ties strategy. If you
> do not want to use that method, create an instance of RankingAlgorithm with
> a different ties strategy and pass it to the constructor for the
> SpearmansCorrelation. This approach also gives you control over the method
> for dealing with NaNs. Something like,
>
> //create data matrix
> double[] column1 = new double[]{Double.NaN, 1, 2};
> double[] column2 = new double[]{10, 2, 10};
> Array2DRowRealMatrix mydata = new Array2DRowRealMatrix();
> For(int i=0;i
> mydata.addToEntry(i, 0, column1[i]);
> mydata.addToEntry(i, 1, column2[i]);
> }
>
> //compute correlation
> NaturalRanking ranking = new NaturalRanking(NaNStrategy.FIXED,
> TiesStrategy.RANDOM);
> SpearmansCorrelation spearman = new SpearmansCorrelation(ranking, mydata);
>
> Try that.

Hi,

this will not really help imho. As far as I can see, there are at least two problems with the current use of the RankingAlgorithm in the SpearmansCorrelation class:

 * there is no way to select the ranking algorithm in the constructor without passing the values at the same time
 * NaNStrategy.REMOVED does not work symmetrically, i.e. it removes the NaN only from the input array where it occurs but not from the corresponding array, thus rendering it useless as it will result in exceptions (array lengths differ)

Would you be able to create an issue for this on the issue tracker and provide the test case?

Thanks,

Thomas
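A symmetric removal of the kind the second bullet asks for could be sketched as below, in plain Java with no library dependency. The names `PairwiseNaNFilter` and `dropNaNPairs` are hypothetical; the idea is simply to drop an index from both arrays whenever either array holds a NaN there, so the results always have equal length.

```java
import java.util.Arrays;

final class PairwiseNaNFilter {

    /**
     * Returns {xKept, yKept}: the entries of x and y at every index where
     * both values are non-NaN, so the two results stay aligned.
     */
    static double[][] dropNaNPairs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        // count the complete pairs so the outputs can be sized exactly
        int kept = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                kept++;
            }
        }
        double[] fx = new double[kept];
        double[] fy = new double[kept];
        int j = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                fx[j] = x[i];
                fy[j] = y[i];
                j++;
            }
        }
        return new double[][]{fx, fy};
    }

    public static void main(String[] args) {
        double[][] cleaned = dropNaNPairs(
                new double[]{Double.NaN, 1, 2},
                new double[]{10, 2, 10});
        System.out.println(Arrays.toString(cleaned[0])); // prints [1.0, 2.0]
        System.out.println(Arrays.toString(cleaned[1])); // prints [2.0, 10.0]
    }
}
```

For the data in this thread the surviving columns are {1, 2} and {2, 10}. Both rank as (1, 2) with no ties left, so a Spearman correlation computed on them is 1.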