[math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

chentao106@qq.com
Hi all,
    The "EmptyClusterStrategy" in KMeansPlusPlusClusterer can be reused MiniBatchKMeansClusterer and other cluster altorithm.
    So I think the "EmptyClusterStrategy" should move out from KMeansPlusPlusClusterer(JIRA issue #MATH-1525).
    I am not sure if my design is good or not. I think here should be a interface:

Solution 1: Explicit indicate the usage by class name and function name.
```java
@FunctionalInterface
public interface ClusterBreeder {
    <T extends Clusterable> T newCenterPoint((final Collection<CentroidCluster<T extends Clusterable>> clusters);
}
...
// Implementations
public LargestVarianceClusterPointBreeder implements ClusterBreeder{...}
public MostPopularClusterPointBreeder implements ClusterBreeder{...}
public FarthestPointBreeder implements ClusterBreeder{...}
...
// Usage
// KMeansPlusPlusClusterer.java
public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T> {
    ...
    private final ClusterBreeder clusterBreeder;
    public KMeansPlusPlusClusterer(final int k, final int maxIterations,
                               final DistanceMeasure measure,
                               final UniformRandomProvider random,
                               final ClusterBreeder clusterBreeder) {
        ...
        this.clusterBreeder=clusterBreeder;
    }
    ...
    public List<CentroidCluster<T>> cluster(final Collection<T> points) {
        ...
            if (cluster.getPoints().isEmpty()) {
                if (clusterBreeder == null) {
                    throw new ConvergenceException(LocalizedFormats.EMPTY_CLUSTER_IN_K_MEANS);
                } else {
                    newCenter = clusterBreeder.newCenterPoint(clusters);
                }
            }
        ...
    }
}
```

Solution2: Declare a more generic interface:
```java
@FunctionalInterface
public interface ClustersPointFinder {
    <T extends Clusterable> T find((final Collection<CentroidCluster<T extends Clusterable>> clusters);
}

...
// Implementations
public LargestVarianceClusterPointFinder implements ClustersPointFinder {...}
public MostPopularClusterPointFinder implements ClustersPointFinder {...}
public FarthestPointFinder implements ClustersPointFinder {...}
```

Thanks,
-CT
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

Gilles Sadowski-2
Hello.

Le mer. 11 mars 2020 à 07:28, [hidden email] <[hidden email]> a écrit :
>
> Hi all,
>     The "EmptyClusterStrategy" in KMeansPlusPlusClusterer can be reused MiniBatchKMeansClusterer and other cluster altorithm.
>     So I think the "EmptyClusterStrategy" should move out from KMeansPlusPlusClusterer(JIRA issue #MATH-1525).
>     I am not sure if my design is good or not.

I can't say either; please provide more context/explanation
about the excerpts below.

> I think here should be a interface:
>
> Solution 1: Explicit indicate the usage by class name and function name.
> ```java
> @FunctionalInterface
> public interface ClusterBreeder {
>     <T extends Clusterable> T newCenterPoint((final Collection<CentroidCluster<T extends Clusterable>> clusters);
> }

What is a "Breeder"?
This seems to further complicates the matter; what is a "center" (if there
can be old and new ones).

Regards,
Gilles

> ...
> // Implementations
> public LargestVarianceClusterPointBreeder implements ClusterBreeder{...}
> public MostPopularClusterPointBreeder implements ClusterBreeder{...}
> public FarthestPointBreeder implements ClusterBreeder{...}
> ...
> // Usage
> // KMeansPlusPlusClusterer.java
> public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T> {
>     ...
>     private final ClusterBreeder clusterBreeder;
>     public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>                                final DistanceMeasure measure,
>                                final UniformRandomProvider random,
>                                final ClusterBreeder clusterBreeder) {
>         ...
>         this.clusterBreeder=clusterBreeder;
>     }
>     ...
>     public List<CentroidCluster<T>> cluster(final Collection<T> points) {
>         ...
>             if (cluster.getPoints().isEmpty()) {
>                 if (clusterBreeder == null) {
>                     throw new ConvergenceException(LocalizedFormats.EMPTY_CLUSTER_IN_K_MEANS);
>                 } else {
>                     newCenter = clusterBreeder.newCenterPoint(clusters);
>                 }
>             }
>         ...
>     }
> }
> ```
>
> Solution2: Declare a more generic interface:
> ```java
> @FunctionalInterface
> public interface ClustersPointFinder {
>     <T extends Clusterable> T find((final Collection<CentroidCluster<T extends Clusterable>> clusters);
> }
>
> ...
> // Implementations
> public LargestVarianceClusterPointFinder implements ClustersPointFinder {...}
> public MostPopularClusterPointFinder implements ClustersPointFinder {...}
> public FarthestPointFinder implements ClustersPointFinder {...}
> ```
>
> Thanks,
> -CT

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

chentao106@qq.com
Hi,
    Code maybe more clear:
https://github.com/chentao106/commons-math/commit/0075cd76f540955ec7da3dc4b74f4d67d932ba3b

>Hello.
>
>Le mer. 11 mars 2020 à 07:28, [hidden email] <[hidden email]> a écrit :
>>
>> Hi all,
>>     The "EmptyClusterStrategy" in KMeansPlusPlusClusterer can be reused MiniBatchKMeansClusterer and other cluster altorithm.
>>     So I think the "EmptyClusterStrategy" should move out from KMeansPlusPlusClusterer(JIRA issue #MATH-1525).
>>     I am not sure if my design is good or not.
>
>I can't say either; please provide more context/explanation
>about the excerpts below.
>
>> I think here should be a interface:
>>
>> Solution 1: Explicit indicate the usage by class name and function name.
>> ```java
>> @FunctionalInterface
>> public interface ClusterBreeder {
>>     <T extends Clusterable> T newCenterPoint((final Collection<CentroidCluster<T extends Clusterable>> clusters);
>> }
>
>What is a "Breeder"?
>This seems to further complicates the matter; what is a "center" (if there
>can be old and new ones).
>
>Regards,
>Gilles
>
>> ...
>> // Implementations
>> public LargestVarianceClusterPointBreeder implements ClusterBreeder{...}
>> public MostPopularClusterPointBreeder implements ClusterBreeder{...}
>> public FarthestPointBreeder implements ClusterBreeder{...}
>> ...
>> // Usage
>> // KMeansPlusPlusClusterer.java
>> public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T> {
>>     ...
>>     private final ClusterBreeder clusterBreeder;
>>     public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>>                                final DistanceMeasure measure,
>>                                final UniformRandomProvider random,
>>                                final ClusterBreeder clusterBreeder) {
>>         ...
>>         this.clusterBreeder=clusterBreeder;
>>     }
>>     ...
>>     public List<CentroidCluster<T>> cluster(final Collection<T> points) {
>>         ...
>>             if (cluster.getPoints().isEmpty()) {
>>                 if (clusterBreeder == null) {
>>                     throw new ConvergenceException(LocalizedFormats.EMPTY_CLUSTER_IN_K_MEANS);
>>                 } else {
>>                     newCenter = clusterBreeder.newCenterPoint(clusters);
>>                 }
>>             }
>>         ...
>>     }
>> }
>> ```
>>
>> Solution2: Declare a more generic interface:
>> ```java
>> @FunctionalInterface
>> public interface ClustersPointFinder {
>>     <T extends Clusterable> T find((final Collection<CentroidCluster<T extends Clusterable>> clusters);
>> }
>>
>> ...
>> // Implementations
>> public LargestVarianceClusterPointFinder implements ClustersPointFinder {...}
>> public MostPopularClusterPointFinder implements ClustersPointFinder {...}
>> public FarthestPointFinder implements ClustersPointFinder {...}
>> ```
>>
>> Thanks,
>> -CT
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [hidden email]
>For additional commands, e-mail: [hidden email]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

chentao106@qq.com
In reply to this post by Gilles Sadowski-2
Hi,
    I have created a PR to show my aim: https://github.com/apache/commons-math/pull/126/files 

>Hello.
>
>Le mer. 11 mars 2020 à 07:28, [hidden email] <[hidden email]> a écrit :
>>
>> Hi all,
>>     The "EmptyClusterStrategy" in KMeansPlusPlusClusterer can be reused MiniBatchKMeansClusterer and other cluster altorithm.
>>     So I think the "EmptyClusterStrategy" should move out from KMeansPlusPlusClusterer(JIRA issue #MATH-1525).
>>     I am not sure if my design is good or not.
>
>I can't say either; please provide more context/explanation
>about the excerpts below.
>
>> I think here should be a interface:
>>
>> Solution 1: Explicit indicate the usage by class name and function name.
>> ```java
>> @FunctionalInterface
>> public interface ClusterBreeder {
>>     <T extends Clusterable> T newCenterPoint((final Collection<CentroidCluster<T extends Clusterable>> clusters);
>> }
>
>What is a "Breeder"?
>This seems to further complicates the matter; what is a "center" (if there
>can be old and new ones).

I mean a method to create a new Cluster from exists clusters.

>
>Regards,
>Gilles
>
>> ...
>> // Implementations
>> public LargestVarianceClusterPointBreeder implements ClusterBreeder{...}
>> public MostPopularClusterPointBreeder implements ClusterBreeder{...}
>> public FarthestPointBreeder implements ClusterBreeder{...}
>> ...
>> // Usage
>> // KMeansPlusPlusClusterer.java
>> public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T> {
>>     ...
>>     private final ClusterBreeder clusterBreeder;
>>     public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>>                                final DistanceMeasure measure,
>>                                final UniformRandomProvider random,
>>                                final ClusterBreeder clusterBreeder) {
>>         ...
>>         this.clusterBreeder=clusterBreeder;
>>     }
>>     ...
>>     public List<CentroidCluster<T>> cluster(final Collection<T> points) {
>>         ...
>>             if (cluster.getPoints().isEmpty()) {
>>                 if (clusterBreeder == null) {
>>                     throw new ConvergenceException(LocalizedFormats.EMPTY_CLUSTER_IN_K_MEANS);
>>                 } else {
>>                     newCenter = clusterBreeder.newCenterPoint(clusters);
>>                 }
>>             }
>>         ...
>>     }
>> }
>> ```
>>
>> Solution2: Declare a more generic interface:
>> ```java
>> @FunctionalInterface
>> public interface ClustersPointFinder {
>>     <T extends Clusterable> T find((final Collection<CentroidCluster<T extends Clusterable>> clusters);
>> }
>>
>> ...
>> // Implementations
>> public LargestVarianceClusterPointFinder implements ClustersPointFinder {...}
>> public MostPopularClusterPointFinder implements ClustersPointFinder {...}
>> public FarthestPointFinder implements ClustersPointFinder {...}
>> ```
>>
>> Thanks,
>> -CT
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [hidden email]
>For additional commands, e-mail: [hidden email]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

Gilles Sadowski-2
Hi.

2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
> Hi,
>     I have created a PR to show my aim:
> https://github.com/apache/commons-math/pull/126/files

Am I correct that the implementations of "ClustersPointExtractor"
modify the argument of the "extract" method?
If so, that seems quite unsafe.  I would not expect this behaviour
in a public API.

Unless I missed some point, I'd ask again that the API be reviewed
*before* implementing several features (such as those "extractors")
on top of something that does not look right.

Best regards,
Gilles

>
>>Hello.
>>
>>Le mer. 11 mars 2020 à 07:28, [hidden email] <[hidden email]> a écrit
>> :
>>>
>>> Hi all,
>>>     The "EmptyClusterStrategy" in KMeansPlusPlusClusterer can be reused
>>> MiniBatchKMeansClusterer and other cluster altorithm.
>>>     So I think the "EmptyClusterStrategy" should move out from
>>> KMeansPlusPlusClusterer(JIRA issue #MATH-1525).
>>>     I am not sure if my design is good or not.
>>
>>I can't say either; please provide more context/explanation
>>about the excerpts below.
>>
>>> I think here should be a interface:
>>>
>>> Solution 1: Explicit indicate the usage by class name and function name.
>>> ```java
>>> @FunctionalInterface
>>> public interface ClusterBreeder {
>>>     <T extends Clusterable> T newCenterPoint((final
>>> Collection<CentroidCluster<T extends Clusterable>> clusters);
>>> }
>>
>>What is a "Breeder"?
>>This seems to further complicates the matter; what is a "center" (if there
>>can be old and new ones).
>
> I mean a method to create a new Cluster from exists clusters.
>
>>
>>Regards,
>>Gilles
>>
>>> ...
>>> // Implementations
>>> public LargestVarianceClusterPointBreeder implements ClusterBreeder{...}
>>> public MostPopularClusterPointBreeder implements ClusterBreeder{...}
>>> public FarthestPointBreeder implements ClusterBreeder{...}
>>> ...
>>> // Usage
>>> // KMeansPlusPlusClusterer.java
>>> public class KMeansPlusPlusClusterer<T extends Clusterable> extends
>>> Clusterer<T> {
>>>     ...
>>>     private final ClusterBreeder clusterBreeder;
>>>     public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>>>                                final DistanceMeasure measure,
>>>                                final UniformRandomProvider random,
>>>                                final ClusterBreeder clusterBreeder) {
>>>         ...
>>>         this.clusterBreeder=clusterBreeder;
>>>     }
>>>     ...
>>>     public List<CentroidCluster<T>> cluster(final Collection<T> points) {
>>>         ...
>>>             if (cluster.getPoints().isEmpty()) {
>>>                 if (clusterBreeder == null) {
>>>                     throw new
>>> ConvergenceException(LocalizedFormats.EMPTY_CLUSTER_IN_K_MEANS);
>>>                 } else {
>>>                     newCenter = clusterBreeder.newCenterPoint(clusters);
>>>                 }
>>>             }
>>>         ...
>>>     }
>>> }
>>> ```
>>>
>>> Solution2: Declare a more generic interface:
>>> ```java
>>> @FunctionalInterface
>>> public interface ClustersPointFinder {
>>>     <T extends Clusterable> T find((final Collection<CentroidCluster<T
>>> extends Clusterable>> clusters);
>>> }
>>>
>>> ...
>>> // Implementations
>>> public LargestVarianceClusterPointFinder implements ClustersPointFinder
>>> {...}
>>> public MostPopularClusterPointFinder implements ClustersPointFinder {...}
>>> public FarthestPointFinder implements ClustersPointFinder {...}
>>> ```
>>>
>>> Thanks,
>>> -CT

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

Gilles Sadowski-2
Hello.

Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[hidden email]> a écrit :

>
> Hi.
>
> 2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
> > Hi,
> >     I have created a PR to show my aim:
> > https://github.com/apache/commons-math/pull/126/files
>
> Am I correct that the implementations of "ClustersPointExtractor"
> modify the argument of the "extract" method?
> If so, that seems quite unsafe.  I would not expect this behaviour
> in a public API.

To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
code):

public class Extractor {
    T extract(List<List<T>> list);
}

with something along those lines:

public class ClusterSet<T> {
    private Set<T> data;
    private List<List<T>> clusters;

    void remove(T e) {
        return data.remove(e);
    }

    public interface Visitor {
        T visit(List<List<T>> list);
    }
}

Key point is separation of concern (selection of element vs
removal of element).

Of course the devil is in the details (maintain consistency among
the fields of "ClusterSet", ensure that "Visitor" implementations
cannot modify the internal state, ...).

> Unless I missed some point, I'd ask again that the API be reviewed
> *before* implementing several features (such as those "extractors")
> on top of something that does not look right.

One perspective might be to try and come with a design for use in
multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").

Best regards,
Gilles

> >> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

chentao106@qq.com
Hi,

>Hello.
>
>Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[hidden email]> a écrit :
>>
>> Hi.
>>
>> 2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
>> > Hi,
>> >     I have created a PR to show my aim:
>> > https://github.com/apache/commons-math/pull/126/files
>>
>> Am I correct that the implementations of "ClustersPointExtractor"
>> modify the argument of the "extract" method?
>> If so, that seems quite unsafe.  I would not expect this behaviour
>> in a public API.
>
>To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
>code):
>
>public class Extractor {
>    T extract(List<List<T>> list);
>}

I have read the exists code again, and I found "EmptyClusterStrategy"
 and related logic only used in K-Means.
And the "Extractor" remove a point from exists Clusters is indeed unsuitable
to be a public API(as you mentioned before.)
I think another choice is simple make "EmptyClusterStrategy" and related logic protected,
then it can be resuse by MiniBatchKMeans.

Also after the "CentroidInitializer" has been move out, the "KMeansPlusPlusClusterer"
 should rename to "KMeansClusterer" with a construct parameter "CentroidInitializer"


>
>with something along those lines:
>
>public class ClusterSet<T> {
>    private Set<T> data;
>    private List<List<T>> clusters;
>
>    void remove(T e) {
>        return data.remove(e);
>    }
>
>    public interface Visitor {
>        T visit(List<List<T>> list);
>    }
>}
>
>Key point is separation of concern (selection of element vs
>removal of element).

I propose the ClusterSet should has more member like "distanceMeasure"
 to store origin clustering parameters,
or  to accelerate or support cluster finding or point finding.
```java
public class ClusterSet<T> {
    private List<Cluster> clusters;
    private DistanceMeasure measure;
    ...
    public List<Cluster> closestTopN(Clusterable point);

    public int pointSize();
}
```

>
>Of course the devil is in the details (maintain consistency among
>the fields of "ClusterSet", ensure that "Visitor" implementations
>cannot modify the internal state, ...).
>
>> Unless I missed some point, I'd ask again that the API be reviewed
>> *before* implementing several features (such as those "extractors")
>> on top of something that does not look right.
>
>One perspective might be to try and come with a design for use in
>multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
>
>Best regards,
>Gilles
>
>> >> [...]
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [hidden email]
>For additional commands, e-mail: [hidden email]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

Gilles Sadowski-2
Le ven. 20 mars 2020 à 04:47, [hidden email] <[hidden email]> a écrit :

>
> Hi,
>
> >Hello.
> >
> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[hidden email]> a écrit :
> >>
> >> Hi.
> >>
> >> 2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
> >> > Hi,
> >> >     I have created a PR to show my aim:
> >> > https://github.com/apache/commons-math/pull/126/files
> >>
> >> Am I correct that the implementations of "ClustersPointExtractor"
> >> modify the argument of the "extract" method?
> >> If so, that seems quite unsafe.  I would not expect this behaviour
> >> in a public API.
> >
> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
> >code):
> >
> >public class Extractor {
> >    T extract(List<List<T>> list);
> >}
>
> I have read the exists code again, and I found "EmptyClusterStrategy"
>  and related logic only used in K-Means.
> And the "Extractor" remove a point from exists Clusters is indeed unsuitable
> to be a public API(as you mentioned before.)
> I think another choice is simple make "EmptyClusterStrategy" and related logic protected,
> then it can be resuse by MiniBatchKMeans.
>
> Also after the "CentroidInitializer" has been move out, the "KMeansPlusPlusClusterer"
>  should rename to "KMeansClusterer" with a construct parameter "CentroidInitializer"

If you think that it makes sense to define a class hierarchy for
"KMeans"-based algorithms, please do so.  But note that "protected"
is only slightly better than "public" (from a maintenance POV, it isn't
better, since we'll be required to maintain compatibility all the same).
Perhaps the base classe(s) can be package-private...

> >
> >with something along those lines:
> >
> >public class ClusterSet<T> {
> >    private Set<T> data;
> >    private List<List<T>> clusters;
> >
> >    void remove(T e) {
> >        return data.remove(e);
> >    }
> >
> >    public interface Visitor {
> >        T visit(List<List<T>> list);
> >    }
> >}
> >
> >Key point is separation of concern (selection of element vs
> >removal of element).
>
> I propose the ClusterSet should has more member like "distanceMeasure"
>  to store origin clustering parameters,
> or  to accelerate or support cluster finding or point finding.
> ```java
> public class ClusterSet<T> {
>     private List<Cluster> clusters;
>     private DistanceMeasure measure;
>     ...
>     public List<Cluster> closestTopN(Clusterable point);

How about providing a comparator factory?

public <T extends Clusterable> static
 Comparator<Cluster<T>> getComparator(final Clusterable p, final
DistanceMeasure m) {
    return new Comparator<>() {
        public int compare(Cluster<T> c1, Cluster<T> c2) {
            final double d1 = c1.distance(p, m); // "distance" method
to be added to the interface (?).
            final double d2 = c2.distance(p, m);
            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
        }
    }
}

Then, if also providing an accessor to the clusters:

public List<Cluster<T>> getClusters() {
    return Collections.umodifiableList(clusters);
}

users can do whatever they want, a.o. easily implement the
ranking method which you suggest:

/** @return clusters in descending order of proximity. */
public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
DistanceMeasure m) {
    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
    Collections.sort(cList, set.getComparator(p, m));
    return cList;
}

>
>     public int pointSize();
> }
> ```

A priori, I'd favour more encapsulation, in order to ensure that
multi-thread access can be controlled, e.g. hide the "ClusterSet"
data-structures, and provide access through identifiers that are
atomically created when clusters are added to the instance, and
so on...

It would be great if you can come up with a concrete (and fully
tested) implementation of "ClusterSet". ;-)

Best,
Gilles

> >
> >Of course the devil is in the details (maintain consistency among
> >the fields of "ClusterSet", ensure that "Visitor" implementations
> >cannot modify the internal state, ...).
> >
> >> Unless I missed some point, I'd ask again that the API be reviewed
> >> *before* implementing several features (such as those "extractors")
> >> on top of something that does not look right.
> >
> >One perspective might be to try and come with a design for use in
> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
> >
> >Best regards,
> >Gilles
> >
> >> >> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

chentao106@qq.com
Hi,

>Le ven. 20 mars 2020 à 04:47, [hidden email] <[hidden email]> a écrit :
>>
>> Hi,
>>
>> >Hello.
>> >
>> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[hidden email]> a écrit :
>> >>
>> >> Hi.
>> >>
>> >> 2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
>> >> > Hi,
>> >> >     I have created a PR to show my aim:
>> >> > https://github.com/apache/commons-math/pull/126/files
>> >>
>> >> Am I correct that the implementations of "ClustersPointExtractor"
>> >> modify the argument of the "extract" method?
>> >> If so, that seems quite unsafe.  I would not expect this behaviour
>> >> in a public API.
>> >
>> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
>> >code):
>> >
>> >public class Extractor {
>> >    T extract(List<List<T>> list);
>> >}
>>
>> I have read the exists code again, and I found "EmptyClusterStrategy"
>>  and related logic only used in K-Means.
>> And the "Extractor" remove a point from exists Clusters is indeed unsuitable
>> to be a public API(as you mentioned before.)
>> I think another choice is simple make "EmptyClusterStrategy" and related logic protected,
>> then it can be resuse by MiniBatchKMeans.
>>
>> Also after the "CentroidInitializer" has been move out, the "KMeansPlusPlusClusterer"
>>  should rename to "KMeansClusterer" with a construct parameter "CentroidInitializer"
>
>If you think that it makes sense to define a class hierarchy for
>"KMeans"-based algorithms, please do so.  But note that "protected"
>is only slightly better than "public" (from a maintenance POV, it isn't
>better, since we'll be required to maintain compatibility all the same).
>Perhaps the base classe(s) can be package-private...

OK, package-private on "EmptyClusterStrategy" and related logic is fine.

>
>> >
>> >with something along those lines:
>> >
>> >public class ClusterSet<T> {
>> >    private Set<T> data;
>> >    private List<List<T>> clusters;
>> >
>> >    void remove(T e) {
>> >        return data.remove(e);
>> >    }
>> >
>> >    public interface Visitor {
>> >        T visit(List<List<T>> list);
>> >    }
>> >}
>> >
>> >Key point is separation of concern (selection of element vs
>> >removal of element).
>>
>> I propose the ClusterSet should has more member like "distanceMeasure"
>>  to store origin clustering parameters,
>> or  to accelerate or support cluster finding or point finding.
>> ```java
>> public class ClusterSet<T> {
>>     private List<Cluster> clusters;
>>     private DistanceMeasure measure;
>>     ...
>>     public List<Cluster> closestTopN(Clusterable point);
>
>How about providing a comparator factory?
>
>public <T extends Clusterable> static
> Comparator<Cluster<T>> getComparator(final Clusterable p, final
>DistanceMeasure m) {
>    return new Comparator<>() {
>        public int compare(Cluster<T> c1, Cluster<T> c2) {
>            final double d1 = c1.distance(p, m); // "distance" method
>to be added to the interface (?).
>            final double d2 = c2.distance(p, m);
>            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
>        }
>    }
>}

How do you think about the "distanceMeasure" to be a memeber of "ClusterSet",
In my opinion "ClusterSet" should contains all the parameters and result for clustering and clusters.

>
>Then, if also providing an accessor to the clusters:
>
>public List<Cluster<T>> getClusters() {
>    return Collections.umodifiableList(clusters);
>}
>
>users can do whatever they want, a.o. easily implement the
>ranking method which you suggest:
>
>/** @return clusters in descending order of proximity. */
>public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
>DistanceMeasure m) {
>    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
>    Collections.sort(cList, set.getComparator(p, m));
>    return cList;
>}
>
>>
>>     public int pointSize();
>> }
>> ```
>
>A priori, I'd favour more encapsulation, in order to ensure that
>multi-thread access can be controlled, e.g. hide the "ClusterSet"
>data-structures, and provide access through identifiers that are
>atomically created when clusters are added to the instance, and
>so on...
>
>It would be great if you can come up with a concrete (and fully
>tested) implementation of "ClusterSet". ;-)

I would like to try, but it is a big change to API, we need more time and discussions.

>
>Best,
>Gilles
>
>> >
>> >Of course the devil is in the details (maintain consistency among
>> >the fields of "ClusterSet", ensure that "Visitor" implementations
>> >cannot modify the internal state, ...).
>> >
>> >> Unless I missed some point, I'd ask again that the API be reviewed
>> >> *before* implementing several features (such as those "extractors")
>> >> on top of something that does not look right.
>> >
>> >One perspective might be to try and come with a design for use in
>> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
>> >
>> >Best regards,
>> >Gilles
>> >
>> >> >> [...]
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [hidden email]
>For additional commands, e-mail: [hidden email]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

Gilles Sadowski-2
2020-03-21 2:59 UTC+01:00, [hidden email] <[hidden email]>:

> Hi,
>
>>Le ven. 20 mars 2020 à 04:47, [hidden email] <[hidden email]> a écrit
>> :
>>>
>>> Hi,
>>>
>>> >Hello.
>>> >
>>> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[hidden email]> a
>>> > écrit :
>>> >>
>>> >> Hi.
>>> >>
>>> >> 2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
>>> >> > Hi,
>>> >> >     I have created a PR to show my aim:
>>> >> > https://github.com/apache/commons-math/pull/126/files
>>> >>
>>> >> Am I correct that the implementations of "ClustersPointExtractor"
>>> >> modify the argument of the "extract" method?
>>> >> If so, that seems quite unsafe.  I would not expect this behaviour
>>> >> in a public API.
>>> >
>>> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
>>> >code):
>>> >
>>> >public class Extractor {
>>> >    T extract(List<List<T>> list);
>>> >}
>>>
>>> I have read the exists code again, and I found "EmptyClusterStrategy"
>>>  and related logic only used in K-Means.
>>> And the "Extractor" remove a point from exists Clusters is indeed
>>> unsuitable
>>> to be a public API(as you mentioned before.)
>>> I think another choice is simple make "EmptyClusterStrategy" and related
>>> logic protected,
>>> then it can be resuse by MiniBatchKMeans.
>>>
>>> Also after the "CentroidInitializer" has been move out, the
>>> "KMeansPlusPlusClusterer"
>>>  should rename to "KMeansClusterer" with a construct parameter
>>> "CentroidInitializer"
>>
>>If you think that it makes sense to define a class hierarchy for
>>"KMeans"-based algorithms, please do so.  But note that "protected"
>>is only slightly better than "public" (from a maintenance POV, it isn't
>>better, since we'll be required to maintain compatibility all the same).
>>Perhaps the base classe(s) can be package-private...
>
> OK, package-private on "EmptyClusterStrategy" and related logic is fine.
>
>>
>>> >
>>> >with something along those lines:
>>> >
>>> >public class ClusterSet<T> {
>>> >    private Set<T> data;
>>> >    private List<List<T>> clusters;
>>> >
>>> >    void remove(T e) {
>>> >        return data.remove(e);
>>> >    }
>>> >
>>> >    public interface Visitor {
>>> >        T visit(List<List<T>> list);
>>> >    }
>>> >}
>>> >
>>> >Key point is separation of concern (selection of element vs
>>> >removal of element).
>>>
>>> I propose the ClusterSet should has more member like "distanceMeasure"
>>>  to store origin clustering parameters,
>>> or  to accelerate or support cluster finding or point finding.
>>> ```java
>>> public class ClusterSet<T> {
>>>     private List<Cluster> clusters;
>>>     private DistanceMeasure measure;
>>>     ...
>>>     public List<Cluster> closestTopN(Clusterable point);
>>
>>How about providing a comparator factory?
>>
>>public <T extends Clusterable> static
>> Comparator<Cluster<T>> getComparator(final Clusterable p, final
>>DistanceMeasure m) {
>>    return new Comparator<>() {
>>        public int compare(Cluster<T> c1, Cluster<T> c2) {
>>            final double d1 = c1.distance(p, m); // "distance" method
>>to be added to the interface (?).
>>            final double d2 = c2.distance(p, m);
>>            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
>>        }
>>    }
>>}
>
> How do you think about the "distanceMeasure" to be a memeber of
> "ClusterSet",

I don't see why.  A priori "ClusterSet" represents groupings
of data, and the distance function is a parameter that e.g.
determines whether the grouping is good or bad.

> In my opinion "ClusterSet" should contains all the parameters and result for
> clustering and clusters.

IIRC, Alex mentioned "ClusterResult" (?).  IIUC, this would
indeed collect everything necessary to evaluate the quality
of the clustering (i.e. probably also the "DistanceMeasure"
instance used by the clustering algorithm).

>>
>>Then, if also providing an accessor to the clusters:
>>
>>public List<Cluster<T>> getClusters() {
>>    return Collections.umodifiableList(clusters);
>>}
>>
>>users can do whatever they want, a.o. easily implement the
>>ranking method which you suggest:
>>
>>/** @return clusters in descending order of proximity. */
>>public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
>>DistanceMeasure m) {
>>    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
>>    Collections.sort(cList, set.getComparator(p, m));
>>    return cList;
>>}
>>
>>>
>>>     public int pointSize();
>>> }
>>> ```
>>
>>A priori, I'd favour more encapsulation, in order to ensure that
>>multi-thread access can be controlled, e.g. hide the "ClusterSet"
>>data-structures, and provide access through identifiers that are
>>atomically created when clusters are added to the instance, and
>>so on...
>>
>>It would be great if you can come up with a concrete (and fully
>>tested) implementation of "ClusterSet". ;-)
>
> I would like to try, but it is a big change to API, we need more time and
> discussions.

Do you have specific questions?
Otherwise, there were already several ideas and hints for
a more flexible and more efficient design to get going...

Gilles

>>
>>> >
>>> >Of course the devil is in the details (maintain consistency among
>>> >the fields of "ClusterSet", ensure that "Visitor" implementations
>>> >cannot modify the internal state, ...).
>>> >
>>> >> Unless I missed some point, I'd ask again that the API be reviewed
>>> >> *before* implementing several features (such as those "extractors")
>>> >> on top of something that does not look right.
>>> >
>>> >One perspective might be to try and come with a design for use in
>>> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
>>> >
>>> >Best regards,
>>> >Gilles
>>> >
>>> >> >> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

chentao106@qq.com
Hi,

>2020-03-21 2:59 UTC+01:00, [hidden email] <[hidden email]>:
>> Hi,
>>
>>>Le ven. 20 mars 2020 à 04:47, [hidden email] <[hidden email]> a écrit
>>> :
>>>>
>>>> Hi,
>>>>
>>>> >Hello.
>>>> >
>>>> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[hidden email]> a
>>>> > écrit :
>>>> >>
>>>> >> Hi.
>>>> >>
>>>> >> 2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
>>>> >> > Hi,
>>>> >> >     I have created a PR to show my aim:
>>>> >> > https://github.com/apache/commons-math/pull/126/files
>>>> >>
>>>> >> Am I correct that the implementations of "ClustersPointExtractor"
>>>> >> modify the argument of the "extract" method?
>>>> >> If so, that seems quite unsafe.  I would not expect this behaviour
>>>> >> in a public API.
>>>> >
>>>> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
>>>> >code):
>>>> >
>>>> >public class Extractor {
>>>> >    T extract(List<List<T>> list);
>>>> >}
>>>>
>>>> I have read the exists code again, and I found "EmptyClusterStrategy"
>>>>  and related logic only used in K-Means.
>>>> And the "Extractor" remove a point from exists Clusters is indeed
>>>> unsuitable
>>>> to be a public API(as you mentioned before.)
>>>> I think another choice is simple make "EmptyClusterStrategy" and related
>>>> logic protected,
>>>> then it can be resuse by MiniBatchKMeans.
>>>>
>>>> Also after the "CentroidInitializer" has been move out, the
>>>> "KMeansPlusPlusClusterer"
>>>>  should rename to "KMeansClusterer" with a construct parameter
>>>> "CentroidInitializer"
>>>
>>>If you think that it makes sense to define a class hierarchy for
>>>"KMeans"-based algorithms, please do so.  But note that "protected"
>>>is only slightly better than "public" (from a maintenance POV, it isn't
>>>better, since we'll be required to maintain compatibility all the same).
>>>Perhaps the base classe(s) can be package-private...
>>
>> OK, package-private on "EmptyClusterStrategy" and related logic is fine.
>>
>>>
>>>> >
>>>> >with something along those lines:
>>>> >
>>>> >public class ClusterSet<T> {
>>>> >    private Set<T> data;
>>>> >    private List<List<T>> clusters;
>>>> >
>>>> >    void remove(T e) {
>>>> >        return data.remove(e);
>>>> >    }
>>>> >
>>>> >    public interface Visitor {
>>>> >        T visit(List<List<T>> list);
>>>> >    }
>>>> >}
>>>> >
>>>> >Key point is separation of concern (selection of element vs
>>>> >removal of element).
>>>>
>>>> I propose the ClusterSet should has more member like "distanceMeasure"
>>>>  to store origin clustering parameters,
>>>> or  to accelerate or support cluster finding or point finding.
>>>> ```java
>>>> public class ClusterSet<T> {
>>>>     private List<Cluster> clusters;
>>>>     private DistanceMeasure measure;
>>>>     ...
>>>>     public List<Cluster> closestTopN(Clusterable point);
>>>
>>>How about providing a comparator factory?
>>>
>>>public <T extends Clusterable> static
>>> Comparator<Cluster<T>> getComparator(final Clusterable p, final
>>>DistanceMeasure m) {
>>>    return new Comparator<>() {
>>>        public int compare(Cluster<T> c1, Cluster<T> c2) {
>>>            final double d1 = c1.distance(p, m); // "distance" method
>>>to be added to the interface (?).
>>>            final double d2 = c2.distance(p, m);
>>>            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
>>>        }
>>>    }
>>>}
>>
>> How do you think about the "distanceMeasure" to be a memeber of
>> "ClusterSet",
>
>I don't see why.  A priori "ClusterSet" represents groupings
>of data, and the distance function is a parameter that e.g.
>determines whether the grouping is good or bad.
>
>> In my opinion "ClusterSet" should contains all the parameters and result for
>> clustering and clusters.
>
>IIRC, Alex mentioned "ClusterResult" (?).  IIUC, this would
>indeed collect everything necessary to evaluate the quality
>of the clustering (i.e. probably also the "DistanceMeasure"
>instance used by the clustering algorithm).

My understanding is "ClusterSet" and "ClusterResult" is one thing.
Did I missed something?

>
>>>
>>>Then, if also providing an accessor to the clusters:
>>>
>>>public List<Cluster<T>> getClusters() {
>>>    return Collections.umodifiableList(clusters);
>>>}
>>>
>>>users can do whatever they want, a.o. easily implement the
>>>ranking method which you suggest:
>>>
>>>/** @return clusters in descending order of proximity. */
>>>public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
>>>DistanceMeasure m) {
>>>    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
>>>    Collections.sort(cList, set.getComparator(p, m));
>>>    return cList;
>>>}
>>>
>>>>
>>>>     public int pointSize();
>>>> }
>>>> ```
>>>
>>>A priori, I'd favour more encapsulation, in order to ensure that
>>>multi-thread access can be controlled, e.g. hide the "ClusterSet"
>>>data-structures, and provide access through identifiers that are
>>>atomically created when clusters are added to the instance, and
>>>so on...
>>>
>>>It would be great if you can come up with a concrete (and fully
>>>tested) implementation of "ClusterSet". ;-)
>>
>> I would like to try, but it is a big change to API, we need more time and
>> discussions.
>
>Do you have specific questions?
>Otherwise, there were already several ideas and hints for
>a more flexible and more efficient design to get going...
>
>Gilles
>
>>>
>>>> >
>>>> >Of course the devil is in the details (maintain consistency among
>>>> >the fields of "ClusterSet", ensure that "Visitor" implementations
>>>> >cannot modify the internal state, ...).
>>>> >
>>>> >> Unless I missed some point, I'd ask again that the API be reviewed
>>>> >> *before* implementing several features (such as those "extractors")
>>>> >> on top of something that does not look right.
>>>> >
>>>> >One perspective might be to try and come with a design for use in
>>>> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
>>>> >
>>>> >Best regards,
>>>> >Gilles
>>>> >
>>>> >> >> [...]
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [hidden email]
>For additional commands, e-mail: [hidden email]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

Gilles Sadowski-2
Hello.

2020-03-21 4:02 UTC+01:00, [hidden email] <[hidden email]>:

> Hi,
>
>>2020-03-21 2:59 UTC+01:00, [hidden email] <[hidden email]>:
>>> Hi,
>>>
>>>>Le ven. 20 mars 2020 à 04:47, [hidden email] <[hidden email]> a
>>>> écrit
>>>> :
>>>>>
>>>>> Hi,
>>>>>
>>>>> >Hello.
>>>>> >
>>>>> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <[hidden email]> a
>>>>> > écrit :
>>>>> >>
>>>>> >> Hi.
>>>>> >>
>>>>> >> 2020-03-18 15:10 UTC+01:00, [hidden email] <[hidden email]>:
>>>>> >> > Hi,
>>>>> >> >     I have created a PR to show my aim:
>>>>> >> > https://github.com/apache/commons-math/pull/126/files
>>>>> >>
>>>>> >> Am I correct that the implementations of "ClustersPointExtractor"
>>>>> >> modify the argument of the "extract" method?
>>>>> >> If so, that seems quite unsafe.  I would not expect this behaviour
>>>>> >> in a public API.
>>>>> >
>>>>> >To be clearer (hopefully): I'd contrast this
>>>>> > (incomplete/non-compilable
>>>>> >code):
>>>>> >
>>>>> >public class Extractor {
>>>>> >    T extract(List<List<T>> list);
>>>>> >}
>>>>>
>>>>> I have read the exists code again, and I found "EmptyClusterStrategy"
>>>>>  and related logic only used in K-Means.
>>>>> And the "Extractor" remove a point from exists Clusters is indeed
>>>>> unsuitable
>>>>> to be a public API(as you mentioned before.)
>>>>> I think another choice is simple make "EmptyClusterStrategy" and
>>>>> related
>>>>> logic protected,
>>>>> then it can be resuse by MiniBatchKMeans.
>>>>>
>>>>> Also after the "CentroidInitializer" has been move out, the
>>>>> "KMeansPlusPlusClusterer"
>>>>>  should rename to "KMeansClusterer" with a construct parameter
>>>>> "CentroidInitializer"
>>>>
>>>>If you think that it makes sense to define a class hierarchy for
>>>>"KMeans"-based algorithms, please do so.  But note that "protected"
>>>>is only slightly better than "public" (from a maintenance POV, it isn't
>>>>better, since we'll be required to maintain compatibility all the same).
>>>>Perhaps the base classe(s) can be package-private...
>>>
>>> OK, package-private on "EmptyClusterStrategy" and related logic is fine.
>>>
>>>>
>>>>> >
>>>>> >with something along those lines:
>>>>> >
>>>>> >public class ClusterSet<T> {
>>>>> >    private Set<T> data;
>>>>> >    private List<List<T>> clusters;
>>>>> >
>>>>> >    void remove(T e) {
>>>>> >        return data.remove(e);
>>>>> >    }
>>>>> >
>>>>> >    public interface Visitor {
>>>>> >        T visit(List<List<T>> list);
>>>>> >    }
>>>>> >}
>>>>> >
>>>>> >Key point is separation of concern (selection of element vs
>>>>> >removal of element).
>>>>>
>>>>> I propose the ClusterSet should has more member like "distanceMeasure"
>>>>>  to store origin clustering parameters,
>>>>> or  to accelerate or support cluster finding or point finding.
>>>>> ```java
>>>>> public class ClusterSet<T> {
>>>>>     private List<Cluster> clusters;
>>>>>     private DistanceMeasure measure;
>>>>>     ...
>>>>>     public List<Cluster> closestTopN(Clusterable point);
>>>>
>>>>How about providing a comparator factory?
>>>>
>>>>public <T extends Clusterable> static
>>>> Comparator<Cluster<T>> getComparator(final Clusterable p, final
>>>>DistanceMeasure m) {
>>>>    return new Comparator<>() {
>>>>        public int compare(Cluster<T> c1, Cluster<T> c2) {
>>>>            final double d1 = c1.distance(p, m); // "distance" method
>>>>to be added to the interface (?).
>>>>            final double d2 = c2.distance(p, m);
>>>>            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
>>>>        }
>>>>    }
>>>>}
>>>
>>> How do you think about the "distanceMeasure" to be a memeber of
>>> "ClusterSet",
>>
>>I don't see why.  A priori "ClusterSet" represents groupings
>>of data, and the distance function is a parameter that e.g.
>>determines whether the grouping is good or bad.
>>
>>> In my opinion "ClusterSet" should contains all the parameters and result
>>> for
>>> clustering and clusters.
>>
>>IIRC, Alex mentioned "ClusterResult" (?).  IIUC, this would
>>indeed collect everything necessary to evaluate the quality
>>of the clustering (i.e. probably also the "DistanceMeasure"
>>instance used by the clustering algorithm).
>
> My understanding is "ClusterSet" and "ClusterResult" is one thing.

I don't think so.
IMO, the idea is that "ClusterSet" will hold data-structures
like "List<Cluster<Clusterable>>" together with methods
to manipulate individual clusters (in a thread-safe way).
While "ClusterResult" would be the return type of a call
to the "cluster(Collection<Clusterable>)" method of a
clustering algorithm implementation, and would contain
a "ClusterSet" instance together with any additional fields
deemed necessary to analyze the result.

> Did I missed something?

I hope that I didn't, and that the above paragraph dispelled
the confusion.

Regards,
Gilles

>>
>>>>
>>>>Then, if also providing an accessor to the clusters:
>>>>
>>>>public List<Cluster<T>> getClusters() {
>>>>    return Collections.umodifiableList(clusters);
>>>>}
>>>>
>>>>users can do whatever they want, a.o. easily implement the
>>>>ranking method which you suggest:
>>>>
>>>>/** @return clusters in descending order of proximity. */
>>>>public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
>>>>DistanceMeasure m) {
>>>>    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
>>>>    Collections.sort(cList, set.getComparator(p, m));
>>>>    return cList;
>>>>}
>>>>
>>>>>
>>>>>     public int pointSize();
>>>>> }
>>>>> ```
>>>>
>>>>A priori, I'd favour more encapsulation, in order to ensure that
>>>>multi-thread access can be controlled, e.g. hide the "ClusterSet"
>>>>data-structures, and provide access through identifiers that are
>>>>atomically created when clusters are added to the instance, and
>>>>so on...
>>>>
>>>>It would be great if you can come up with a concrete (and fully
>>>>tested) implementation of "ClusterSet". ;-)
>>>
>>> I would like to try, but it is a big change to API, we need more time and
>>> discussions.
>>
>>Do you have specific questions?
>>Otherwise, there were already several ideas and hints for
>>a more flexible and more efficient design to get going...
>>
>>Gilles
>>
>>>>
>>>>> >
>>>>> >Of course the devil is in the details (maintain consistency among
>>>>> >the fields of "ClusterSet", ensure that "Visitor" implementations
>>>>> >cannot modify the internal state, ...).
>>>>> >
>>>>> >> Unless I missed some point, I'd ask again that the API be reviewed
>>>>> >> *before* implementing several features (such as those "extractors")
>>>>> >> on top of something that does not look right.
>>>>> >
>>>>> >One perspective might be to try and come with a design for use in
>>>>> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
>>>>> >
>>>>> >Best regards,
>>>>> >Gilles
>>>>> >
>>>>> >> >> [...]
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [hidden email]
>>For additional commands, e-mail: [hidden email]
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]