Hello.
Any objection to commit the code as proposed on the report page?
https://issues.apache.org/jira/browse/MATH-816

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Seems fine.
I think that the limitation to a fixed number of mixture components is a bit limiting. So is the limitation to a uniform set of components. Both limitations can be eased without huge difficulty.

Avoiding the fixed number of components can be done by using some variant of Dirichlet processes. Simply picking k_max relatively large and then using an approximate DP over that finite set works well.

That said, mixture models are pretty nice to have.

On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski <[hidden email]> wrote:
> Any objection to commit the code as proposed on the report page?
> https://issues.apache.org/jira/browse/MATH-816
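The "approximate DP over that finite set" idea can be sketched with the truncated stick-breaking construction. This is an illustrative Python sketch only, not Commons Math code; the function name and parameters are made up for the example:

```python
import random

def truncated_dp_weights(alpha, k_max, rng=None):
    """Stick-breaking construction of Dirichlet-process mixture weights,
    truncated at k_max components: draw v_i ~ Beta(1, alpha) and set
    w_i = v_i * prod_{j<i} (1 - v_j); the last weight absorbs the
    remainder of the stick so the k_max weights sum to one."""
    rng = rng or random.Random(0)
    weights = []
    remaining = 1.0
    for _ in range(k_max - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # tail of the stick
    return weights

w = truncated_dp_weights(alpha=1.0, k_max=20)
```

With a small alpha most of the mass concentrates in the first few components, which is why a large fixed k_max behaves like an unbounded component count in practice.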
Ted,
I am not sure I understand the problem with the fixed number of components. My understanding is that CM prefers immutable objects. Adding a component to an object would require reweighting in addition to modifying the component list. A new mixture model could be instantiated using the getComponents function and then adding or removing more components if necessary.

Jared
________________________________________
From: Ted Dunning [[hidden email]]
Sent: Wednesday, October 17, 2012 5:21 PM
To: Commons Developers List
Subject: Re: [Math] MATH-816 (mixture model distribution)

> I think that the limitation to a fixed number of mixture components is a
> bit limiting. So is the limitation to a uniform set of components. [...]
>
> Avoiding the fixed number of components can be done by using some variant
> of Dirichlet processes. Simply picking k_max relatively large and then
> using an approximate DP over that finite set works well.

Email Disclaimer: www.stjude.org/emaildisclaimer
Consultation Disclaimer: www.stjude.org/consultationdisclaimer
The issue is that with a fixed number of components, you need to do multiple runs to find a best-fit number of components. Gibbs sampling against a Dirichlet process can get you to the same answer at about the same cost as a single run of EM with a fixed number of models.

On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared <[hidden email]> wrote:
> I am not sure I understand the problem with the fixed number of
> components. My understanding is that CM prefers immutable objects. Adding
> a component to an object would require reweighting in addition to modifying
> the component list. [...]
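The "number of components is not fixed in advance" behaviour comes from the prior that a DP Gibbs sampler effectively samples from, the Chinese restaurant process. A minimal illustrative sketch in Python (not part of any proposed Commons Math code) shows how cluster count emerges from the data size and a concentration parameter:

```python
import random

def crp_partition(n, alpha, rng=None):
    """Sample a partition of n items from the Chinese restaurant process:
    item i joins an existing cluster with probability proportional to the
    cluster's size, or opens a new cluster with probability proportional
    to alpha. The number of clusters is therefore not fixed in advance."""
    rng = rng or random.Random(42)
    sizes = []        # sizes[k] = number of items in cluster k
    assignment = []   # assignment[i] = cluster index of item i
    for i in range(n):
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, c in enumerate(sizes):
            acc += c
            if r < acc:
                sizes[k] += 1
                assignment.append(k)
                break
        else:
            # r fell in the alpha-sized slot: open a new cluster.
            sizes.append(1)
            assignment.append(len(sizes) - 1)
    return assignment, sizes

assignment, sizes = crp_partition(100, alpha=2.0)
```

A DP mixture sampler alternates draws like this with draws of the component parameters, so one chain explores models of different sizes instead of requiring one EM run per candidate size.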
I see. I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817). A Gibbs sampling DP fit may be a bit further out. I am not opposed to allowing the number of components to change, but I also like the simplicity of this class. Whatever you guys decide is probably fine.
Jared
________________________________________
From: Ted Dunning [[hidden email]]
Sent: Wednesday, October 17, 2012 9:41 PM
To: Commons Developers List
Subject: Re: [Math] MATH-816 (mixture model distribution)

> The issue is that with a fixed number of components, you need to do
> multiple runs to find a best fit number of components. Gibbs sampling
> against a Dirichlet process can get you to the same answer in about the
> same cost as a single run of EM with a fixed number of models. [...]
On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> I see. I am planning to submit the EM fit for multivariate normal mixture
> models in the next couple of weeks (Math-817). [...] Whatever you guys
> decide is probably fine.

I like the interface as implemented for what it represents, but I agree with Ted's point above. I also wonder if implementing the multivariate distribution interface is really buying you anything. Certainly not for the Gibbs sampler. It might be better to just directly implement EM with an interface that is natural for fitting and using mixture models. I am not sure this stuff belongs in the distribution package in any case. Where were we intending to place the EM fit? Can you describe in a little more detail how the practical use cases you have in mind will work?

Phil
On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> > I see. I am planning to submit the EM fit for multivariate normal
> > mixture models in the next couple of weeks (Math-817). [...]
>
> I like the interface as implemented for what it represents,

By "interface", do you mean the class MixtureMultivariateRealDistribution as implemented in the file on the JIRA page?

> but I
> agree with Ted's point above. I also wonder if implementing the
> multivariate distribution interface is really buying you anything.
> Certainly not for the Gibbs sampler. It might be better to just
> directly implement EM with an interface that is natural for fitting
> and using mixture models. I am not sure this stuff belongs in the
> distribution package in any case.

As implemented, it seems quite natural. How this class will be used by non-existing code is beyond the scope of MATH-816. [And when the code exists, we can always revisit the design if necessary.]

> Where were we intending to place
> the EM fit?

Not in distribution; I agree.

> Can you describe a little more how exactly the
> practical use cases you have in mind will work?

This will probably be for a new JIRA feature request.

Regards,
Gilles
I vote for simplicity. Current practice in the social sciences is to fit multiple models, each with a different number of components, and use fit statistics to choose the best model.

There are some additional features I would like to see added, and I have the code to contribute if it is not currently there. To be consistent with Mplus, we need to have the algorithm use multiple random starts and run a few of the best starts to completion. Mplus uses this strategy to effectively overcome local minima.

-----Original Message-----
From: Becksfort, Jared [mailto:[hidden email]]
Sent: Wednesday, October 17, 2012 11:37 PM
To: Commons Developers List
Subject: RE: [Math] MATH-816 (mixture model distribution)

I see. I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817). A Gibbs sampling DP fit may be a bit further out. [...]
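The multiple-random-starts strategy (run many cheap starts briefly, then run only the best few to completion) can be sketched independently of any particular model. The Python below is a toy illustration: a made-up multimodal objective stands in for a mixture log-likelihood, and a crude hill climb stands in for EM iterations. It is neither Mplus's algorithm nor proposed Commons Math code:

```python
import math
import random

def objective(x):
    """Toy multimodal 'log-likelihood' with several local maxima."""
    return -(x - 2.0) ** 2 + 1.5 * math.cos(5.0 * x)

def hill_climb(x, iters, step=0.01):
    """Crude local ascent; stands in for running some EM iterations."""
    for _ in range(iters):
        here = objective(x)
        up, down = objective(x + step), objective(x - step)
        if up > here or down > here:
            x = x + step if up >= down else x - step
    return objective(x), x

def multi_start(n_starts=50, n_best=5, cheap_iters=20, full_iters=600,
                seed=0):
    rng = random.Random(seed)
    # Stage 1: many random starting values, each run only briefly.
    starts = [rng.uniform(-5.0, 5.0) for _ in range(n_starts)]
    scored = sorted((hill_climb(s, cheap_iters) for s in starts),
                    reverse=True)
    # Stage 2: run only the few best starts to completion; keep the best.
    return max(hill_climb(x, full_iters) for _, x in scored[:n_best])

best_value, best_x = multi_start()
```

A single start can stall on a poor local maximum; the two-stage scheme spends most of its budget only on starts that already look promising.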
On Thu, Oct 18, 2012 at 08:13:52AM -0400, Patrick Meyer wrote:
> I vote for simplicity. Current practice in the social sciences is to fit
> multiple models, each with a different number of components, and use fit
> statistics to choose the best model.

So... Do you vote for the current proposal (as in the latest attachment on the JIRA page)? [Sorry for being dense. :-)]
[The (simple) "mixture model" code could be in 3.1, due to be out a couple of weeks _ago_. :-}]

> There are some additional features I would like to see added and I have the
> code to contribute if it is not currently there. To be consistent with
> Mplus, we need have the algorithm use multiple random starts and run a few
> of the best starts to completion. Mplus uses this strategy to effectively
> overcome local minima.

Proposals welcome; please open a feature request with an outline of the implementation.

Thanks,
Gilles
On 10/18/12 1:41 AM, Gilles Sadowski wrote:
> On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
>> I like the interface as implemented for what it represents,
> By "interface", do you mean the class
> MixtureMultivariateRealDistribution
> as implemented in the file on the JIRA page?

Yes, the most recent one. I like the way you set up the constructors, handling the weights and distribution type parameter.

> As implemented, it seems quite natural.
> How this class will be used by non-existing code is beyond the scope of
> MATH-816.
> [And when the code exists, we can always revisit the design if necessary.]

It works for fixed component models, which I guess is OK by consensus to start. The question I was asking is what exactly do you get by having it extend the multivariate real distribution? I guess that will become clear when we get the EM implementation. I am OK committing this, I just wanted to get a clearer picture of how the class was going to be used.

Phil
Yes, I like the latest changes. It looks cleaner to me.
It seems that the attachments only describe the mixture distribution and do not provide the EM estimation algorithm. Am I missing something? Didn't the original contributor mention the estimation part too? The parts I would like to add will need the EM part first.

Thanks!

-----Original Message-----
From: Gilles Sadowski [mailto:[hidden email]]
Sent: Thursday, October 18, 2012 9:33 AM
To: [hidden email]
Subject: Re: [Math] MATH-816 (mixture model distribution)

So... Do you vote for the current proposal (as in the latest attachment on the JIRA page)? [Sorry for being dense. :-)] [...]
On Thu, Oct 18, 2012 at 11:00:17AM -0400, Patrick Meyer wrote:
> Yes, I like the latest changes. It looks cleaner to me.
>
> It seems that the attachments only describe the mixture distribution and do
> not provide the EM estimation algorithm. Am I missing something? Didn't the
> original contributor mention the estimation part too? The parts I would like
> to add will need the EM part first.

Jared has broken up his proposal into three parts; the enclosing issue is
  https://issues.apache.org/jira/browse/MATH-817
which explicitly refers to the EM part, but no implementation of this has been submitted yet.

Regards,
Gilles
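Since the MATH-817 EM implementation has not been submitted yet, here is a minimal sketch of what an EM iteration for a normal mixture involves: an E-step that computes responsibilities and an M-step that re-estimates weights and means. Python for brevity, restricted to univariate, unit-variance components; none of the names come from the actual proposal:

```python
import math
import random

def em_normal_mixture(data, k=2, iters=200):
    """Minimal EM for a k-component univariate normal mixture with fixed
    unit-variance components (a sketch only, not the MATH-817 code)."""
    xs = sorted(data)
    # Deterministic initialisation: spread the means over the data range.
    mu = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | data[i]) under the
        # current parameters.
        resp = []
        for x in data:
            dens = [w[j] * math.exp(-0.5 * (x - mu[j]) ** 2)
                    for j in range(k)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: weighted re-estimates from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            w[j] = nj / len(data)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
    return w, mu

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] \
     + [rng.gauss(5.0, 1.0) for _ in range(200)]
w, mu = em_normal_mixture(data)
```

The real fitter would also re-estimate covariances and track the log-likelihood for convergence; this sketch only shows the shape of the two alternating steps.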
On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
> On 10/18/12 1:41 AM, Gilles Sadowski wrote:
> [...]
> It works for fixed component models, which I guess is OK by
> consensus to start. The question I was asking is what exactly do you
> get by having it extend the multivariate real distribution?

Is it not a kind of distribution? [It's obvious that one can sample from it but maybe there are some required properties (for a distribution) which are missing from such a mixture (?).]

> I guess
> that will become clear when we get the EM implementation.

Hopefully.

> I am OK committing this, I just wanted to get a clearer picture of
> how the class was going to be used.

I wouldn't be able to answer.

Gilles
Existing code does have a certain cachet to it.
On Thu, Oct 18, 2012 at 5:13 AM, Patrick Meyer <[hidden email]> wrote:
> I vote for simplicity. Current practice in the social sciences is to fit
> multiple models, each with a different number of components, and use fit
> statistics to choose the best model.
>
> There are some additional features I would like to see added and I have the
> code to contribute if it is not currently there. To be consistent with
> Mplus, we need have the algorithm use multiple random starts and run a few
> of the best starts to completion. Mplus uses this strategy to effectively
> overcome local minima. [...]
On 10/18/12 8:55 AM, Gilles Sadowski wrote:
> On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
>> It works for fixed component models, which I guess is OK by
>> consensus to start. The question I was asking is what exactly do you
>> get by having it extend the multivariate real distribution?
> Is it not a kind of distribution?
> [It's obvious that one can sample from it but maybe there are some required
> properties (for a distribution) which are missing from such a mixture (?).]

What is implemented is a legitimate distribution (or more precisely, a legitimate density, which is all we really model in RealMultivariateDistribution). I just wonder whether there is value in it as a distribution per se, rather than just as a container for the weights and component distribution parameters. The sample() implementation is legitimate; I just don't know if it has any practical value. I guess the density will be used by the EM impl. As I said above, I am fine committing and then seeing how the EM impl uses the class.

Assuming it does turn out to be practically valuable as a distribution, a natural thing to add would be a univariate version; but that would require an actual distribution function.

Phil
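Phil's point that the mixture is "a legitimate density" with a legitimate sample() can be made concrete: the density is the weighted sum of component densities, and sampling is two-stage (pick a component with probability equal to its weight, then draw from it). A univariate illustrative Python sketch follows; all names are made up for the example, and the actual class under discussion is multivariate and in Java:

```python
import math
import random

def normal_pdf(mu, sigma):
    """Return the density function of N(mu, sigma^2)."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: c * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def mixture_density(x, weights, pdfs):
    """Mixture density: the weighted sum of the component densities."""
    return sum(w * pdf(x) for w, pdf in zip(weights, pdfs))

def mixture_sample(weights, samplers, rng):
    """Two-stage sampling: pick a component index with probability equal
    to its weight, then draw from that component."""
    r = rng.random()
    acc = 0.0
    for w, draw in zip(weights, samplers):
        acc += w
        if r < acc:
            return draw()
    return samplers[-1]()  # guard against rounding in the weights

rng = random.Random(0)
weights = [0.3, 0.7]
pdfs = [normal_pdf(0.0, 1.0), normal_pdf(5.0, 1.0)]
samplers = [lambda: rng.gauss(0.0, 1.0), lambda: rng.gauss(5.0, 1.0)]
xs = [mixture_sample(weights, samplers, rng) for _ in range(4000)]
```

Simulating data this way is exactly the "practical value" use case Jared mentions below, and the density function is what an EM fitter would evaluate.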
Typing this on my phone; sorry about the format.

The sampling part of the mixture model makes it a true distribution
according to the abstract class and interface. It will also come in handy
for simulating data, I think. I am already using it to simulate MRI
images.

I support adding a getDimension function to the interface. I can't do
anything for a few days, though.

________________________________________
From: Phil Steitz [[hidden email]]
Sent: Thursday, October 18, 2012 2:50 PM
To: Commons Developers List
Subject: Re: [Math] MATH-816 (mixture model distribution)

On 10/18/12 8:55 AM, Gilles Sadowski wrote: [...]