[ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF


Benedikt Ritter-4
Hi all,

I just want to let you know that I've joined the discussion the github
commons rdf community is currently having at github [3]. I think it is time
for the PMC to take action here, since it feels like a conflict is
beginning.

Hello Commons RDF community,

first of all, I'm speaking for myself and not on behalf of the Apache
Commons PMC.

I'm really confused by this whole situation. This is what happened from my
POV: the Commons RDF community at one point had the idea of moving to
Apache Commons (which in my eyes made sense, given the fact that Apache
Commons is a place to share code between Apache projects). You really began
pushing things, you even requested a git mirror on behalf of the Apache
Commons project from Infra, which now is unused [1]!
Then Commons RDF decided that it didn't want to join Apache Commons
anymore, which was okay (at least for me).

Later Reto showed up and wanted to try things out in the Apache Commons
Sandbox. This is perfectly okay for us. Every Apache Committer may start
new ideas in our sandbox (in fact we lately granted commit access to all of
our repositories to all ASF committers [2]). However to actually grow out
of the sandbox and become a proper component, there has to be a community
around said component. At the moment, I don't see such a community around
the Apache Commons Sandbox RDF component. But who knows, maybe there will
be such a community one day? Maybe not. We do not force things. We just let
people work with the code (inside the sandbox) the way they like. There is no
threat to this component at all. We don't have an evil plan to destroy
Commons RDF.
The differences regarding how to implement the RDF spec are none of our
business. None of the current Apache Commons team knows RDF. Who are we to
judge which approach is the right one?

I'm copying this message to the Apache Commons mailing list, so that
everybody is up to date. If you want to respond, please also copy your
response to the dev ML. If the Commons ML is too noisy: we're using prefixes
on the ML. You just have to define a filter that deletes all mail whose
subject does not start with [RDF].

I hope we can settle this issue once and for all. Right now it feels like
"Apache Commons are the bad guys" and I don't think we deserve this.

Regards,
Benedikt

[1] https://issues.apache.org/jira/browse/INFRA-8068
[2] http://markmail.org/message/q5slpso253joca7n
[3] https://github.com/commons-rdf/commons-rdf/issues/43

--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Sergio Fernández
Hi Benedikt,

On 15/01/15 09:40, Benedikt Ritter wrote:
> I just want to let you know that I've joined the discussion the github
> commons rdf community is currently having at github [3]. I think it is time
> for the PMC to take action here, since it feels like a conflict is
> beginning.

OK, although a bit too early, I'm fine jumping into [hidden email] to
discuss this in the Apache way.

First I'd like to apologize to the Apache Commons community, because I
wanted to keep this conflict out until we could have a solution, which
honestly we do not have yet beyond a proposal under discussion:

https://github.com/commons-rdf/commons-rdf/issues/43#issuecomment-69916423

> I'm really confused by this whole situation. This is what happened from my
> POV: the Commons RDF community at one point had the idea of moving to
> Apache Commons (which in my eyes made sense, given the fact that Apache
> Commons is a place to share code between Apache projects). You really began
> pushing things, you even requested a git mirror on behalf of the Apache
> Commons project from Infra, which now is unused [1]!
> Then Commons RDF decided that it didn't want to join Apache Commons
> anymore, which was okay (at least for me).

Since I was the person who pushed for that step, I feel I need to properly
explain it.

I think at that stage we had three issues: The first one was about git,
and how the tool was used for agreements on the design. Second, the
single mailing list was understood as a barrier for communication. And
third, Commons RDF was not yet providing an implementation.

OK, the first one was easy to solve; even if we may lose the nice
github interfaces, we can keep the workflow based on PRs, that's fine.
The single mailing list was in fact seen as a kind of problem; on the
one hand, getting so much noise, but on the other hand also generating
noise irrelevant to other projects. And about the third one, once we
have established the API we are in a much better position to provide
basic implementations. And that's why we temporarily decided to go back to
github.

> Later Reto showed up and wanted to try things out in the Apache Commons
> Sandbox. This is perfectly okay for us. Every Apache Committer may start
> new ideas in our sandbox (in fact we lately granted commit access to all of
> our repositories to all ASF committers [2]). However to actually grow out
> of the sandbox and become a proper component, there has to be a community
> around said component. At the moment, I don't see such a community around
> the Apache Commons Sandbox RDF component. But who knows, maybe there will
> be such a community one day? Maybe not. We do not force things. We just let
> people work with the code (inside the sandbox) the way they like. There is no
> threat to this component at all. We don't have an evil plan to destroy
> Commons RDF.
> The differences regarding how to implement the RDF spec are none of our
> business. None of the current Apache Commons team knows RDF. Who are we to
> judge which approach is the right one?

We started Commons RDF with the vision of aligning, and allowing
portability across, the two major and already established RDF libraries
(Apache Jena and OpenRDF Sesame). I have nothing to say about how each
library interpreted and implemented the RDF specification. But I know
quite well the troubles that this duality causes even for basic things.
Therefore we started a journey together with those two projects (Andy and
Peter are traveling with us) to design a basic API that can be considered
"common". And that's what we have now at github.

I'm not against other implementations, more basic or bound to concrete
use cases; that's good. But I think just yet another API would not help.
And here is where we come closer to the point of conflict: the current code
at Apache Commons Sandbox RDF proposes a new API as a Commons component
bound to an existing implementation (Clerezza) with very low adoption in
the developer community. Forgetting the background and ignoring it makes
the incompatibility issue even bigger.

Therefore my proposal for Commons RDF is the following:

* Commons RDF proposes an API that addresses portability issues. I'd
recommend starting from what we currently have at github, which was
actually designed by committee and which both Jena and Sesame have already
started to implement.
* We evolve the current design in the context of Apache Commons Sandbox.
* We keep the API separate from the implementations:
  * We make clear that the major established RDF toolkits
  (Apache Jena and OpenRDF Sesame) are the recommended implementations.
  * We make an open call for contributing basic implementations to the
  project. We can adopt the one provided by Stian, and also work with Reto
  to move the Clerezza-based implementation (aka Apache Commons Sandbox
  RDF) to that API (which seems to be what he is willing to do anyway). The
  feedback from those implementations would be considered for evolving the API.

We can easily organize in different Maven artifacts if we all agree on
this setup.

> I hope we can settle this issue once and for all. Right now it feels like
> "Apache Commons are the bad guys" and I don't think we deserve this.

I think we never said that, and I personally do not have that feeling.
We are people with experience in Apache, and we do respect each project,
especially one as good as Apache Commons.

I just want to ask about the option of having a dedicated dev mailing
list, keeping the general one for announcements or things relevant for
the whole project.

I really believe we can arrive somewhere.

Thanks for bringing up this discussion, Benedikt.

Cheers,

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: [hidden email]
w: http://redlink.co

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Bruno P. Kinoshita
In reply to this post by Benedikt Ritter-4
Hello!


I feel like I can't help much in the current discussion. But I just wanted to chime in
and say that I'm +1 for a [rdf] component in Apache Commons. As a commons committer I'd
like to help.

I started watching the GitHub repository and have subscribed to the ongoing discussion. I'll
try to contribute in some way; maybe testing and with small patches.

My go-to Maven dependency for RDF, Turtle, N3, working with ontologies, reasoners, etc.,
is Apache Jena. I think it would be very positive to have a common interface that I could
use in my code (mainly crawlers and data munging for Hadoop jobs) and that would work
with different implementations.


Thanks!

Bruno


Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Peter Ansell
In reply to this post by Benedikt Ritter-4
The Clerezza team were all notified about the effort to put a common
RDF API together on GitHub and they responded positively at that
point. The only sticking point then and now IMO is the purely academic
distinction of opening up internal labels for blank nodes versus not
opening it up at all. Reto is against having the API allow access to
the identifiers on academic grounds, where other systems pragmatically
allow it with heavily worded javadoc contracts about their limited
usefulness, per the RDF specifications:

https://mail-archives.apache.org/mod_mbox/clerezza-dev/201406.mbox/%3C5398B07C.5000507@...%3E
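
As an illustration of the trade-off described above, here is a minimal Java sketch of an API that does expose blank node labels but hedges them with a javadoc contract. The names `BlankNodeSketch`, `internalIdentifier()` and `LocalBlankNode` are hypothetical and chosen only for this example; they are not taken from the Clerezza code or from the GitHub Commons RDF code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical blank node interface; not the actual API of either project. */
interface BlankNodeSketch {
    /**
     * Returns the label this implementation uses internally for the blank node.
     * Contract (assumed for this sketch): the label is only meaningful inside
     * the JVM instance that created the node and must not be treated as a
     * global identifier.
     */
    String internalIdentifier();
}

/** Hypothetical in-memory implementation that hands out sequential labels. */
final class LocalBlankNode implements BlankNodeSketch {
    private static final AtomicLong COUNTER = new AtomicLong();

    // Each instance gets its own label at construction time.
    private final String id = "b" + COUNTER.incrementAndGet();

    @Override
    public String internalIdentifier() {
        return id;
    }
}
```

The alternative position in the thread would simply omit `internalIdentifier()` from the interface, so that callers can never depend on the label.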

However, for some more background we could refer back to discussion
about restructuring both Clerezza and Stanbol to make them more
maintainable and useful to the community:

https://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCAA7LAO2X++Uk8PoNM+b9=f9v2DN5zdZLjh2BjE0MmrZCYafnZQ@...%3E

In particular, as Rupert Westenthaler mentions there, the goal to
simply promote the Clerezza RDF model as commons.rdf would not achieve
much given the share that Jena and Sesame have.

The Commons RDF effort that Sergio has brokered, including Andy (Jena)
and me (Sesame), and including both Scala (w3c/banana-rdf@github) and
Clojure (drlivingston/kr@github) project representatives, will provide
the common JVM RDF API that Rupert referred to as being necessary.

The main points as I see it that are necessary before starting the
process that was aborted last time (echoing Sergio's comments):

* Mailing list clutter: both in terms of the wide range of technical
discussions from commons rdf, and general email traffic from other
commons sub-projects discouraging potential participants from joining
in the discussion.
* Being able to use GitHub pull requests for code review, including if
necessary the sending of comments there to the apache mailing list
that is decided to be used for that purpose. The actual merging will
be done by hand in this case, but the code review features there are
too useful. The patching of PR comments back to apache mailing lists
has already been done, so there is no technical issue for this, just
deciding which mailing list the comments will go to.
* Having it okay that the commons rdf api is a project that
principally aims to create a set of interfaces, and not host any of
the scalable implementations of the API. Stian Soiland-Reyes has
written a basic implementation, but in practice, any large dataset
will not load into that implementation and be queried efficiently, so
it is only going to be used for small in-memory tasks.

I hope there is no bad blood from the aborted effort last time. There
were a variety of causes, including the reasons above but we all
joined the GitHub discussion with the goal of hosting the project
inside of the Apache Foundation and IMO Apache Commons is still likely
the best way to do that for our small (in terms of code) project.

Cheers,

Peter


Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Andy Seaborne
On 16/01/15 00:29, Peter Ansell wrote:
> I hope there is no bad blood from the aborted effort last time. There
> were a variety of causes, including the reasons above but we all
> joined the GitHub discussion with the goal of hosting the project
> inside of the Apache Foundation and IMO Apache Commons is still likely
> the best way to do that for our small (in terms of code) project.

+1

        Andy



Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Andy Seaborne
In reply to this post by Bruno P. Kinoshita
On 15/01/15 11:52, Bruno P. Kinoshita wrote:

> Hello!
>
>
> I feel like I can't help much in the current discussion. But I just wanted to chime in
> and say that I'm +1 for a [rdf] component in Apache Commons. As a commons committer I'd
> like to help.
>
> I started watching the GitHub repository and have subscribed to the ongoing discussion. I'll
> try to contribute in some way; maybe testing and with small patches.
>
> My go-to Maven dependency for RDF, Turtle, N3, working with ontologies, reasoners, etc.,
> is Apache Jena. I think it would be very positive to have a common interface that I could
> use in my code (mainly crawlers and data munging for Hadoop jobs) and that would work
> with different implementations.
>
>
> Thanks!
>
> Bruno

Since you mention Jena ... :-)

Jena can (and does) support multiple APIs over a common core.

A commons-rdf API can be added alongside the existing APIs; that means
it is not a "big bang" to have commons-rdf interfaces supported.

There is a lot more to working with RDF than the RDF API part - SPARQL
engines don't use that API if they want performance and/or scale. (1)
SPARQL queries collections of graphs and (2) for scale+persistence, you
need to work in parts at a somewhat lower level than Java objects,
and closer to the binary of the persistence structures.

        Andy



Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Bruno P. Kinoshita
Hi Andy!

> Jena can (and does) support multiple APIs over a common core.
>
> A commons-rdf API can be added alongside the existing APIs; that means
> it is not a "big bang" to have commons-rdf interfaces supported.

That's great! Would the commons-rdf dependency go in jena-core/pom.xml? Is it going to be necessary to change some classes in the core? I think it will be transparent for other modules like ARQ, Fuseki, Text. Is that right?

> There is a lot more to working with RDF than the RDF API part - SPARQL
> engines don't use that API if they want performance and/or scale. (1)
> SPARQL queries collections of graphs and (2) for scale+persistence, you
> need to work in parts at a somewhat lower level than Java objects,
> and closer to the binary of the persistence structures.

Good point. I'm enjoying learning about the Jena code for JENA-632, even though datasets, streaming query collections, and everything around journaling and graph persistence can be a bit scary. That probably won't be covered in commons-rdf, and I think that's the right call.

Thanks!
Bruno



Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Stian Soiland-Reyes
In reply to this post by Sergio Fernández
On 15 Jan 2015 11:06, "Sergio Fernández" <[hidden email]> wrote:

> Therefore my proposal for Commons RDF is the following:
>
> * Commons RDF proposes an API that addresses portability issues. I'd
> recommend starting from what we currently have at github, which was
> actually designed by committee and which both Jena and Sesame have already
> started to implement.
> * We evolve the current design in the context of Apache Commons Sandbox.
> * We keep the API separate from the implementations:
>   * We make clear that the major established RDF toolkits (Apache
>   Jena and OpenRDF Sesame) are the recommended implementations.
>   * We make an open call for contributing basic implementations to the
>   project. We can adopt the one provided by Stian, and also work with Reto to
>   move the Clerezza-based implementation (aka Apache Commons Sandbox RDF) to
>   that API (which seems to be what he is willing to do anyway). The feedback
>   from those implementations would be considered for evolving the API.

+1 to all the above.

> We can easily organize in different Maven artifacts if we all agree on
> this setup.

+1 - I can have a first go at this if you want, including Reto's module.

> I just want to ask about the option of having a dedicated dev mailing
> list, keeping the general one for announcements or things relevant for
> the whole project.

Just rdf@commons should do? It's both a topic and a component.

Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

James Carman
On Saturday, January 17, 2015, Stian Soiland-Reyes <[hidden email]> wrote:
>
>
> Just rdf@commons should do? It's both a topic and a component.
>

What about floor wax?  Dessert topping?

Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Benedikt Ritter-4
In reply to this post by Peter Ansell
Hello Peter,

2015-01-16 1:29 GMT+01:00 Peter Ansell <[hidden email]>:

> The Clerezza team were all notified about the effort to put a common
> RDF API together on GitHub and they responded positively at that
> point. The only sticking point then and now IMO is the purely academic
> distinction of opening up internal labels for blank nodes versus not
> opening it up at all. Reto is against having the API allow access to
> the identifiers on academic grounds, where other systems pragmatically
> allow it with heavily worded javadoc contracts about their limited
> usefulness, per the RDF specifications:
>
>
> https://mail-archives.apache.org/mod_mbox/clerezza-dev/201406.mbox/%3C5398B07C.5000507@...%3E
>
> However, for some more background we could refer back to discussion
> about restructuring both Clerezza and Stanbol to make them more
> maintainable and useful to the community:
>
>
> https://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCAA7LAO2X++Uk8PoNM+b9=f9v2DN5zdZLjh2BjE0MmrZCYafnZQ@...%3E
>
> In particular, as Rupert Westenthaler mentions there, the goal to
> simply promote the Clerezza RDF model as commons.rdf would not achieve
> much given the share that Jena and Sesame have.
>
> The Commons RDF effort that Sergio has brokered, including Andy (Jena)
> and me (Sesame), and including both Scala (w3c/banana-rdf@github) and
> Clojure (drlivingston/kr@github) project representatives, will provide
> the common JVM RDF API that Rupert referred to as being necessary.
>
> The main points as I see it that are necessary before starting the
> process that was aborted last time (echoing Sergio's comments):
>
> * Mailing list clutter: both in terms of the wide range of technical
> discussions from commons rdf, and general email traffic from other
> commons sub-projects discouraging potential participants from joining
> in the discussion.
>

We're discussing this. I need to catch up with that discussion, since I've
been offline for a few days :-)


> * Being able to use GitHub pull requests for code review, including if
> necessary the sending of comments there to the apache mailing list
> that is decided to be used for that purpose. The actual merging will
> be done by hand in this case, but the code review features there are
> too useful. The patching of PR comments back to apache mailing lists
> has already been done, so there is no technical issue for this, just
> deciding which mailing list the comments will go to.
>

There is an infra hook that can forward any comments on github issues to
jira issues. I think this would be sufficient. Github mirrors are read
only, so you will have to live with the manual merge approach...


> * Having it okay that the commons rdf api is a project that
> principally aims to create a set of interfaces, and not host any of
> the scalable implementations of the API. Stian Soiland-Reyes has
> written a basic implementation, but in practice, any large dataset
> will not load into that implementation and be queried efficiently, so
> it is only going to be used for small in-memory tasks.
>

This is something you guys have to figure out :-)


>
> I hope there is no bad blood from the aborted effort last time. There
> were a variety of causes, including the reasons above but we all
> joined the GitHub discussion with the goal of hosting the project
> inside of the Apache Foundation and IMO Apache Commons is still likely
> the best way to do that for our small (in terms of code) project.
>
> Cheers,
>
> Peter
>

To sum this up: All that is blocking github commons rdf to join Apache
Commons is the mailing list thing?

Regards,
Benedikt


>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Stian Soiland-Reyes
On 18 January 2015 at 12:07, Benedikt Ritter <[hidden email]> wrote:
> There is an infra hook that can forward any comments on github issues to
> jira issues. I think this would be sufficient. Github mirrors are read
> only, so you will have to live with the manual merge approach...

Agreed - the infra hook from github pull requests to the mailing list
already works fairly well for Apache Jena, creating focused and
tracked threads - see examples at
https://github.com/apache/jena/pulls

If this can be extended with a Jira integration, all the better!


>> * Having it okay that the commons rdf api is a project that
>> principally aims to create a set of interfaces, and not host any of
>> the scalable implementations of the API. Stian Soiland-Reyes has
>> written a basic implementation, but in practice, any large dataset
>> will not load into that implementation and be queried efficiently, so
>> it is only going to be used for small in-memory tasks.
> This is something you guys have to figure out :-)

Yes, religious/academic discussions can have a better home on a new
mailing list and larger sub-community - I hope those on the
dev@commons list who just became interested would not mind joining a
new list - whatever its name might be.


> To sum this up: All that is blocking github commons rdf to join Apache
> Commons is the mailing list thing?

I believe that sums it up!

The IP Clearance should be a small formality.

--
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718


Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Andy Seaborne
In reply to this post by Benedikt Ritter-4
On 18/01/15 12:07, Benedikt Ritter wrote:
...
> To sum this up: All that is blocking github commons rdf to join Apache
> Commons is the mailing list thing?
 >
 > Regards,
 > Benedikt

Some thinking out loud ... in email ...

There are two mailing list issues: "dev" and "commits".

"commits" has some very specific requirements for the Apache Commons PMC
(although this is sandbox, hence unreleasable, so there is another
checkpoint later).

"commits" is also a place to see updates for contributors as well.

The commons commit list is about 350-900 emails a month; one day is over
several archive (/mbox/) pages.

It might work to seek tags like "apache/commons/rdf" but I feel that
asking people to get sophisticated (specialised) for one particular
community (we are all in many communities) is yet another barrier.  Any
barrier is surmountable but the cumulative effects are not helpful in
establishing community.

Apache Commons works well for the existing components.  I wonder how
much variation to normal Apache Commons processes is sensible and viable
long term.  (c.f. the "mini incubator" effect).  There is to me a
difference between this being a one-off variation of Apache Commons
compared to an evolution of Apache Commons for any component.  I don't
know where the balance is, should be, could be, or can be.

        Andy



Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Emmanuel Bourg-3
On 19/01/2015 15:52, Andy Seaborne wrote:

> It might work to seek tags like "apache/commons/rdf" but I feel that
> asking people to get sophisticated (specialised) for one particular
> community (we are all in many communities) is yet another barrier.

The commit mails for the Git repositories are prefixed with the name of
the component. So if the Github repository is cloned into Apache
Commons, all you need is to filter the messages sent to
*@commons.apache.org with a title containing "[rdf]"; this will work for
the commits and the discussions.

Emmanuel Bourg



Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Andy Seaborne
In reply to this post by Bruno P. Kinoshita
On 17/01/15 12:00, Bruno P. Kinoshita wrote:
> Hi Andy!
>
>> Jena can (and does) support multiple APIs over a common core.
>>
>> A commons-rdf API can be added along side the existing APIs; that means
>> it is not a "big bang" to have commons-rdf interfaces supported.
>
> That's great! Would the commons-rdf dependency go in jena-core/pom.xml? Is it going to be necessary to change some classes in the core? I think it will be transparent for other modules like ARQ, Fuseki, Text. Is that right?

I don't think so - Jena's core is "generalized" RDF and this is important.

Just adding any new interfaces to the core Node (etc) objects isn't
ideal: you get multiple method names for the same thing.  And getting the
hashCode/equality contract to work across implementations (hashCode() of
implementation A must be the same as hashCode() of implementation B
whenever the two objects are equal) is really quite tricky.
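
To make the cross-implementation contract concrete, here is a minimal sketch in Java. The Iri interface and both classes are hypothetical and are not taken from Jena or from either commons-rdf draft. The point is only that equals() must be written against the shared interface and hashCode() must be a function of the shared value alone, otherwise objects from two implementations cannot meet in one HashSet.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical shared interface; name and method are illustrative only. */
interface Iri {
    String iriString();
}

/** First implementation: stores the IRI as one string. */
final class SimpleIri implements Iri {
    private final String value;

    SimpleIri(String value) { this.value = value; }

    @Override public String iriString() { return value; }

    // Equality is defined against the interface, not against this class.
    @Override public boolean equals(Object o) {
        return o instanceof Iri && value.equals(((Iri) o).iriString());
    }

    // Hash depends only on the IRI string, so any implementation
    // following the same rule produces the same value.
    @Override public int hashCode() { return value.hashCode(); }
}

/** Second, independent implementation: stores namespace and local name. */
final class SplitIri implements Iri {
    private final String namespace;
    private final String localName;

    SplitIri(String namespace, String localName) {
        this.namespace = namespace;
        this.localName = localName;
    }

    @Override public String iriString() { return namespace + localName; }

    @Override public boolean equals(Object o) {
        return o instanceof Iri && iriString().equals(((Iri) o).iriString());
    }

    @Override public int hashCode() { return iriString().hashCode(); }
}
```

Under these assumptions a Set<Iri> holding new SimpleIri("http://example.org/a") also reports that it contains new SplitIri("http://example.org/", "a"). The tricky part is that every implementation, including ones written later by other people, has to follow exactly the same rule.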

See also my comments about using classes not interfaces.

I personally do not see the worry about wrappers - for me the important
thing is the architectural difference between a presentation API, designed
for applications to write code against, and a systems API, designed to
support the machinery.  Java is really rather good at optimizing away the
cost of wrappers, including with multisite method dispatch optimizations
and coping with dynamically loaded code that changes assumptions at a
later time.
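
As a rough illustration of the wrapper approach, here is a sketch in Java. All names are hypothetical: CoreNode stands in for a systems-level object and AppIri for a presentation-level interface, and neither is Jena's or commons-rdf's actual type. The wrapper copies nothing and only delegates; small final methods like these are the kind of call a JIT compiler can typically inline.

```java
/** Hypothetical systems-level node: compact, built for the machinery. */
final class CoreNode {
    static final int IRI = 0;
    static final int LITERAL = 1;

    final int kind;      // illustrative encoding of the node type
    final String label;  // IRI string or lexical form

    CoreNode(int kind, String label) {
        this.kind = kind;
        this.label = label;
    }
}

/** Hypothetical presentation-level interface that applications code against. */
interface AppIri {
    String getIriString();
}

/** Thin wrapper: holds a reference to the core node and delegates to it. */
final class WrappedIri implements AppIri {
    private final CoreNode node;

    WrappedIri(CoreNode node) {
        if (node.kind != CoreNode.IRI) {
            throw new IllegalArgumentException("not an IRI node");
        }
        this.node = node;
    }

    @Override public String getIriString() { return node.label; }

    /** Gives system code its own object back without conversion. */
    CoreNode unwrap() { return node; }
}
```

The core keeps whatever generalized representation it needs, and the application-facing names live entirely in the wrapper.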

So a new module that is "jena-commons-rdf" that provides an application
presentation API would be the obvious route to me.  Fuseki etc

And this is only RDF, not Datasets or SPARQL.  We discussed that and
fairly easily came to the conclusion that getting something common sooner was
better than a complete set of APIs.  Some of the natural other ones are
a lot more complicated - they would build on the terms provided by
commons-rdf.

>> There is a lot more to working with RDF than the RDF API part - SPARQL
>> engines don't use that API if they want performance and/or scale. (1)
>> SPARQL queries collections of graphs and (2) for scale+persistence, you
>> need to work in parts at a level somewhat lower level than java objects,
>> and closer to the binary of persistence structures.
>
> Good point. I'm enjoying learning about Jena code for JENA-632. Even though datasets, streaming queries collections and all that part about journaling and graph persistence can be a bit scary.

:-)

Luckily, journalling and persistence are orthogonal to implementing
JENA-632, though as an application feature mapped over the whole system,
it's a good way of seeing across several components.

> Probably that won't be covered in the commons-rdf, but I think that's correct.

I agree - there is a new world out here - a world of large memory
machines, and quite likely, large scale persistent RAM in the not too
distant future.  Given the longevity of shared APIs, it's very hard to
find a balance across requirements and expectations.  The graph level is
naturally driven by the specs but as soon as systems issues get thrown
into the mix, the choice space is much larger.

        Andy

>
> Thanks!
> Bruno
>
>
> ----- Original Message -----
>> From: Andy Seaborne <[hidden email]>
>> To: [hidden email]
>> Cc:
>> Sent: Saturday, January 17, 2015 7:40 AM
>> Subject: Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF
>>
>> On 15/01/15 11:52, Bruno P. Kinoshita wrote:
>>
>>>   Hello!
>>>
>>>
>>>   I feel like I can't help much in the current discussion. But just
>>>   wanted to chime in and tell that I'm +1 for a [rdf] component in
>>>   Apache Commons. As a commons committer I'd like to help.
>>>
>>>   I started watching the GitHub repository and have subscribed to the
>>>   ongoing discussion. I'll try to contribute in some way; maybe testing
>>>   and with small patches.
>>>
>>>   My go-to Maven dependency for RDF, Turtle, N3, working with ontologies,
>>>   reasoners, etc, is Apache Jena. I think it would be very positive to
>>>   have a common interface that I could use in my code (mainly crawlers
>>>   and data munging for Hadoop jobs) and that would work with different
>>>   implementations.
>>>
>>>
>>>   Thanks!
>>>
>>>   Bruno
>>
>> Since you mention Jena ... :-)
>>
>> Jena can (and does) support multiple APIs over a common core.
>>
>> A commons-rdf API can be added along side the existing APIs; that means
>> it is not a "big bang" to have commons-rdf interfaces supported.
>>
>> There is a lot more to working with RDF than the RDF API part - SPARQL
>> engines don't use that API if they want performance and/or scale. (1)
>> SPARQL queries collections of graphs and (2) for scale+persistence, you
>> need to work in parts at a level somewhat lower level than java objects,
>> and closer to the binary of persistence structures.
>>
>>      Andy



Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Reto Gmür-2
In reply to this post by Peter Ansell
On Fri, Jan 16, 2015 at 12:29 AM, Peter Ansell <[hidden email]>
wrote:

> The only sticking point then and now IMO is the purely academic
> distinction of opening up internal labels for blank nodes versus not
> opening it up at all. Reto is against having the API allow access to
> the identifiers on academic grounds, where other systems pragmatically
> allow it with heavily worded javadoc contracts about their limited
> usefulness, per the RDF specifications:
>

Hi Peter,

Sorry for the late reply.

I see that the javadoc for the internalIdentifier method has now become
quite long.

It says:

 * In particular, the existence of two objects of type {@link BlankNode}
 * with the same value returned from {@link #internalIdentifier()} are not
 * equivalent unless they are known to have been created in the same local
 * scope.

It is however not so clear what such a local scope is. It says that such a
local scope may be, for example, a JVM instance. Can the scope also be
narrower? To allow removing redundancies (as described in
https://svn.apache.org/repos/asf/commons/sandbox/rdf/trunk/README.md) no
promise should be made that a bnode with the same ID in the same JVM will
denote the same node. On the other hand, how long is it guaranteed that if
I have a BNode object I can add triples to a graph and this object will
keep representing the same RDF node? Does it make a difference if I keep
the instance or if I create a new instance with the same internal
identifier?
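
To show why the definition of the scope matters, here is one possible reading of the quoted javadoc, sketched in Java. It is an assumption for illustration, not the behaviour of either API: the class ScopedBlankNode and its explicit scope object are hypothetical, and the scope could equally be a graph, a parser run or the whole JVM.

```java
/** Hypothetical blank node whose identity is the pair (scope, internal identifier). */
final class ScopedBlankNode {
    private final Object scope;               // stands for whatever "local scope" means
    private final String internalIdentifier;

    ScopedBlankNode(Object scope, String internalIdentifier) {
        this.scope = scope;
        this.internalIdentifier = internalIdentifier;
    }

    String internalIdentifier() { return internalIdentifier; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ScopedBlankNode)) {
            return false;
        }
        ScopedBlankNode other = (ScopedBlankNode) o;
        // The same label is not enough: the scope must be the very same object.
        return scope == other.scope
                && internalIdentifier.equals(other.internalIdentifier);
    }

    @Override public int hashCode() {
        return 31 * System.identityHashCode(scope) + internalIdentifier.hashCode();
    }
}
```

Under this reading, two nodes labelled "b0" are equal only when they were created with the same scope object. If the scope is the JVM, every "b0" in the process denotes one node; if it is narrower, the same label can denote different nodes, which is exactly the ambiguity raised above.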

Similarly: can I add bnodes I get from one graph from one implementation to
another? If I get BNode :foo from G1 can I add the triple (:foo ex:p ex:o)
to G2? When I later add (:foo ex:q ex:r) to G2, will the two
triples have the same subject?

I think these are important questions to allow generic interoperable
implementations. I'm not saying that questions like the one I answer in the
Readme of my draft cannot be satisfactorily answered when having such an
internal identifier, but I think it might get more complicated and less
intuitive for the user.

Also, you're writing about "opening up" the labels. This makes sense from a
triple store perspective where the BNodes have such an internal label.
However, I think this should not be the only use case. One can very
well use the RDF API to expose some arbitrary Java (data) objects as RDF.
I've illustrated such a scenario here:

http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-=XkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ@...%3E

I'm not sure if with the github API one could say "the scope is the node
instance" and return a fixed identifier for all BNodes. If so, the identifier
is obviously pointless. If, on the other hand, one had to assign an
identifier to every object, this would make implementations more complex
both in terms of code and in terms of memory usage.

Again, it seems to make things more complex while I see no clear advantage
compared with the toString() method the object has anyway.
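
A minimal sketch in Java of the scenario I have in mind, with a hypothetical class name that is not part of either draft: a blank node that merely wraps an arbitrary Java object, takes its identity from that object and stores no identifier at all.

```java
/** Hypothetical blank node backed by an arbitrary Java object. */
final class ObjectBackedBlankNode {
    private final Object backing;

    ObjectBackedBlankNode(Object backing) {
        this.backing = backing;
    }

    // Two wrappers denote the same node exactly when they wrap the same instance.
    @Override public boolean equals(Object o) {
        return o instanceof ObjectBackedBlankNode
                && ((ObjectBackedBlankNode) o).backing == backing;
    }

    @Override public int hashCode() {
        return System.identityHashCode(backing);
    }

    // For debugging only; identity hash codes are not guaranteed to be unique.
    @Override public String toString() {
        return "_:" + Integer.toHexString(System.identityHashCode(backing));
    }
}
```

Nothing has to be generated or kept in memory per node here, which is the cost I would like implementations to be able to avoid.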

Cheers,
Reto