Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Stian Soiland-Reyes
I agree that "local scope" should be clarified - I have been just as confused!


By keeping the "internalIdentifier" property, an application is able
to talk about an existing blankNode without having to keep track of
earlier BlankNode instances (e.g. not needing their own
Map<internalIdentifier,BlankNode>).

It also means that a streaming copy from one implementation to another
would work - even if there would be multiple JVM objects "on the line"
representing the same BlankNode - having the same internalIdentifier.


That said, nothing is preventing
RDFTermFactory.createBlankNode(internalIdentifier) from always
returning the same JVM object through some kind of lookup - as long as
that object then is able to live in multiple "local scopes" or
Graph.add()  copies it to set the scope.
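Such a lookup-based factory could be sketched as follows (a minimal sketch with illustrative names; `BlankNodeFactorySketch` and its nested `BlankNode` stand in for the proposed interfaces, not the actual API):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch: a factory that hands back the same object for a given
 * internalIdentifier, so callers never need their own
 * Map&lt;internalIdentifier, BlankNode&gt;.
 */
public class BlankNodeFactorySketch {

    /** Stand-in for the proposed BlankNode interface. */
    public interface BlankNode {
        String internalIdentifier();
    }

    private final Map<String, BlankNode> known = new HashMap<>();

    /** Repeated calls with the same identifier return the same instance. */
    public BlankNode createBlankNode(String internalIdentifier) {
        return known.computeIfAbsent(internalIdentifier, id -> () -> id);
    }
}
```

A real implementation would also have to decide how such an object lives in multiple "local scopes"; this only shows the lookup part.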


There's an open issue about what is the extent of this "local scope"
and how this affects equivalence.

https://github.com/commons-rdf/commons-rdf/issues/56


My attempt to confuse this further:

https://github.com/commons-rdf/commons-rdf/pull/48



Some earlier discussions about equals:

https://github.com/commons-rdf/commons-rdf/issues/45


I think your FAQ in
https://svn.apache.org/repos/asf/commons/sandbox/rdf/trunk/README.md

is great - but this seems to include some JVM-specific decisions that
might be easy to do in all the RDF implementations.



I think we have agreed that the localIdentifier doesn't have anything to
do with the ntriplesString, which I have reflected in the tests.
Thus a local identifier like "not: a URI or anything" is fine - all we
know is that two BlankNodes with the same local identifier in the same
Graph should be equal, and that their ntriplesString - whatever it is
(I do UUID v3 if the id doesn't work) - should also be equal.
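The UUID v3 fallback could look something like this (a sketch, assuming `UUID.nameUUIDFromBytes` as the name-based UUID source; `BNodeLabelSketch` is a hypothetical helper, not the actual test code):

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/**
 * Sketch: derive a stable, N-Triples-safe label from an arbitrary local
 * identifier, so equal identifiers always give equal ntriplesString values
 * even when the identifier itself is "not: a URI or anything".
 */
public class BNodeLabelSketch {
    public static String ntriplesString(String localIdentifier) {
        // nameUUIDFromBytes produces a name-based (version 3) UUID, so the
        // same identifier deterministically maps to the same label
        UUID v3 = UUID.nameUUIDFromBytes(
                localIdentifier.getBytes(StandardCharsets.UTF_8));
        return "_:" + v3.toString().replace("-", "");
    }
}
```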


What is unclear is how this "local scope" propagates - as it's not
exposed anywhere in the current interfaces.

Perhaps blank nodes should only be possible to create from/with a Graph?

When you say a scope could be narrower... what do you mean, narrower
than a Graph? I guess, say, from a SPARQL result set using the Commons
RDF API (but not Graph), the scope would be that particular result
set.



Andy has said he would like the ability to copy such a BlankNode to a
different graph, then back again to the first, and have it then be equal
to the original BlankNode. (I'm not sure if this was meant as inserting
the same BlankNode object into two Graphs directly, or as making a single
Triple instance that is added into two Graphs.)

It is unclear if that BlankNode object in graph 2 will have the same
"local scope" as the BlankNode in graph 1. Is a Triple added to two
graphs now in two local scopes?



In the 'simple' implementation I achieved the back-and-forth
equivalence by keeping a "local scope" as an Optional<Graph> within
the BlankNodeImpl, and using this as part of equivalence:

https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/BlankNodeImpl.java#L40
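The scope-aware equality could be sketched like this (illustrative, not the actual BlankNodeImpl; a plain `Object` stands in for the Graph type):

```java
import java.util.Objects;
import java.util.Optional;

/**
 * Sketch: equality requires both the optional local scope and the local
 * identifier to match.
 */
public class ScopedBlankNode {
    private final Optional<Object> localScope; // the owning Graph, if any
    private final String localIdentifier;

    public ScopedBlankNode(Optional<Object> localScope, String localIdentifier) {
        this.localScope = localScope;
        this.localIdentifier = localIdentifier;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ScopedBlankNode)) {
            return false;
        }
        ScopedBlankNode other = (ScopedBlankNode) o;
        return localScope.equals(other.localScope)
                && localIdentifier.equals(other.localIdentifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(localScope, localIdentifier);
    }
}
```

Note that under this sketch two free-standing nodes (both with an empty scope) and the same identifier compare equal, matching the first option below.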


Should a "free-standing" BlankNodeImpl (not inside a Triple) claim
to be equal, or NOT equal, to another BlankNodeImpl with the same
localIdentifier if neither is in any scope?  Currently I think my
implementation does the first of these.


On Graph.add(Triple) I always make a clone of TripleImpl (so as not to
overwrite the localScope), which will call "inScope" to clone the
BlankNode with the new graph as scope.

https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/TripleImpl.java#L63
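The clone-on-add idea can be sketched as follows (illustrative names, not the actual TripleImpl/GraphImpl code; triples are reduced to a bare subject for brevity):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: Graph.add stores a rescoped copy, so the caller's blank node
 * keeps its original local scope.
 */
public class CloneOnAddGraph {

    public static class BNode {
        public final Object scope; // null means free-standing
        public final String id;

        public BNode(Object scope, String id) {
            this.scope = scope;
            this.id = id;
        }

        /** Returns a copy of this node scoped to the given graph. */
        public BNode inScope(Object newScope) {
            return new BNode(newScope, id);
        }
    }

    private final List<BNode> subjects = new ArrayList<>();

    /** Stores a clone rather than the caller's instance. */
    public void add(BNode subject) {
        subjects.add(subject.inScope(this));
    }

    public BNode storedSubject(int i) {
        return subjects.get(i);
    }
}
```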


But I see now that with the split Graph.add(s,p,o) form I don't
propagate the Graph localScope correctly, and might even cause an NPE:

https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/GraphImpl.java#L43
https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/TripleImpl.java#L46

.. so this is tricky to get right!

<rant>
Whoever invented Blank Nodes... why not just
<urn:uuid:7096a534-d698-414c-87fa-4b09ca5d03f2> and be done with it.
If something exists, it exists.. just give it a name - anything! Names
come cheap - at least now that we got rid of LSID servers :)
</rant>


On 27 January 2015 at 13:39, Reto Gmür <[hidden email]> wrote:

> On Fri, Jan 16, 2015 at 12:29 AM, Peter Ansell <[hidden email]>
> wrote:
>
>> The only sticking point then and now IMO is the purely academic
>> distinction of opening up internal labels for blank nodes versus not
>> opening it up at all. Reto is against having the API allow access to
>> the identifiers on academic grounds, where other systems pragmatically
>> allow it with heavily worded javadoc contracts about their limited
>> usefulness, per the RDF specifications:
>>
>
> Hi Peter,
>
> Sorry for the late reply.
>
> I see that the javadoc for the internalIdentifier method has now become
> quite long.
>
> It says:
>
> * In particular, the existence of two objects of type {@link BlankNode}
> * with the same value returned from {@link #internalIdentifier()} are not
> * equivalent unless they are known to have been created in the same local
> * scope.
>
> It is however not so clear what such a local scope is. It says that such a
> local scope may be for example a JVM instance.  Can the scope also be
> narrower? To allow removing redundancies (as described in
> https://svn.apache.org/repos/asf/commons/sandbox/rdf/trunk/README.md) no
> promise should be made that a bnode with the same ID in the same JVM will
> denote the same node. On the other hand, how long is it guaranteed that if
> I have a BNode object I can add triples to a graph and this object will
> keep representing the same RDF Node? Does it make a difference if I keep
> the instance, or if I create a new instance with the same internal
> identifier?
>
> Similarly: can I add bnodes I get from one graph from one implementation to
> another? If I get BNode :foo from G1, can I add the triple (:foo ex:p ex:o)
> to G2? When I later add (:foo ex:q ex:r) to G2, will the two
> triples have the same subject?
>
> I think these are important questions to allow generic interoperable
> implementations. I'm not saying that questions like the ones I answer in the
> Readme of my draft cannot be satisfactorily answered when having such an
> internal identifier, but I think it might get more complicated and less
> intuitive for the user.
>
> Also, you're writing about "opening up" the labels. This makes sense from a
> triple store perspective where the BNodes have such an internal label.
> However I think this should not be the only use-case scenario. One can very
> well use the RDF API to expose some arbitrary Java (data) objects as RDF.
> I've illustrated such a scenario here:
>
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-=XkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ@...%3E
>
> I'm not sure if with the github API one could say "the scope is the node
> instance" and return a fixed identifier for all BNodes. If so, the identifier
> is obviously pointless. If, on the other hand, one would have to assign
> identifiers to all the objects, this would make implementations more complex
> both in terms of code and in terms of memory usage.
>
> Again, it seems to make things more complex while I see no clear advantage
> compared with the toString() method the object has anyway.
>
> Cheers,
> Reto

--
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Peter Ansell
Hi Stian and Reto,

Blank nodes are hard to support within a single system. They are
fairly close to unsustainable within a general system. However, within
a system that has RDF-1.1 as its theoretical basis, the W3C spec
defines the mapping functions that are necessary to define equivalence
between graphs (but does not say how translation should work in
practice). Hence the discussion, and the long contract, to come to
agreement on something that is consistent with the W3C specs but
extends them where necessary to make them work across the JVM.

Part of this issue is that while it is necessary to expose some
internally unique information about the BlankNode, the concrete syntax
(or the Java Object, for intra-VM translation) may not have assigned
any identifier to the BlankNode. N-Triples for instance must
necessarily know about an identifier to serialise a Triple independent
of the context of a Graph.

Hence we are trying to converge on a method for consistently assigning
labels to blank nodes based on the parser (sorry if the JVM wide local
scope comment confused you, the local scope probably needs to be
smaller than that, at either the individual document parse level or
the Graph level).

Some of the use cases that we are trying to support are:

1. The same document parsed using the same parser implementation into
the same graph may generate BlankNode objects that are .equals(), and if
they are .equals(), their .hashCode() values must be the same.

2. The same document parsed using the same parser implementation into
two different graphs must generate BlankNode objects that are not
.equals() and hopefully do not have the same .hashCode().

3. Two different documents parsed using the same parser implementation
into the same graph must generate BlankNode objects that are not
.equals() and have different .hashCode() results. This includes cases
where the concrete syntax contained the same label for the blank node.

4. The same document parsed using different parser implementations
into two different graphs must generate BlankNode objects that are not
.equals() and hopefully do not have the same .hashCode().

5. Two different documents parsed using different parser
implementations may be then transferred into the same graph and the
BlankNode objects inside of the graph must not be .equals() if they
came from different physical documents, even if the concrete syntax
contained the same label for the blank node.
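One way to satisfy cases 1-5 is to mint a fresh scope token per document parse and make blank node identity the pair (scope token, label). This is a sketch with hypothetical names, not a settled design:

```java
import java.util.Objects;
import java.util.UUID;

/**
 * Sketch: every parse of a document mints a fresh scope token, and a blank
 * node's identity is the pair (scope token, label from the concrete syntax).
 */
public class ParseScopedBNode {
    public final UUID parseScope; // fresh per document-parse
    public final String label;    // label seen in the concrete syntax

    public ParseScopedBNode(UUID parseScope, String label) {
        this.parseScope = parseScope;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ParseScopedBNode)) {
            return false;
        }
        ParseScopedBNode other = (ParseScopedBNode) o;
        return parseScope.equals(other.parseScope) && label.equals(other.label);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parseScope, label);
    }
}
```

Cases 2-5 all involve a different parse (different document, parser, or target graph), so they get a different scope token and compare unequal even when the concrete-syntax label matches.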

Andy has also brought up the possibility of round-tripping in addition
to those requirements. Ie, a BlankNode from one graph could be
inserted into another graph, and after some time it should be possible
to put it back into the first graph and have it operate as if it were
not moved out. The current proposal doesn't allow for that and I am
not sure what would be required for that to work.

In addition, it is hoped that all of the objects in the system could
be immutable within a graph.

We have not discussed trimming graphs previously. I have never come at
RDF with the requirement of being able to remove triples, but I may
have had a limited set of use cases. Is there a use case for that
automatic trimming that could not be easily satisfied using a rules
engine? Any automatic removal of triples is outside of what I
envisioned the scope of Commons RDF to be, and it hasn't been brought
up by anyone else. Even if in RDF theory there is some corner case
where it is allowed for, it is not a general requirement and is not
generally used or asked for in my experience.

I am fairly ambivalent on the case for internalIdentifier being
substitutable for .toString, but currently we need to work out a
consistent way to identify the local scope, and it could be used in
conjunction with either internalIdentifier or toString if both have
the same contract in practice. What we are endeavouring to do is
transfer BlankNodes between implementations inside of the JVM and keep
their general identity (and round-tripping adds another level of
difficulty on top of that). If we just rely on .toString then we may
need to embed the local scope information into the resulting string,
so the two pieces of information would be compressed into one, which
may not be ideal in the end. In a broader sense, it would be great if
the new Commons RDF API didn't enforce restrictions on .toString, which
already has consistent meanings in each of the implementations; unique
new methods give more flexibility there.

Thanks,

Peter



Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Andy Seaborne
In reply to this post by Stian Soiland-Reyes
On 27/01/15 17:11, Stian Soiland-Reyes wrote:
> I agree that "local scope" should be clarified

"local scope" is a piece of terminology used only for RDF syntax.  Once
away from syntax, there is no "scope" to a blank node.

It is described in:
http://www.w3.org/TR/rdf11-concepts/#section-blank-nodes

The scope is the file for the purposes of reading that file once.

A bnode "_:a" is the same bNode everywhere in that file at the time it
is read (i.e. parsed).

If the file is read twice, "_:a" generates a different blank node each time.

The only things you can do with blank nodes are:

* Create a new one ("a fresh one"), different from every other blank node.

* See if it is the same as another (java's .equals) because all RDF
terms are distinguishable [1].

* Put them in triples and hence into graphs.

That has the implication that they can be put into several data structures.

The description in the javadoc:
"""
They are always locally scoped to the file or RDF store
"""
is not right.  They are not scoped to the RDF store.

The nearest concept is that one store must have created it in the first
place but once created, the blank node is just a thing and a rather
simple, boring thing at that.

This analogy might help (or not):

There is a table with 4 metal spheres on it, in a line across it.  Each
sphere is exactly the same kind of material, the same mass, the same
colour and the same shininess.  You can ask "is that sphere the same as
that other one?" by pointing to two of them.  If you put them in a bag,
shake the bag and take one out, you can't tell whether this chosen one is
the same as the one that was on the right-hand end of the line.

        Andy

[1] http://www.w3.org/TR/rdf11-concepts/#section-rdf-graph



Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Peter Ansell
On 28 January 2015 at 20:53, Andy Seaborne <[hidden email]> wrote:

> On 27/01/15 17:11, Stian Soiland-Reyes wrote:
>>
>> I agree that "local scope" should be clarified
>
>
> "local scope" is a piece of terminology used only for RDF syntax.  Once away
> from syntax, there is no "scope" to a blank node.
>
> It is described in:
> http://www.w3.org/TR/rdf11-concepts/#section-blank-nodes
>
> The scope is the file for the purposes of reading that file once.
>
> A bnode "_:a" is the same bNode everywhere in that file at the time it is
> read (i.e. parsed).
>
> If the file is read twice, "_:a" generates a different blank nodes.
>
> The only things you can do with blank nodes are:
>
> * Create a new one ("a fresh one"), different from every other blank node.

That is a restriction that many systems do not have right now. Many
systems have been designed without this restriction for non-anonymous
blank nodes (ie, ones that are labelled and can appear multiple times
at any point in the document) so that the system could parse very
large collections from concrete RDF syntaxes using a finite (possibly
a fixed, small) amount of memory. Although in theory BlankNodes are
being "picked from an infinite set", finite computers mean there have
to be some compromises from the mathematical model underpinning RDF.
In practice, some systems allow you to ask for a blank node object to
be created using entropy (string/bytes/etc.) that will create an
equivalent object at another point in time to the other instances of
the blank node, based on the concrete RDF syntax supporting
non-anonymous blank nodes that are not localised to a particular part
of the document as anonymous blank nodes are.

Both anonymous and non-anonymous blank nodes are going to have the
same features after parsing; the issue is just what to do during
parsing to make streaming possible.

> * See if it is the same as another (java's .equals) because all RDF terms
> are distinguishable [1].

One of the goals of Commons RDF is to agree effectively on how to
provide interoperability between implementations, and a key part of
that is defining Java Object equality.

We haven't defined which levels the expectation of equivalence would
be applicable at yet but we could think about it at any of the
following levels:

* RDFTerm level : being able to send any RDFTerm into any API, even if
the underlying implementation is different, and have it recognised as
equivalent to any other RDFTerm which from the user's perspective was
.equals() when they sent it in.

* Triple level : being able to send any Triple into any API, even if
the underlying implementation is different, and have it recognised as
equivalent to any other Triple which from the user's perspective was
.equals().

* Graph level : being able to send a Graph into an API, even if the
underlying implementation is different, and have operations inside of
the API consistent with equivalence that the user saw in the Graph.
Note, this doesn't require the Graph to be mutable, but it may require
taking a copy of the objects and discarding the reference to the Graph
which may then be garbage collected if the user doesn't keep a
reference to the Graph.

* Dataset level (ie, Named and Default Graphs in a Set) : being able
to send a Dataset into an API, even if the underlying implementation
is different, and have operations inside of the API consistent with
the user's view. Similarly, the Dataset doesn't need to be mutable; a
copy may be taken by the implementation to do its operations.

I would personally expect all of the RDFTerm, Triple, and Graph levels
to be supported. We haven't gone as far as to create a Dataset API yet
so that is out of scope still. The RDFTerm level is of course the most
difficult to support and it may go out of scope for the BlankNode set,
although it is trivial to support for IRI and Literal so they may
still practically be directly interoperable.

> * Put them in triples and hence into graphs.
>
> That has the implications that they can be put into several datastructures.
>
> The description in the javadoc:
> """
> They are always locally scoped to the file or RDF store
> """
> is not right.  They are not scoped to the RDF store.
>
> The nearest concept is that one store must have created it in the first
> place but once created, the blank node is just a thing and a rather simple,
> boring thing at that.

If the same blank node reference is encountered within the same
document, either the parser or the store has to map the references to
objects (possibly the same object) that are .equals() and have the
same .hashCode(). If the responsibility is on the parser then, for
non-anonymous blank nodes that are encountered in concrete syntaxes,
the objects that the parser creates need to be consistent across the
entire document. If the store is only able to create blank nodes as
boring unique individualised objects, then at some point the parser
itself will be forced to keep track of which Java Objects were created
for which non-anonymous blank nodes.
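That parser-side bookkeeping could be sketched as follows (illustrative; a plain `Object` stands in for the blank node type):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: the parser, not the store, maps each non-anonymous label to a
 * single object for the duration of one document, then discards the map.
 */
public class ParserLabelTracker {
    private final Map<String, Object> currentDocument = new HashMap<>();

    /** The same label within one document yields the same object. */
    public Object bnodeFor(String label) {
        return currentDocument.computeIfAbsent(label, l -> new Object());
    }

    /** Called between documents, so "_:a" in the next file is fresh. */
    public void startNewDocument() {
        currentDocument.clear();
    }
}
```

The map grows with the number of distinct labels in one document, which is the streaming-memory compromise discussed above.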

> This analogy might help (or not):
>
> There is a table with 4 metal spheres on it in a line across it.  Each
> sphere is exactly kind of material, the same mass, the same colour and the
> same shininess.  You can ask "is that sphere the same as that other one?" by
> pointing to two of them.  If you put them in a bag, shake the bag and take
> one out, you can't tell whether this chosen one is the same as the one that
> was on the right-hand end of the line.

I disagree. In particular, if you studied the ball you put in closely
enough you may, in non-trivial situations, find something that was
unique about it in the context of the table/bag, even if it was only
in reference to other lines (ie, other Triples/Quads that were not all
made up of opaque blank-node balls). If the bag was created from a
single document, then either there needs to be a way to identify which
balls (at least when attached using predicates to other balls/lines,
ie, IRIs/Literals/Triples/Quads) are equivalent, or alternatively, if
you don't need to support fixed-memory streaming of arbitrary-length
concrete RDF documents, there needs to be a hard restriction on the
balls being unique Java Objects, with the base Object.equals() as the
equivalency.

>         Andy
>
> [1] http://www.w3.org/TR/rdf11-concepts/#section-rdf-graph
>
>



Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Reto Gmür-2
In reply to this post by Stian Soiland-Reyes
Hi Stian,


> By keeping the "internalIdentifier" property, an application is able
> to talk about an existing blankNode without having to keep track of
> earlier BlankNode instances (e.g. not needing their own
> Map<internalIdentifier,BlankNode>).
>
By application I assume you mean an implementation of the API. Even without
exposing an identifier an application can keep track of its BlankNodes;
after all, they may all be instances of MyImplBNode, which contains all the
fields required by the backend. The question is what should happen with
BlankNodes that come from other implementations. The clerezza approach is:
they work just as well, and as long as the blank node object is reachable
(i.e. not eligible for garbage collection) it is guaranteed that the
implementation returns an equal instance to represent this node.

The newest github javadoc says "two BlankNode in different Graphs MUST
differ". So how is this implemented? If I have a BNode n from G1 and I add
the triple (n,p,o) to G2, from which bnode must n now differ? It can hardly
be the one in G1; one could argue that G2, as it doesn't recognize n as one
of its own, has to create n', so that the triple actually stored is
(n',p,o). If I query for (?,p,o) the result would not contain n but n'. But
what if I go on adding triples with n? Quite clearly the intention of the
following pseudo code is to add two triples with the same subject:

G2.add(n,p,o);
G2.add(n,q,r);

But after the first invocation the implementation must make sure the
added bnode is not equal to the "alien" bnode, so the second call to add
will create a new triple with a different subject. In practice this would
mean that there is no interoperability in the general case and that one
needs to add only terms created with the right term factory.

So it seems to me that we have no real benefit from exposing the internal
id but that it causes quite some limitations and complexity.


> It also means that a streaming copy from one implementation to another
> would work - even if there would be multiple JVM objects "on the line"
> representing the same BlankNode - having the same internalIdentifier.
>

This seems hard to reconcile with the postulate that BNodes from different
graphs MUST differ. On adding the second statement with a BNode with the
same ID (be it the same instance or not), we are not adding a triple with
the same BNode that's already in the graph.

>
>
> That said, nothing is preventing
> RDFTermFactory.createBlankNode(internalIdentifier) from always
> returning the same JVM object through some kind of lookup - as long as
> that object then is able to live in multiple "local scopes" or
> Graph.add()  copies it to set the scope.
>
Even if it always returns different instances, they still have to handle
the complexity of being added to different scopes and then becoming
different. This seems incredibly complex compared with the alternative
(clerezza) approach where BlankNodes are just objects without any exposed
internals. What use cases justify this added complexity?



>
>
> There's an open issue about what is the extent of this "local scope"
> and how this affects equivalence.
>
> https://github.com/commons-rdf/commons-rdf/issues/56
>
>
> My attempt to confuse this further:
>
> https://github.com/commons-rdf/commons-rdf/pull/48
>
>
>
> Some earlier discussions about equals:
>
> https://github.com/commons-rdf/commons-rdf/issues/45
>
>
> I think your FAQ in
> https://svn.apache.org/repos/asf/commons/sandbox/rdf/trunk/README.md
>
> is great - but this seems to include some JVM-specific decisions that
> might be easy to do in all the RDF implementations.
>

I assume you mean "NOT easy to do".

This is about providing the best possible API for Java and maybe for other
languages on the JVM. So I think the contract guaranteed by the API can
well be expressed using the whole power of this platform. At the end of the
day most implementations will store the data somewhere outside the JVM;
that's fine. The Clerezza API (following the principles in the Readme) is
implemented against multiple backends. For backends allowing identification
of BNodes it is very unproblematic and at most needs usage of a WeakHashMap
(creating a small linear memory overhead on "alien" BNodes).

It is harder to implement the API on top of a backend that does not expose
ids of BNodes (notably a SPARQL endpoint), as the BNode implementation has
to keep track of containing subgraphs. Clearly in such a situation with the
Github proposal one would have to arbitrarily create identifiers (maybe
based on a hash of the Minimum Self Contained Graph of the BlankNode plus a
canonical bnode labeling within this Graph).


>
> I think we have agreed that the localIdentifier doesn't have to do
> anything with the ntriplesString, which I have reflected in the tests.
> Thus a local identifier like "not: a URI or anything" is fine - all we
> know is that two BlankNode with the same local identifier in the same
> Graph should be equal, and that their ntriplesString - whatever it is
> (I do UUID v3 if the id doesn't work) should also be equal.
>
>
> What is unclear is how this "local scope" propagates - as it's not
> exposed anywhere in the current interfaces.
>
> Perhaps blank nodes should only be possible to create from/with a Graph?
>
> When you say a scope could be narrower.. what do you mean, narrower
> than a Graph? I guess say from a SPARQL result set using the Commons
> RDF API (but not Graph), the scope would be that particular result
> set.
>

The API Documentation I looked at a couple of days ago when I wrote the
previous mail said the scope might be the JVM. Now the API is clear that it
has to be narrower and BNodes with the same Id must be different in
different graphs.

As you write, a consequence of this could be that it's only possible to
create them from/with a Graph. Again, what are the advantages compared
with the approach described in the SVN Readme, where BNodes have no exposed
identifier and a BNode object may, as long as it is alive, be in multiple
graphs?


>
>
>
> Andy has said he would like the ability to copy such a BlankNode to a
> different graph, then back again to the first, and then be equal to
> the original BlankNode. (Not sure if this was meant with inserting the
> same BlankNode object into two Graphs directly, or making a single or
> tTriple instance that is added into two Graphs).
>

So we have G1 at time t1:

[] rdf:type foaf:Person.

At t2, we copy G1 to the previously empty G2.

We modify G2 by adding a triple with the existing BNode as subject, so at
t3 G2 looks like this:

[ rdf:type foaf:Person; foaf:name "Alice"].

We modify G1 by adding a triple with the existing BNode as subject, so at
t4 G1 looks like this:

[ rdf:type foaf:Person; foaf:name "Bob"].

I think copying G2 back to G1 (at t5) should result in there being two
persons in G1, not one person called both "Alice" and "Bob". With clerezza
the BNodes in the different graphs would generally not be equal, and thus
there would be two persons in G1 at t5. An exception is if we actually kept
the BNode instance referenceable in memory; this could be called the "BNode
zeno effect" ;)



>
> It is unclear if that BlankNode object in graph 2 will have the same
> "local scope" as the BlankNode in graph 1. Is a Triple added to two
> graphs now in two local scopes?
>
>
>
> In the 'simple' implementation achieved the back-and-forward
> equivalence by keeping a "local scope" as an Optional<Graph> within
> the BlankNodeImpl, and use this as part of equivalence:
>
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/BlankNodeImpl.java#L40
>
>
> Should a a "free-standing" BlankNodeImpl (not inside a Triple) claim
> to be equal to, or NOT equal to another BlanNodeImpl with same
> localIdentifier if neither are in any scope?  Currently I think my
> implementation does the first of this.
>
>
> On Graph.add(Triple) I always make a clone of TripleImpl (to not
> overwrite the localScope), which will call "inScope" to clone the
> BlankNode with the new graph as scope.
>
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/TripleImpl.java#L63
>
>
> But I see now that with the split Graph.add(s,p,o) form I don't
> propagate the Graph localScope correctly and might even cause a NPE:
>
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/GraphImpl.java#L43
>
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/TripleImpl.java#L46
>
> .. so this is tricky to get right!
>
> <rant>
> Whoever invented Blank Nodes... why not just
> <urn:uuid:7096a534-d698-414c-87fa-4b09ca5d03f2> and be done with it.
> If something exists, it exists.. just give it a name - anything! Names
> come cheap - at least now that we got rid of LSID servers :)
>
Everybody is free to just use named nodes! The price of it is that we might
end up with tons of owl:sameAs resources. See also:
http://lists.w3.org/Archives/Public/semantic-web/2008Jan/0118.html


> </rant>
>


Cheers,
Reto




>
>
> On 27 January 2015 at 13:39, Reto Gmür <[hidden email]> wrote:
> > On Fri, Jan 16, 2015 at 12:29 AM, Peter Ansell <[hidden email]>
> > wrote:
> >
> >> The only sticking point then and now IMO is the purely academic
> >> distinction of opening up internal labels for blank nodes versus not
> >> opening it up at all. Reto is against having the API allow access to
> >> the identifiers on academic grounds, where other systems pragmatically
> >> allow it with heavily worded javadoc contracts about their limited
> >> usefulness, per the RDF specifications:
> >>
> >
> > Hi Peter,
> >
> > Sorry for the late reply.
> >
> > I see that the javadoc for the internalIdentifier method has now become
> > quite long.
> >
> > It says:
> >
> > * In particular, the existence of two objects of type {@link BlankNode}
> > * with the same value returned from {@link #internalIdentifier()} are not
> > * equivalent unless they are known to have been created in the same local
> > * scope.
> >
> > It is however not so clear what such a local scope is. It says that such a
> > local scope may be for example a JVM instance.  Can the scope also be
> > narrower? To allow removing redundancies (as described in
> > https://svn.apache.org/repos/asf/commons/sandbox/rdf/trunk/README.md) no
> > promise should be made that a bnode with the same ID in the same JVM will
> > denote the same node. On the other hand, how long is it guaranteed thath
> if
> > I have a BNode objects I can add triples to a graph and this object will
> > keep representing the same RDF Node? Does it make a difference if I keep
> > the instance or is I create a new instance with the same internal
> > identifier?
> >
> > Similarly: can I add bnodes I get from one graph from one implementation
> > to another? If I get BNode :foo from G1, can I add the triple
> > (:foo ex:p ex:o) to G2? When later I add (:foo ex:q ex:r) to G2, will the
> > two triples have the same subject?
> >
> > I think these are important questions to allow generic interoperable
> > implementations. I'm not saying that questions like the one I answer in
> > the Readme of my draft cannot be satisfactorily answered when having such
> > an internal identifier, but I think it might get more complicated and less
> > intuitive for the user.
> >
> > Also, you're writing about "opening up" the labels. This makes sense from
> > a triple store perspective where the BNodes have such an internal label.
> > However I think this should not be the only usecase scenario. One can very
> > well use the RDF API to expose some arbitrary Java (data) objects as RDF.
> > I've illustrated such a scenario here:
> >
> >
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-=XkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ@...%3E
> >
> > I'm not sure if with the github API one could say "the scope is the node
> > instance" and return a fixed identifier for all BNodes. If so, the
> > identifier is obviously pointless. If, on the other hand, one had to
> > assign identifiers to all the objects, this would make implementations
> > more complex both in terms of code and in terms of memory usage.
> >
> > Again, it seems to make things more complex while I see no clear
> > advantage compared with the toString() method the object has anyway.
> >
> > Cheers,
> > Reto
>
> --
> Stian Soiland-Reyes
> Apache Taverna (incubating)
> http://orcid.org/0000-0001-9842-9718
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Reto Gmür-2
In reply to this post by Peter Ansell
Hi Peter,

Of your usecases the only one which might be an argument for exposing a
blank-node id is:

1. The same document parsed using the same parser implementation into
> the same graph may generate BlankNode objects that are .equals and if
> they are .equals the .hashCode must be the same.
>

For this usecase I assume the parser would recreate bnodes that are tied to
the target graph using the same internal ids on the second round. The graph
would then recognize the BNodes as its own and not create new nodes.
What I question:

- Is the exposed identifier really needed for this? The parser seems to
know about the target graph; it could use other means to avoid recreating
nodes.
- Does the usecase make sense? If the BNodes added in the first parsing
round are now used in triples quite different from the original ones, should
they really be identified just because of the common history?
- Also: what is "the same document"? Does this mean byte-wise identical or
just having the same location?
- The usecase certainly comes from a legitimate requirement if it's about
avoiding duplication in the target graph. In many implementations, if I
parse "[ rdf:type foaf:Person; foaf:name "Alice"]." twice into the same
graph I will end up having 4 triples in the graph. Of course this graph is
a non-lean graph that could be reduced to two triples. The SPARQL protocol
is carefully designed to allow implementations to avoid (and remove)
redundancy. I think also in a Java API there should be a more generic
mechanism allowing the backend to avoid and remove redundancy,
rather than just addressing the situation when the same document is parsed
twice into the same graph.
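The duplication described in that last point can be made concrete with a small sketch (hypothetical names; blank nodes modeled as fresh identity-based objects, IRIs and literals as strings):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NonLeanDemo {
    static final class BlankNode { } // identity-based, fresh per parse

    // Models one parse of "[ rdf:type foaf:Person; foaf:name "Alice" ]."
    // into the given graph: each parse mints a brand-new blank node.
    static Set<List<Object>> parseOnceInto(Set<List<Object>> graph) {
        Object b = new BlankNode();
        graph.add(List.of(b, "rdf:type", "foaf:Person"));
        graph.add(List.of(b, "foaf:name", "Alice"));
        return graph;
    }

    public static void main(String[] args) {
        Set<List<Object>> graph = new HashSet<>();
        parseOnceInto(graph);
        parseOnceInto(graph);
        // Two parses -> two distinct blank nodes -> 4 triples,
        // even though a lean equivalent has only 2.
        System.out.println(graph.size()); // prints 4
    }
}
```

Whether an API should let the backend reduce those 4 triples back to 2 is exactly the redundancy question raised above.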

Cheers,
Reto


On Wed, Jan 28, 2015 at 5:31 AM, Peter Ansell <[hidden email]>
wrote:

> Hi Stian and Reto,
>
> Blank nodes are hard to support within a single system. They are
> fairly close to unsustainable within a general system. However, within
> a system that has RDF-1.1 as its theoretical basis, the W3C spec
> defines the mapping functions that are necessary to define equivalence
> between graphs (but does not say how translation should work in
> practice). Hence the discussion and a long contract to come to
> agreement about something that is consistent with the W3C specs, but
> extends them where necessary to make them work across the JVM.
>
> Part of this issue is that while it is necessary to expose some
> internally unique information about the BlankNode, the concrete syntax
> (or the Java Object for intra-VM translation) may not have assigned
> any identifier to the BlankNode. N-Triples for instance must
> necessarily know about an identifier to serialise a Triple independent
> of the context of a Graph.
>
> Hence we are trying to converge on a method for consistently assigning
> labels to blank nodes based on the parser (sorry if the JVM wide local
> scope comment confused you, the local scope probably needs to be
> smaller than that, at either the individual document parse level or
> the Graph level).
>
> Some of the use cases that we are trying to support are:
>
> 1. The same document parsed using the same parser implementation into
> the same graph may generate BlankNode objects that are .equals and if
> they are .equals the .hashCode must be the same.
>
> 2. The same document parsed using the same parser implementation into
> two different graphs must generate BlankNode objects that are not
> .equals() and hopefully do not have the same .hashCode().
>
> 3. Two different documents parsed using the same parser implementation
> into the same graph must generate BlankNode objects that are not
> .equals() and have different .hashCode() results. This includes cases
> where the concrete syntax contained the same label for the blank node.
>
> 4. The same document parsed using different parser implementations
> into two different graphs must generate BlankNode objects that are not
> .equals() and hopefully do not have the same .hashCode().
>
> 5. Two different documents parsed using different parser
> implementations may be then transferred into the same graph and the
> BlankNode objects inside of the graph must not be .equals() if they
> came from different physical documents, even if the concrete syntax
> contained the same label for the blank node.
>
> Andy has also brought up the possibility of round-tripping in addition
> to those requirements. Ie, a BlankNode from one graph could be
> inserted into another graph, and after some time it should be possible
> to put it back into the first graph and have it operate as if it were
> not moved out. The current proposal doesn't allow for that and I am
> not sure what would be required for that to work.
>
> In addition, it is hoped that all of the objects in the system could
> be immutable within a graph.
>
> We have not discussed trimming graphs previously. I have never come at
> RDF with the requirement of being able to remove triples but I may
> have had a limited set of use cases. Is there a usecase for that
> automatic trimming that could not be easily satisfied using a rules
> engine? Any automatic removal of triples is outside of what I
> envisioned the scope of Commons RDF to be, and it hasn't been brought
> up by any others. Even if in RDF theory there is some corner case
> where it is allowed for, it is not a general requirement and is not
> generally used or asked for in my experience.
>
> I am fairly ambivalent on the case for internalIdentifier being
> substitutable for .toString, but currently we need to work out a
> consistent way to identify the local scope, and it could be used in
> conjunction with either internalIdentifier or toString if both have
> the same contract in practice. What we are doing is endeavouring to
> transfer BlankNodes between implementations inside of the JVM and keep
> their general identity (and round-tripping adds another level of
> difficulty on top of that). If we just rely on .toString then we may
> need to embed the local scope information into the resulting string,
> so the two pieces of information would be compressed into one, which
> may not be ideal in the end. In a broader sense, it would be great if
> the new Commons RDF API didn't enforce restrictions on .toString, which
> already has consistent meanings in each of the implementations;
> unique new methods give more flexibility there.
>
> Thanks,
>
> Peter
>
>
>
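Peter's five use cases could be met by scoping blank-node identity to a parse. A minimal sketch, assuming illustrative names (ScopedBlankNode, localScope) that are not part of the Commons RDF API:

```java
import java.util.Objects;
import java.util.UUID;

final class ScopedBlankNode {
    private final UUID localScope; // e.g. one fresh UUID per document parse
    private final String label;    // label seen in the concrete syntax

    ScopedBlankNode(UUID localScope, String label) {
        this.localScope = localScope;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ScopedBlankNode)) return false;
        ScopedBlankNode other = (ScopedBlankNode) o;
        // Equal only if created in the same local scope with the same label:
        // same label from two different parses never compares equal.
        return localScope.equals(other.localScope) && label.equals(other.label);
    }

    @Override
    public int hashCode() {
        return Objects.hash(localScope, label);
    }
}
```

With one fresh scope per document parse into a given graph, cases 2-5 fall out automatically; case 1 additionally requires the parser to reuse the earlier scope when re-parsing the same document into the same graph.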

Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Reto Gmür
In reply to this post by Andy Seaborne
Hi Andy,


This analogy might help (or not):
>
> There is a table with 4 metal spheres on it in a line across it.  Each
> sphere is exactly the same kind of material, the same mass, the same
> colour and the
> same shininess.  You can ask "is that sphere the same as that other one?"
> by pointing to two of them.  If you put them in a bag, shake the bag and
> take one out, you can't tell whether this chosen one is the same as the one
> that was on the right-hand end of the line.
>

How many spheres will you be able to take out of the bag? I once made the
mistake (resulting in a bug) of assuming identity of indiscernibles in
RDF; this is however not the case.

For the spheres to exist they need to be part of a graph.


If this is the graph:

_:a p _:b.
_:b p _:c.
_:c p _:d.
_:d p _:e.
_:e p _:a.

We can put the 5 spheres in the bag and shake it as much as we want; we will
always have 5 spheres in the bag. Of course, as you say, if we take one out
we can't say which one it is. (But we don't care, we are happy having a
shiny metal sphere in a circle with 4 other spheres.)

By contrast if this is the graph:

_:a rdf:type ex:Sphere.
_:b rdf:type ex:Sphere.
_:c rdf:type ex:Sphere.
_:d rdf:type ex:Sphere.
_:e rdf:type ex:Sphere.

When we open the bag we might just have one sphere. Which is fine, as the
above graph evaluates to true in any world where there is at least one
sphere.


So far for what RDF is concerned. Things are a bit different for the API
that allows creating the graph: in this situation we might actually be
pointing (or looking) at spheres, and as long as we do so they should not
disappear. After having added the above 5 triples we might go on adding:


_:a rdf:type ex:Shiny.
_:b rdf:type ex:Heavy.
_:c rdf:type ex:Radioactive.
_:d rdf:type ex:Transparent.
_:e rdf:type ex:Whole.

In this case we should have 5 spheres described by 10 triples. Every well
behaving quality bag will give us back all 5 spheres[*].

In other words, as long as I am looking at the spheres I might go on adding
things to their descriptions, making them actually distinct spheres.

This distinction between looking at (or pointing at) spheres and just
having them in the bag is very straightforwardly (and imho elegantly)
modeled by the distinction of an object instance being reachable or not.

In the Clerezza code and in the SVN commons proposal, code along the
following lines works as expected.

{a,b,c,d,e} is a set of 5 BlankNodes (i.e. we have 5 objects, no two of
them are equal).

g.add(a, RDF.type, EX.Sphere);
g.add(b, RDF.type, EX.Sphere);
g.add(c, RDF.type, EX.Sphere);
g.add(d, RDF.type, EX.Sphere);
g.add(e, RDF.type, EX.Sphere);
//if we save the graph here the backend might just store one triple
//but as long as we can do the following
g.add(a, RDF.type, EX.Shiny);
g.add(b, RDF.type, EX.Heavy);
g.add(c, RDF.type, EX.Radioactive);
g.add(d, RDF.type, EX.Transparent);
g.add(e, RDF.type, EX.Whole);
//we will end up storing 10 triples
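The behaviour the snippet above relies on can be sketched as a runnable example: blank nodes with no exposed identifier, whose identity is plain JVM object identity (illustrative names, not Clerezza's actual classes):

```java
import java.util.HashSet;
import java.util.Set;

class SphereDemo {
    // No internal identifier; the default Object.equals/hashCode means
    // two BlankNode instances are equal only if they are the same object.
    static final class BlankNode { }

    public static void main(String[] args) {
        Set<BlankNode> spheres = new HashSet<>();
        for (int i = 0; i < 5; i++) {
            spheres.add(new BlankNode()); // five distinct spheres
        }
        // While the application still holds references, the five nodes
        // stay discernible: adding a second triple about each one can
        // never collapse them into a single subject.
        System.out.println(spheres.size()); // prints 5
    }
}
```

Nothing can merge the five subjects while the application holds references to them, which is exactly the reachability-based model described above.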

In the API proposal it is neither clear in which situations the backend
might remove redundant information (as new BlankNode objects with the same
id might be created with the same factory) nor whether the latter 5 add
invocations will indeed add triples with the same subject (as the
implementation might have created bnodes that differ from the original 5,
as these might originate from another graph; see my answer to Stian).

So basically I agree with everything you write in your email and don't see
any reason to expose the internal identifier in the API and have complex
identity criteria.

Cheers,
Reto

 [*] Well, yeah, the problem of metaphors: we will always get back 5
bnodes, even though the graph also evaluates to true in a universe with
just one shiny, heavy, radioactive, transparent and whole sphere.

Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Peter Ansell
On 1 February 2015 at 07:45, Reto Gmür <[hidden email]> wrote:

> In the clerezza code and in the SVN commons proposal code along the
> following lines works as expected.
>
> {a,b,c,d,e} is a set of 5 BlankNodes (i.e. we have 5 objects, no two of
> them are equals).
>
> g.add(a, RDF.type, EX.Sphere);
> g.add(b, RDF.type, EX.Sphere);
> g.add(c, RDF.type, EX.Sphere);
> g.add(d, RDF.type, EX.Sphere);
> g.add(e, RDF.type, EX.Sphere);
> //if we save the graph here the backend might just store one triple

Hi Reto,

Sorry I don't have time to reply fully, but I repeat what I said
earlier that it is most unusual for any RDF database to just store one
triple in that case. In the context of Java, if you create 5 BlankNode
objects and add them to the database as part of Triples, then the
database should store 5 BlankNode references and internally make sure
that they are distinct.

Do you have examples of other RDF database systems that operate
according to the Clerezza principle?

Cheers,

Peter



Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Andy Seaborne
In reply to this post by Reto Gmür
Hi Reto,

There is a key point in this discussion that is worth pulling out.

RDF has a data model and there is also an interpretation of the data model.

The data model is one spec ("Concepts and Abstract Syntax") and the
interpretation in another ("Semantics", also commonly referred to as the
Model Theory).  There are other semantics as well (RDFS, OWL-x etc).

The Model Theory only reflects unchanging graphs.  "Lean graphs" are in
the model theory.

RDF data has to be built in the first place using the Data Model.

This layering should be reflected in code.

The commons rdf API should reflect the Data Model: how to build graphs
and how they can be manipulated.  Systems can then be built on top of that -
including Clerezza leaning graphs, or owl:sameAs prototype chains (which
interact with leaning) or RDFS inference or ...

Does that work for you?

        Andy

On 31/01/15 20:45, Reto Gmür wrote:





Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Reto Gmür-2
Hi Peter, Hi Andy,

I think the Commons RDF API should model the Abstract Syntax, and to quote
the spec "Blank node identifiers are *not* part of the RDF abstract
syntax". Of course if there are very important pragmatic reasons to have
some identifiers in the API we can consider having them nevertheless.

So far I haven't seen any use case which would actually require an exposed
identifier, apart from the questionable double parsing of the same document
use case. It raises, however, several questions and difficulties (a node no
longer being identical to itself after being added to a graph).

It is clear that while constructing graphs an implementation must not throw
away redundancy. The question is if the API should force implementations to
keep redundant information; even if triplestores typically keep such
information, I think the API shouldn't force them to do so (at least not
without very compelling use cases).

Graphs are immutable both in the Abstract Syntax as well as in the
Semantics. Nevertheless in many situations we want to have mutable graphs.
This is why in clerezza we have MGraphs (this is Graph in the SVN common
proposal) and Graphs (this is ImmutableGraph in SVN). The distinction is
relevant for the definition of equals (and consequently of course of
hashCode): a mutable graph is equal to itself (or to an instance backed by
the same mutable backend graph) while an immutable graph is equal to
another if and only if they are isomorphic.

Should BNodes be shareable across Graphs? The Abstract Syntax says that
they can be shared across the graphs of the same dataset; RDF Semantics
also mentions that BNodes can be shared across Graphs when they have the
same origin.

So

g1 = g.subGraph(c);
g2 = g.subGraph(!c);

g, g1 and g2 may share BNodes, so the following is true:

g.equals(g1.union(g2));


What we don't need:
- A means for applications to re-create identical BNodes (implementations
may however do so); if we have a pointer to the BNode that's fine, otherwise
we get an existing BNode by accessing the triples in the graph.
- A (necessarily complex) means to enforce BNodes being different when in
different contexts. What the context is might go beyond what is visible
at the API level; an implementation may return two equal nodes in different
graphs because they are from the same dataset. On the other hand an
application might construct a graph by first creating various subgraphs
sharing a bnode and then creating the union of them. The API might provide
a g.skolemize():g method which returns a copy of the graph where no BNode
is identical to any BNode outside the graph.
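The g.skolemize() idea at the end of the list could be sketched as follows, assuming a purely illustrative Object[]-triple representation (blank nodes as plain Objects, IRIs and literals as Strings): copy the triples, replacing each blank node with a fresh urn:uuid IRI, so the same node maps to the same IRI within the copy but to nothing outside it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class Skolemizer {
    // Returns a copy of the triples with every blank node replaced by a
    // fresh, globally unique IRI. The triple representation is an
    // assumption for illustration, not a fixed API.
    public static List<Object[]> skolemize(List<Object[]> triples) {
        Map<Object, String> skolems = new HashMap<>();
        List<Object[]> out = new ArrayList<>();
        for (Object[] t : triples) {
            out.add(new Object[] { skolem(t[0], skolems), t[1], skolem(t[2], skolems) });
        }
        return out;
    }

    private static Object skolem(Object term, Map<Object, String> skolems) {
        if (term instanceof String) {
            return term; // already an IRI or literal
        }
        // Same blank node object -> same fresh IRI within this copy.
        return skolems.computeIfAbsent(term, k -> "urn:uuid:" + UUID.randomUUID());
    }
}
```

The copy preserves the co-reference structure of the original graph while guaranteeing that none of its nodes is identical to any node outside it.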


So to summarize: an implementation may add internal identifiers to BNodes
and decide when two objects are identical, but the application should not
see these identifiers and should not be able to recreate identical BNodes;
it should instead just use the existing node.

Cheers,
Reto


On Mon, Feb 2, 2015 at 10:01 PM, Andy Seaborne <[hidden email]> wrote:


Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Andy Seaborne
On 03/02/15 15:42, Reto Gmür wrote:
> Should BNode be shareable across Graphs? The Abstract Syntax says that they
> can be shared across the graphs of the same dataset,

Yes - they can be shared.

The note about sharing across the graphs of the same dataset is there to
highlight an important point. It is not limiting, though.

Other important cases: subgraph, composite graphs (union, intersection etc).

It follows from the definitions, but because it does so only through the
lack of any text making restrictions, the dataset case is explicitly picked
out.  The definitions build up to graphs from below ... RDF term ->
triples -> graphs -> datasets. There is no back linkage, no context, in the
data model.

(Data model = abstract syntax, but "abstract syntax" leads to a bit of
confusion, whereas people seem more comfortable with "data model"
because there are real syntaxes (Turtle, et al.).)

> What we don't need:
> - A mean for application to re-create identical BNodes (implementations may
> however do so), if we have a pointer to the BNode that's fine, otherwise we
> get existing BNode by accessing the triples in the graph.

A graph is not limited to one JVM (in time or space).

So an implementation may need to create a specific bNode (a database
would do this; so would a transfer between systems).

It depends a bit on the "A" in API -- whether "application" means "user
application" or includes system-machinery things like parsers, but the
boundary is not easy to fix.

It might include algorithms spread over many machines; they need to
transfer abstract syntax around.

        Andy






Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)

Reto Gmür-2
On Tue, Feb 3, 2015 at 4:44 PM, Andy Seaborne <[hidden email]> wrote:

>
>
> It depends a bit on the "A" in API -- whether "application" means "user
> application" or includes system-machinery things like parsers but the
> boundary is not easy to fix.
>

I don't see the reason to make such a distinction. What's the difference
between a piece of software that converts a file to RDF and one that
describes the music it hears in RDF, with or without involving a human?

The relevant question is if the API should only allow the same code to work
on data provided by different implementations (so that one can e.g. switch
from Jena to Sesame) or if it should allow mixing data from different
implementations. Clerezza does the latter: you can have a weather service
providing a graph (not stored anywhere but computed just in time from the
data of the sensors) and a triple store, and copy data from one graph to the
other (well, in this case only in one direction, as currently the weather
is read-only).

Cheers,
Reto