
Re: [Handle-info] indirect handles actually implemented?



Hi Eric,

We are working on a new handle type that may someday replace the basic "URL" handle values. This new handle type will address most of the URL-based features that CNRI and others have wanted to have over the years. I think this new type is flexible enough to be useful for a long time to come while also being simple enough for others to understand and implement.

We are building the ability to select between multiple locators for an object into the handle proxy (or any handle client, really). To use this feature you will be able to add a handle value with type "10320/loc" and a data field with content such as:

<locations>
<location id="0" href="http://uk.example.com/" country="gb" weight="0" />
<location id="1" href="http://www1.example.com/" weight="1" />
<location id="2" href="http://www2.example.com/" weight="1" />
</locations>
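For illustration only, a value like the one above can be read with any XML parser, provided the attribute values are quoted. Here is a minimal Python sketch. The function name parse_locations and the dict-per-location representation are made up for this example and are not part of the proxy or any handle client:

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the example value (all attribute values quoted).
LOC_VALUE = """
<locations>
  <location id="0" href="http://uk.example.com/" country="gb" weight="0" />
  <location id="1" href="http://www1.example.com/" weight="1" />
  <location id="2" href="http://www2.example.com/" weight="1" />
</locations>
"""

def parse_locations(xml_text):
    """Return (methods, locations).

    methods is the list of names from the chooseby attribute, or None if
    the attribute is absent; locations is a list of attribute dicts.
    """
    root = ET.fromstring(xml_text.strip())
    chooseby = root.get("chooseby")
    methods = [m.strip() for m in chooseby.split(",")] if chooseby else None
    locations = [dict(el.attrib) for el in root.findall("location")]
    return methods, locations

methods, locations = parse_locations(LOC_VALUE)
```

All attribute values come back as strings, so a consumer would still need to convert weight to a number before using it.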

Basically, it is a simple bit of XML consisting of a number of <location> elements. The proxy will examine the <location> elements to determine which one should be sent as a redirect to the client. The proxy will use a number of different methods for selecting which URL to choose for the redirection. The methods used to choose a location can be specified in the handle itself by adding a "chooseby" attribute to the <locations> element. The chooseby attribute holds a comma-delimited list of known method names. These method names are:

locatt: Selects only locations that match an attribute passed in the proxy/handle link. If someone constructs a link as http://hdl.handle.net/test/handle?locatt=id:1 then the proxy will return the locations that have an "id" attribute of 1 (i.e. the second location in the above example).

country: Selects only locations that have a 'country' attribute matching the country of the client (based on a lookup using geoip). If no matching locations are found then this selects locations that have no country attribute (ie. not a mismatch).

weighted: Selects a single location based on a random choice. The proxy will observe the 'weight' attribute for each location which should be a floating point non-negative number. The weighting allows for a very basic load balancing, but primarily I see it as a way to represent locations that can only be addressed directly (by country or locatt/attributes). If applied to locations that all have non-positive weights then this selects one of the remaining locations randomly (regardless of weight).

If no 'chooseby' attribute is provided at the top level then the proxy uses a sensible default which is currently "locatt,country,weighted". We assume that other methods can be added later but they will have to be backwards compatible if included in the default chooseby settings.

The proxy will evaluate each selection method in order. After each selection method is applied to the list of selections the proxy will take one of four steps:
- if there is only one remaining location element, it is returned as a redirect
- if there are no remaining location elements, the proxy reverts to the location elements as they were before the last method was applied
- if there is more than one remaining location element, the proxy will apply the remaining selection methods to those locations
- if there are no more selection methods to try then the weighted random selection method is applied, which is guaranteed to return a single location. So the weighted random is the "fallback".
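The evaluation order described above can be sketched roughly as follows in Python. This is only my reading of the rules in this message, not the proxy's actual code. The function names, the ctx dictionary, treating a missing weight as 1, letting locatt pass everything through when no locatt parameter is given, and skipping unknown method names are all assumptions made for the example:

```python
import random

DEFAULT_CHOOSEBY = ["locatt", "country", "weighted"]

def _weight(loc):
    # Assumption: a location without a weight attribute counts as weight 1.
    return float(loc.get("weight", 1))

def by_locatt(locs, ctx):
    # ctx["locatt"] looks like "id:1". Assumption: with no locatt parameter
    # in the link, this method leaves the list unchanged.
    locatt = ctx.get("locatt")
    if not locatt:
        return locs
    name, _, value = locatt.partition(":")
    return [loc for loc in locs if loc.get(name) == value]

def by_country(locs, ctx):
    country = ctx.get("country")  # assumed to come from a geoip lookup
    matches = [loc for loc in locs if country and loc.get("country") == country]
    # No match: select the locations that carry no country attribute.
    return matches or [loc for loc in locs if "country" not in loc]

def by_weighted(locs, ctx):
    rng = ctx.get("rng", random)
    positive = [loc for loc in locs if _weight(loc) > 0]
    if not positive:
        # All weights are non-positive: pick uniformly, ignoring weight.
        return [rng.choice(locs)] if locs else []
    return rng.choices(positive, weights=[_weight(loc) for loc in positive], k=1)

METHODS = {"locatt": by_locatt, "country": by_country, "weighted": by_weighted}

def choose_location(locs, ctx, chooseby=None):
    if not locs:
        raise ValueError("no locations to choose from")
    remaining = list(locs)
    for name in chooseby or DEFAULT_CHOOSEBY:
        method = METHODS.get(name)
        if method is None:
            continue  # assumption: unknown method names are skipped
        selected = method(remaining, ctx)
        if len(selected) == 1:
            return selected[0]      # exactly one left: redirect to it
        if selected:
            remaining = selected    # several left: keep narrowing
        # none left: revert by leaving `remaining` untouched
    # No methods left to try: weighted random selection is the fallback.
    return by_weighted(remaining, ctx)[0]
```

With the three example locations, a client in "gb" would get the uk.example.com location, and a client elsewhere would get one of the two weight-1 locations at random.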



The above has all been implemented and tested and will hopefully be available in the proxies (both hdl.handle.net and dx.doi.org) in the next week or so. That said, we still appreciate suggestions for improvement.



Now, back to the topic of indirection. What I describe below has not been implemented and is very much in a design phase.


Because we're using XML and a flexible set of locators for the method above, the "10320/loc" values seem like a good place to also take care of indirection. We've come up with a couple of options for doing the indirection, and are interested in feedback and feature requests. Basically, we expect to extend the locations XML blurb to be able to reference other location sets/URI-templates/repositories/etc. One way this could be structured is in the example handle "123/doc1" below.

hdl:123/doc1 resolves to:
<locations relative-href="mydocument.pdf" >
<repo id="hdl:123/repo1" />
<repo id="hdl:123/repo2" />
<repo id="hdl:123/repo3" />
</locations>

...meaning that 123/doc1 is located on the given three repositories.


hdl:123/repo1 resolves to:
<locations>
<location href="http://europe.mirrors.com/doi_mirror/" />
</locations>
which the proxy uses to build the following URL (by combining the href with the relative-href from the original handle):
http://europe.mirrors.com/doi_mirror/mydocument.pdf
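As a rough illustration of the combining step, here is a small Python sketch. It assumes the proxy would use standard relative-reference resolution (as urllib.parse.urljoin does) rather than plain string concatenation; the design above does not say which, and build_url is a name invented for the example:

```python
from urllib.parse import urljoin

def build_url(repo_href, relative_href):
    # Standard relative resolution: the repository href acts as the base
    # and the relative-href from the original handle is resolved against it.
    return urljoin(repo_href, relative_href)

url = build_url("http://europe.mirrors.com/doi_mirror/", "mydocument.pdf")
```

Under this assumption the trailing slash on the repository href matters: without it, the last path segment of the base would be replaced rather than extended.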


hdl:123/repo2 resolves to:
<locations>
<location openurl-base="http://asia.mirrors.com/openurl?" />
</locations>
which the proxy converts to:
http://asia.mirrors.com/openurl?ver=xxx&id=123/doc1

hdl:123/repo3 resolves to:
<locations>
<location template-uri="http://usa.mirrors.com/docs/search?hdl={hdl}&blah=feh" />
</locations>
resulting in:
http://usa.mirrors.com/docs/search?hdl=123/doc1&blah=feh
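The template case amounts to a placeholder substitution. A minimal Python sketch, assuming {hdl} is the only placeholder and that the handle is inserted without URL-escaping (as in the example above); expand_template is a hypothetical name:

```python
def expand_template(template_uri, handle):
    # Replace the {hdl} placeholder with the handle being resolved.
    # The handle is inserted as-is; whether it should be percent-encoded
    # is an open design question, not something decided here.
    return template_uri.replace("{hdl}", handle)

expanded = expand_template(
    "http://usa.mirrors.com/docs/search?hdl={hdl}&blah=feh", "123/doc1"
)
```

If escaping turns out to be needed, urllib.parse.quote could be applied to the handle before substitution.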


-------------------------

Even better might be this option. Refactoring a bit, we move the set of repositories out of each individual handle to get the following:


hdl:123/doc1 resolves to:
<locations service="hdl:123/reposvc" relative-href="mydocument.pdf" />

hdl:123/reposvc resolves to:
<locations>
<location href="http://europe.mirrors.com/doi_mirror/" />
<location openurl-base="http://asia.mirrors.com/openurl?" />
<location template-base="http://usa.mirrors.com/" />
</locations>

-------------------------


Refactoring a bit further, we can separate each individual repository to its own handle which the administrator can adjust to change the access method (ie the repository software that they use). This ability to distinguish an 'owner' of each repository that can update their own access information independently could be very useful.



hdl:123/doc1 resolves to: <locations service="hdl:123/reposvc" relative-href="mydocument.pdf" />

hdl:123/reposvc resolves to:
<locations>
<location repo="hdl:123/repo1" />
<location repo="hdl:123/repo2" />
<location repo="hdl:123/repo3" />
</locations>

... where each of the repo1/2/3 handles resolves to the same values as in the first option.

-------------------------


Granted we have to be careful about making the proxy do too much work per-request, but I think caching the repo handles would mostly cancel out any performance hit we get from having to resolve the indirection.
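To make the caching idea concrete, here is one possible shape for it in Python: a small time-limited cache wrapped around whatever function resolves a handle. Nothing here reflects existing proxy code. CachingResolver, the resolve callable, and the 300-second default lifetime are all placeholders chosen for the sketch:

```python
import time

class CachingResolver:
    """Wrap a handle-resolving callable with a simple time-limited cache."""

    def __init__(self, resolve, ttl_seconds=300.0, clock=time.monotonic):
        self._resolve = resolve      # hypothetical: handle -> resolved value
        self._ttl = ttl_seconds      # placeholder lifetime, not a recommendation
        self._clock = clock
        self._cache = {}             # handle -> (expiry time, value)

    def __call__(self, handle):
        now = self._clock()
        hit = self._cache.get(handle)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh: no lookup needed
        value = self._resolve(handle)
        self._cache[handle] = (now + self._ttl, value)
        return value
```

Since the repo handles are shared by many document handles, most requests would then cost only one uncached resolution (the document handle itself).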


Congratulations if you've managed to get this far. We'd love to hear your feedback on this.

Thanks,
Sean

On Mar 26, 2008, at 16:05 , Eric Auer wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hi, to follow-up myself: I notice that the HS_ALIAS *is* processed by both the browser plugins and the hdl.handle.net web proxy:

Having a HS_ALIAS field with any arbitrary
index for a handle makes the system return
the data of the handle listed in the data
of the HS_ALIAS (eg "42/111")... In short,
the HS_ALIAS "blocks" access to any other
field/index of the "original" handle, and
lets you see contents  from the referred-
to handle for all your queries.

That is quite useful and somehow similar to
the indirect handles described in 1995, but
the "URL indirection" suggested in my other
mail is still useful in other scenarios. It
would be nice if the web proxy could do it.

Regards, Eric

...
I am told that the Handle System supports a
thing called "indirect handles", as seen in:
www.cnri.reston.va.us/home/cstr/handle-overview.html
(The Handle System A Technical Overview, June 1995)
...

...
For now, I did some experiments with putting a
hdl:... value into URL records of handles. This
also creates an indirection, but for example the
hdl.handle.net web proxy does not follow that,
it just returns the hdl:... URL/URI. If you have
a browser with Handle System plugin (tested and
works w/ MSIE) then it should still be possible
to follow the recursion without having to install
special client or server side software. Of course
the described "URL field indirection" only works
for the URL field, not for other data stored on
a handle server for a given handle. That is okay
for now, but redirecting an entire handle can be
interesting / useful, too.
...

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFH6nQw99dkROyhRRsRAt16AJ9xa/ywoKp2IG/T8xqLpWXzivI/0wCfTENj
fVx0gCwQJ+bQUnGhzq7Yitc=
=ffeu
-----END PGP SIGNATURE-----


_______________________________________________ Handle-Info mailing list Handle-Info@cnri.reston.va.us http://www.handle.net/mailman/listinfo/handle-info


