registry module
registry module API:
- organizations
- nodes
- networks
- installations
- datasets
- dataset_metrics
- dataset_suggest
- dataset_search
Example usage:
from pygbif import registry
registry.dataset_metrics(uuid='3f8a1297-3259-4700-91fc-acc4170b27ce')
registry API
registry.datasets(type=None, uuid=None, query=None, id=None, limit=100, offset=None, **kwargs)
Search for datasets and dataset metadata.
Parameters:
- data – [str] The type of data to get. Default: all
- type – [str] Type of dataset; options include OCCURRENCE, etc.
- uuid – [str] UUID of the data node provider. This must be specified if data is anything other than all.
- query – [str] Query term(s). Only used when data = 'all'.
- id – [int] A metadata document id.
References: http://www.gbif.org/developer/registry#datasets
Usage:
from pygbif import registry
registry.datasets(limit=5)
registry.datasets(type="OCCURRENCE")
registry.datasets(uuid="a6998220-7e3a-485d-9cd6-73076bd85657")
registry.datasets(data='contact', uuid="a6998220-7e3a-485d-9cd6-73076bd85657")
registry.datasets(data='metadata', uuid="a6998220-7e3a-485d-9cd6-73076bd85657")
registry.datasets(data='metadata', uuid="a6998220-7e3a-485d-9cd6-73076bd85657", id=598)
registry.datasets(data=['deleted', 'duplicate'])
registry.datasets(data=['deleted', 'duplicate'], limit=1)
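Registry listings such as registry.datasets() are paged with limit and offset. The sketch below walks through every page, assuming each response is a dictionary with a results list and an endOfRecords flag (the usual shape of GBIF's paged registry responses; treat that as an assumption). The helper name page_through is hypothetical and not part of pygbif; fetch stands in for any callable taking (limit, offset).

```python
from typing import Any, Callable, Dict, Iterator


def page_through(
    fetch: Callable[[int, int], Dict[str, Any]],
    limit: int = 100,
    max_records: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield records from a paged endpoint until endOfRecords or max_records.

    Hypothetical helper: fetch(limit, offset) is assumed to return a dict
    with a "results" list and an "endOfRecords" bool.
    """
    offset = 0
    yielded = 0
    while yielded < max_records:
        page = fetch(limit, offset)
        results = page.get("results", [])
        for record in results:
            if yielded >= max_records:
                return
            yield record
            yielded += 1
        # Stop when the server reports no more records or returns an empty page.
        if page.get("endOfRecords", True) or not results:
            return
        # Advance by what was actually returned, in case a page comes back short.
        offset += len(results)
```

With pygbif installed, fetch could be lambda limit, offset: registry.datasets(limit=limit, offset=offset). The max_records cap keeps an accidental full-registry crawl from running unbounded.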
registry.dataset_metrics()
Get details on a GBIF dataset.
Parameters: uuid – [str] One or more dataset UUIDs. See examples.
References: http://www.gbif.org/developer/registry#datasetMetrics
Usage:
from pygbif import registry
registry.dataset_metrics(uuid='3f8a1297-3259-4700-91fc-acc4170b27ce')
registry.dataset_metrics(uuid='66dd0960-2d7d-46ee-a491-87b9adcfe7b1')
registry.dataset_metrics(uuid=['3f8a1297-3259-4700-91fc-acc4170b27ce', '66dd0960-2d7d-46ee-a491-87b9adcfe7b1'])
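Because uuid accepts either a single UUID string or a list of them, it can help to normalize and validate input before making the call. A minimal sketch using the standard-library uuid module; the helper name as_uuid_list is hypothetical and not part of pygbif.

```python
import uuid
from typing import List, Sequence, Union


def as_uuid_list(value: Union[str, Sequence[str]]) -> List[str]:
    """Normalize one UUID string or a sequence of them to canonical strings.

    Raises ValueError if any entry is not a valid UUID.
    """
    items = [value] if isinstance(value, str) else list(value)
    # uuid.UUID() raises ValueError on malformed input; str() returns the
    # canonical lowercase, hyphenated form.
    return [str(uuid.UUID(item)) for item in items]
```

A malformed identifier then fails locally with a ValueError instead of producing an opaque HTTP error from the API.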
registry.dataset_suggest(type=None, keyword=None, owningOrg=None, publishingOrg=None, hostingOrg=None, publishingCountry=None, decade=None, limit=100, offset=None, **kwargs)
Search that returns up to 20 matching datasets. Results are ordered by relevance.
Parameters:
- q – [str] Query term(s) for full-text search. The value for this parameter can be a simple word or a phrase. Wildcards can be used with simple-word values only, e.g. q=*puma*
- type – [str] Type of dataset; options include OCCURRENCE, etc.
- keyword – [str] Keyword to search by. Datasets can be tagged by keywords, which you can search on. The search is done on the merged collection of tags, the dataset keywordCollections and temporalCoverages. Note: this parameter appeared to no longer work as of 2016-09-02.
- owningOrg – [str] Owning organization. A UUID string. See organizations()
- publishingOrg – [str] Publishing organization. A UUID string. See organizations()
- hostingOrg – [str] Hosting organization. A UUID string. See organizations()
- publishingCountry – [str] Publishing country.
- decade – [str] Decade, e.g., 1980. Filters datasets by their temporal coverage broken down to decades. Decades are given as a full year, e.g. 1880, 1960, 2000, etc., and will return datasets wholly contained in the decade as well as those that cover the entire decade or more. Facet by decade to get the breakdown, e.g. /search?facet=DECADE&facet_only=true (see example below).
- limit – [int] Number of results to return. Default: 100
- offset – [int] Record to start at. Default: 0
Returns: A dictionary
References: http://www.gbif.org/developer/registry#datasetSearch
Usage:
from pygbif import registry
registry.dataset_suggest(q="Amazon", type="OCCURRENCE")
# Suggest datasets tagged with keyword "france".
registry.dataset_suggest(keyword="france")
# Suggest datasets owned by the organization with key
# "07f617d0-c688-11d8-bf62-b8a03c50a862" (UK NBN).
registry.dataset_suggest(owningOrg="07f617d0-c688-11d8-bf62-b8a03c50a862")
# Full-text search for all datasets having the word "amsterdam" somewhere in
# their metadata (title, description, etc.).
registry.dataset_suggest(q="amsterdam")
# Limited search
registry.dataset_suggest(type="OCCURRENCE", limit=2)
registry.dataset_suggest(type="OCCURRENCE", limit=2, offset=10)
# Return just descriptions
registry.dataset_suggest(type="OCCURRENCE", limit=5, description=True)
# Search by decade
registry.dataset_suggest(decade=1980, limit=30)
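The filters above combine as query parameters on a single request. As a rough illustration, assuming the wrapper forwards every argument that is not None to GBIF's https://api.gbif.org/v1/dataset/suggest endpoint, the request URL could be assembled with the standard library as below. build_suggest_url is a hypothetical helper, not pygbif's actual implementation.

```python
from urllib.parse import urlencode

# Assumed endpoint; pygbif's internal URL handling may differ.
SUGGEST_URL = "https://api.gbif.org/v1/dataset/suggest"


def build_suggest_url(**params) -> str:
    """Return a suggest URL containing only the parameters that are not None."""
    query = {k: v for k, v in params.items() if v is not None}
    return SUGGEST_URL + ("?" + urlencode(query) if query else "")
```

Dropping None values mirrors the signature's defaults: filters you do not pass simply do not appear in the query string.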
registry.dataset_search(type=None, keyword=None, owningOrg=None, publishingOrg=None, hostingOrg=None, decade=None, publishingCountry=None, facet=None, facetMincount=None, facetMultiselect=None, hl=False, limit=100, offset=None, **kwargs)
Full-text search across all datasets. Results are ordered by relevance.
Parameters:
- q – [str] Query term(s) for full-text search. The value for this parameter can be a simple word or a phrase. Wildcards can be used with simple-word values only, e.g. q=*puma*
- type – [str] Type of dataset; options include OCCURRENCE, etc.
- keyword – [str] Keyword to search by. Datasets can be tagged by keywords, which you can search on. The search is done on the merged collection of tags, the dataset keywordCollections and temporalCoverages. Note: this parameter appeared to no longer work as of 2016-09-02.
- owningOrg – [str] Owning organization. A UUID string. See organizations()
- publishingOrg – [str] Publishing organization. A UUID string. See organizations()
- hostingOrg – [str] Hosting organization. A UUID string. See organizations()
- publishingCountry – [str] Publishing country.
- decade – [str] Decade, e.g., 1980. Filters datasets by their temporal coverage broken down to decades. Decades are given as a full year, e.g. 1880, 1960, 2000, etc., and will return datasets wholly contained in the decade as well as those that cover the entire decade or more. Facet by decade to get the breakdown, e.g. /search?facet=DECADE&facet_only=true (see example below).
- facet – [str] A list of facet names used to retrieve the 100 most frequent values for a field. Allowed facets are: type, keyword, publishingOrg, hostingOrg, decade, and publishingCountry. Additionally, subtype and country are legal values but are not yet implemented, so no data will be returned for them yet.
- facetMincount – [str] Used in combination with the facet parameter. Set facetMincount={#} to exclude facets with a count less than {#}, e.g. http://api.gbif.org/v1/dataset/search?facet=type&limit=0&facetMincount=10000 only shows the type value ‘OCCURRENCE’ because ‘CHECKLIST’ and ‘METADATA’ have counts less than 10000.
- facetMultiselect – [bool] Used in combination with the facet parameter. Set facetMultiselect=True to still return counts for values that are not currently filtered, e.g. http://api.gbif.org/v1/dataset/search?facet=type&limit=0&type=CHECKLIST&facetMultiselect=true still shows type values ‘OCCURRENCE’ and ‘METADATA’ even though type is being filtered by type=CHECKLIST.
- hl – [bool] Set hl=True to highlight terms matching the query in full-text search fields. The highlight will be an emphasis tag of class ‘gbifH1’, e.g. http://api.gbif.org/v1/dataset/search?q=plant&hl=true
  Full-text search fields include: title, keyword, country, publishing country, publishing organization title, hosting organization title, and description. One additional full-text field is searched, which includes information from metadata documents, but the text of this field is not returned in the response.
- limit – [int] Number of results to return. Default: 100
- offset – [int] Record to start at. Default: 0
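To make the facetMincount rule concrete: facet values whose count is below the threshold are dropped. The sketch below applies that rule locally to a facet-count mapping. The counts are made-up illustrative numbers, not real GBIF figures, and apply_mincount is a hypothetical helper.

```python
from typing import Dict


def apply_mincount(counts: Dict[str, int], mincount: int) -> Dict[str, int]:
    """Keep only facet values whose count is at least mincount."""
    return {value: n for value, n in counts.items() if n >= mincount}


# Illustrative counts only (not real GBIF numbers).
example = {"OCCURRENCE": 15000, "CHECKLIST": 3000, "METADATA": 900}
```

With a threshold of 10000, only OCCURRENCE survives, matching the behavior described for the facetMincount example URL.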
Note: You can pass in additional faceting parameters on a per-field basis. For example, to limit the number of facets returned from a field foo to 3 results, pass in foo_facetLimit = 3. GBIF does not allow all per-field parameters, but does allow some. See also the examples.
Returns: A dictionary
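Per-field facet parameters follow a <field>_<parameter> naming pattern, as in foo_facetLimit. The sketch below builds such keyword arguments for several fields at once so they can be unpacked into a call like registry.dataset_search(facet=fields, limit=0, **kwargs). per_field_params is a hypothetical helper, and which per-field parameters GBIF actually accepts is not established here.

```python
from typing import Dict, Iterable


def per_field_params(fields: Iterable[str], name: str, value) -> Dict[str, object]:
    """Build {'<field>_<name>': value} for each field, e.g. decade_facetLimit=3."""
    return {f"{field}_{name}": value for field in fields}
```

Generating the names from the same list passed to facet keeps the two in sync when fields are added or removed.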
References: http://www.gbif.org/developer/registry#datasetSearch
Usage:
from pygbif import registry
# Get the first 10 datasets of type "OCCURRENCE".
registry.dataset_search(type="OCCURRENCE", limit=10)
# Full-text search for all datasets having the word "amsterdam" somewhere in
# their metadata (title, description, etc.).
registry.dataset_search(q="amsterdam", limit=10)
# Limited search
registry.dataset_search(type="OCCURRENCE", limit=2)
registry.dataset_search(type="OCCURRENCE", limit=2, offset=10)
# Search by decade
registry.dataset_search(decade=1980, limit=10)
# Faceting
## just facets
registry.dataset_search(facet="decade", facetMincount=10, limit=0)
## data and facets
registry.dataset_search(facet="decade", facetMincount=10, limit=2)
## many facet variables
registry.dataset_search(facet=["decade", "type"], facetMincount=10, limit=0)
## facet vars
### per variable paging
x = registry.dataset_search(
    facet=["decade", "type"],
    decade_facetLimit=3,
    type_facetLimit=3,
    limit=0
)
## highlight
x = registry.dataset_search(q="plant", hl=True, limit=10)
# .get() avoids a KeyError for results without a description
[z.get('description') for z in x['results']]
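With hl=True, matching terms in returned text fields are wrapped in emphasis (em) tags. To get the plain text back, a small regular expression can remove them. This sketch assumes the markup is limited to simple <em ...> and </em> tags; strip_highlight is a hypothetical helper, not part of pygbif.

```python
import re

# Matches opening <em ...> tags (with any attributes) and closing </em> tags.
# The \b keeps it from matching other tags that merely start with "em".
_EM_TAG = re.compile(r"</?em\b[^>]*>")


def strip_highlight(text):
    """Remove <em> highlight tags from a string; pass None through unchanged."""
    if text is None:
        return None
    return _EM_TAG.sub("", text)
```

For example, [strip_highlight(z.get('description')) for z in x['results']] returns the descriptions without highlight markup.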
registry.installations(uuid=None, q=None, identifier=None, identifierType=None, limit=100, offset=None, **kwargs)
Installations metadata.
Parameters:
- data – [str] The type of data to get. Default is all data. If not all, then one or more of contact, endpoint, dataset, comment, deleted, nonPublishing.
- uuid – [str] UUID of the data node provider. This must be specified if data is anything other than all.
- q – [str] Query installations. Only used when data='all'. Ignored otherwise.
- identifier – [str or int] The value for this parameter can be a simple string or integer, e.g. identifier=120
- identifierType – [str] Used in combination with the identifier parameter to filter identifiers by identifier type: DOI, FTP, GBIF_NODE, GBIF_PARTICIPANT, GBIF_PORTAL, HANDLER, LSID, UNKNOWN, URI, URL, UUID
- limit – [int] Number of results to return. Default: 100
- offset – [int] Record to start at. Default: 0
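The data argument takes either a single string or a list drawn from the values listed above. A minimal client-side check follows, with the allowed set copied from this docstring (it may lag behind what the GBIF API currently accepts). check_data is a hypothetical helper, not part of pygbif.

```python
from typing import Collection, List, Sequence, Union

# Values listed in the installations() docstring.
INSTALLATION_DATA = {
    "all", "contact", "endpoint", "dataset", "comment", "deleted", "nonPublishing",
}


def check_data(
    data: Union[str, Sequence[str]],
    allowed: Collection[str] = INSTALLATION_DATA,
) -> List[str]:
    """Return data as a list, raising ValueError for any value not in allowed."""
    values = [data] if isinstance(data, str) else list(data)
    unknown = [v for v in values if v not in allowed]
    if unknown:
        raise ValueError(f"unsupported data value(s): {unknown}")
    return values
```

Catching a typo such as 'endpoints' locally gives a clearer error than a failed request.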
Returns: A dictionary
References: http://www.gbif.org/developer/registry#installations
Usage:
from pygbif import registry
registry.installations(limit=5)
registry.installations(q="france")
registry.installations(uuid="b77901f9-d9b0-47fa-94e0-dd96450aa2b4")
registry.installations(data='contact', uuid="b77901f9-d9b0-47fa-94e0-dd96450aa2b4")
registry.installations(data='contact', uuid="2e029a0c-87af-42e6-87d7-f38a50b78201")
registry.installations(data='endpoint', uuid="b77901f9-d9b0-47fa-94e0-dd96450aa2b4")
registry.installations(data='dataset', uuid="b77901f9-d9b0-47fa-94e0-dd96450aa2b4")
registry.installations(data='deleted')
registry.installations(data='deleted', limit=2)
registry.installations(data=['deleted', 'nonPublishing'], limit=2)
registry.installations(identifierType='DOI', limit=2)
registry.networks(uuid=None, q=None, identifier=None, identifierType=None, limit=100, offset=None, **kwargs)
Networks metadata.
Note: There is currently only one network, so there is not much you can do with this method.
Parameters:
- data – [str] The type of data to get. Default: all
- uuid – [str] UUID of the data network provider. This must be specified if data is anything other than all.
- q – [str] Query networks. Only used when data = 'all'. Ignored otherwise.
- identifier – [str or int] The value for this parameter can be a simple string or integer, e.g. identifier=120
- identifierType – [str] Used in combination with the identifier parameter to filter identifiers by identifier type: DOI, FTP, GBIF_NODE, GBIF_PARTICIPANT, GBIF_PORTAL, HANDLER, LSID, UNKNOWN, URI, URL, UUID
- limit – [int] Number of results to return. Default: 100
- offset – [int] Record to start at. Default: 0
Returns: A dictionary
References: http://www.gbif.org/developer/registry#networks
Usage:
from pygbif import registry
registry.networks(limit=1)
registry.networks(uuid='2b7c7b4f-4d4f-40d3-94de-c28b6fa054a6')
registry.nodes(uuid=None, q=None, identifier=None, identifierType=None, limit=100, offset=None, isocode=None, **kwargs)
Nodes metadata.
Parameters:
- data – [str] The type of data to get. Default: all
- uuid – [str] UUID of the data node provider. This must be specified if data is anything other than all.
- q – [str] Query nodes. Only used when data = 'all'.
- identifier – [str or int] The value for this parameter can be a simple string or integer, e.g. identifier=120
- identifierType – [str] Used in combination with the identifier parameter to filter identifiers by identifier type: DOI, FTP, GBIF_NODE, GBIF_PARTICIPANT, GBIF_PORTAL, HANDLER, LSID, UNKNOWN, URI, URL, UUID
- limit – [int] Number of results to return. Default: 100
- offset – [int] Record to start at. Default: 0
- isocode – [str] A two-letter country code. Only used if data = 'country'.
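isocode expects a two-letter country code (for example "DK"). A minimal sketch of normalizing user input before passing it along, assuming the uppercase ISO 3166-1 alpha-2 form is what the API expects. It only checks the shape of the code, not that the country exists, and normalize_isocode is a hypothetical helper.

```python
def normalize_isocode(code: str) -> str:
    """Strip and uppercase a two-letter country code; raise ValueError otherwise.

    Shape check only: exactly two ASCII letters. It does not verify that the
    code is an assigned ISO 3166-1 alpha-2 code.
    """
    cleaned = code.strip().upper()
    if len(cleaned) != 2 or not cleaned.isascii() or not cleaned.isalpha():
        raise ValueError(f"expected a two-letter country code, got {code!r}")
    return cleaned
```

A call might then look like registry.nodes(data='country', isocode=normalize_isocode(" dk ")).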
Returns: A dictionary
References: http://www.gbif.org/developer/registry#nodes
Usage:
from pygbif import registry
registry.nodes(limit=5)
registry.nodes(identifier=120)
registry.nodes(uuid="1193638d-32d1-43f0-a855-8727c94299d8")
registry.nodes(data='identifier', uuid="03e816b3-8f58-49ae-bc12-4e18b358d6d9")
registry.nodes(data=['identifier', 'organization', 'comment'], uuid="03e816b3-8f58-49ae-bc12-4e18b358d6d9")
uuids = [
    "8cb55387-7802-40e8-86d6-d357a583c596", "02c40d2a-1cba-4633-90b7-e36e5e97aba8",
    "7a17efec-0a6a-424c-b743-f715852c3c1f", "b797ce0f-47e6-4231-b048-6b62ca3b0f55",
    "1193638d-32d1-43f0-a855-8727c94299d8", "d3499f89-5bc0-4454-8cdb-60bead228a6d",
    "cdc9736d-5ff7-4ece-9959-3c744360cdb3", "a8b16421-d80b-4ef3-8f22-098b01a89255",
    "8df8d012-8e64-4c8a-886e-521a3bdfa623", "b35cf8f1-748d-467a-adca-4f9170f20a4e",
    "03e816b3-8f58-49ae-bc12-4e18b358d6d9", "073d1223-70b1-4433-bb21-dd70afe3053b",
    "07dfe2f9-5116-4922-9a8a-3e0912276a72", "086f5148-c0a8-469b-84cc-cce5342f9242",
    "0909d601-bda2-42df-9e63-a6d51847ebce", "0e0181bf-9c78-4676-bdc3-54765e661bb8",
    "109aea14-c252-4a85-96e2-f5f4d5d088f4", "169eb292-376b-4cc6-8e31-9c2c432de0ad",
    "1e789bc9-79fc-4e60-a49e-89dfc45a7188", "1f94b3ca-9345-4d65-afe2-4bace93aa0fe",
]
[registry.nodes(data='identifier', uuid=x) for x in uuids]
registry.organizations(uuid=None, q=None, identifier=None, identifierType=None, limit=100, offset=None, **kwargs)
Organizations metadata.
Parameters:
- data – [str] The type of data to get. Default is all data. If not all, then one or more of contact, endpoint, identifier, tag, machineTag, comment, hostedDataset, ownedDataset, deleted, pending, nonPublishing.
- uuid – [str] UUID of the data node provider. This must be specified if data is anything other than all.
- q – [str] Query organizations. Only used when data='all'. Ignored otherwise.
- identifier – [str or int] The value for this parameter can be a simple string or integer, e.g. identifier=120
- identifierType – [str] Used in combination with the identifier parameter to filter identifiers by identifier type: DOI, FTP, GBIF_NODE, GBIF_PARTICIPANT, GBIF_PORTAL, HANDLER, LSID, UNKNOWN, URI, URL, UUID
- limit – [int] Number of results to return. Default: 100
- offset – [int] Record to start at. Default: 0
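identifierType is effectively an enumeration. One way to guard against typos on the client side is a small Enum mirroring the values listed above. This is a hypothetical sketch (pygbif does not define it), and the list may not match every identifier type the GBIF API currently supports.

```python
from enum import Enum


class IdentifierType(str, Enum):
    """Identifier types as listed in this docstring (hypothetical client-side enum)."""

    DOI = "DOI"
    FTP = "FTP"
    GBIF_NODE = "GBIF_NODE"
    GBIF_PARTICIPANT = "GBIF_PARTICIPANT"
    GBIF_PORTAL = "GBIF_PORTAL"
    HANDLER = "HANDLER"
    LSID = "LSID"
    UNKNOWN = "UNKNOWN"
    URI = "URI"
    URL = "URL"
    UUID = "UUID"


def parse_identifier_type(value: str) -> str:
    """Return the canonical identifier-type string; raise ValueError if unknown."""
    # Enum lookup by value raises ValueError for anything not listed above.
    return IdentifierType(value.strip().upper()).value
```

A call might then read registry.organizations(identifierType=parse_identifier_type("doi"), limit=2).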
Returns: A dictionary
References: http://www.gbif.org/developer/registry#organizations
Usage:
from pygbif import registry
registry.organizations(limit=5)
registry.organizations(q="france")
registry.organizations(identifier=120)
registry.organizations(uuid="e2e717bf-551a-4917-bdc9-4fa0f342c530")
registry.organizations(data='contact', uuid="e2e717bf-551a-4917-bdc9-4fa0f342c530")
registry.organizations(data='deleted')
registry.organizations(data='deleted', limit=2)
registry.organizations(data=['deleted', 'nonPublishing'], limit=2)
registry.organizations(identifierType='DOI', limit=2)