occurrence module

occurrence module API:

search
get
get_verbatim
get_fragment
count
count_basisofrecord
count_year
count_datasets
count_countries
count_schema
count_publishingcountries
download
download_meta
download_list
download_get
download_sql
download_describe
download_citation

Example usage:

from pygbif import occurrences as occ
occ.search(taxonKey = 3329049)
occ.get(key = 1986559641)
occ.count(isGeoreferenced = True)
occ.download('basisOfRecord = PRESERVED_SPECIMEN')
occ.download('taxonKey = 3119195')
occ.download('decimalLatitude > 50')
occ.download_list(user = "sckott", limit = 5)
occ.download_meta(key = "0000099-140929101555934")
occ.download_get("0000066-140928181241064")
occ.download_sql("SELECT datasetKey, countryCode, COUNT(*) FROM occurrence WHERE continent = 'EUROPE' GROUP BY datasetKey, countryCode")
occ.download_describe("simpleCsv")
occ.download_citation("0002526-241107131044228")

Note

Download endpoints require GBIF credentials. Set them as environment variables:

export GBIF_USER="your_gbif_username"
export GBIF_PWD="your_gbif_password"

You can also pass credentials directly via user= and pwd= arguments.
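A minimal sketch of gathering those credentials in one place before calling a download function. The helper name gbif_credentials is hypothetical and not part of pygbif; it assumes the environment variable names GBIF_USER, GBIF_PWD and GBIF_EMAIL mentioned in this document and only builds the user=, pwd= and email= keyword arguments.

```python
import os


def gbif_credentials(environ=None):
    """Return download keyword arguments read from environment variables.

    Hypothetical helper: it fails early with a clear message when a
    variable is missing, rather than letting the request fail later.
    """
    environ = os.environ if environ is None else environ
    names = {"user": "GBIF_USER", "pwd": "GBIF_PWD", "email": "GBIF_EMAIL"}
    missing = [var for var in names.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {arg: environ[var] for arg, var in names.items()}


# Possible use (requires pygbif and a GBIF account):
# from pygbif import occurrences as occ
# occ.download('taxonKey = 3119195', **gbif_credentials())
```

Passing a plain dictionary as environ makes the helper easy to check without touching the real environment.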

occurrences API

occurrences.search(repatriated=None, kingdomKey=None, phylumKey=None, classKey=None, orderKey=None, familyKey=None, genusKey=None, subgenusKey=None, scientificName=None, country=None, publishingCountry=None, hasCoordinate=None, typeStatus=None, recordNumber=None, lastInterpreted=None, continent=None, geometry=None, recordedBy=None, recordedByID=None, identifiedByID=None, basisOfRecord=None, datasetKey=None, eventDate=None, catalogNumber=None, year=None, month=None, decimalLatitude=None, decimalLongitude=None, elevation=None, depth=None, institutionCode=None, collectionCode=None, hasGeospatialIssue=None, issue=None, q=None, spellCheck=None, mediatype=None, limit=300, offset=0, establishmentMeans=None, facet=None, facetMincount=None, facetMultiselect=None, **kwargs)

Search GBIF occurrences

Parameters:

taxonKey – [int] A GBIF taxon key
q – [str] Simple search parameter. The value for this parameter can be a simple word or a phrase.
spellCheck – [bool] If True, ask GBIF to check the spelling of the value passed to the q parameter. IMPORTANT: This only checks the input to q, and no other parameters. Default: False
repatriated – [str] Searches for records whose publishing country differs from the country in which the record was recorded
kingdomKey – [int] Kingdom classification key
phylumKey – [int] Phylum classification key
classKey – [int] Class classification key
orderKey – [int] Order classification key
familyKey – [int] Family classification key
genusKey – [int] Genus classification key
subgenusKey – [int] Subgenus classification key
scientificName – [str] A scientific name from the GBIF backbone. All included and synonym taxa are included in the search.
datasetKey – [str] The occurrence dataset key (a uuid)
catalogNumber – [str] An identifier of any form assigned by the source within a physical collection or digital dataset for the record, which may not be unique but should be fairly unique in combination with the institution and collection code.
recordedBy – [str] The person who recorded the occurrence.
recordedByID – [str] Identifier (e.g. ORCID) for the person who recorded the occurrence
identifiedByID – [str] Identifier (e.g. ORCID) for the person who provided the taxonomic identification of the occurrence.
collectionCode – [str] An identifier of any form assigned by the source to identify the physical collection or digital dataset uniquely within the context of an institution.
institutionCode – [str] An identifier of any form assigned by the source to identify the institution the record belongs to. Not guaranteed to be unique.
country – [str] The 2-letter country code (as per ISO-3166-1) of the country in which the occurrence was recorded. See here http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
basisOfRecord –
[str] Basis of record, as defined in our BasisOfRecord enum here http://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/BasisOfRecord.html Acceptable values are:
- FOSSIL_SPECIMEN An occurrence record describing a fossilized specimen.
- HUMAN_OBSERVATION An occurrence record describing an observation made by one or more people.
- LIVING_SPECIMEN An occurrence record describing a living specimen.
- MACHINE_OBSERVATION An occurrence record describing an observation made by a machine.
- MATERIAL_CITATION An occurrence record based on a reference to a scholarly publication.
- OBSERVATION An occurrence record describing an observation.
- OCCURRENCE An existence of an organism at a particular place and time. No more specific basis.
- PRESERVED_SPECIMEN An occurrence record describing a preserved specimen.
eventDate – [date] Occurrence date in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Supports range queries, smaller,larger (e.g., 1990,1991, whereas 1991,1990 wouldn’t work)
year – [int] The 4 digit year. A year of 98 will be interpreted as AD 98. Supports range queries, smaller,larger (e.g., 1990,1991, whereas 1991,1990 wouldn’t work)
month – [int] The month of the year, starting with 1 for January. Supports range queries, smaller,larger (e.g., 1,2, whereas 2,1 wouldn’t work)
decimalLatitude – [float] Latitude in decimals between -90 and 90 based on WGS 84. Supports range queries, smaller,larger (e.g., 25,30, whereas 30,25 wouldn’t work)
decimalLongitude – [float] Longitude in decimals between -180 and 180 based on WGS 84. Supports range queries (e.g., -0.4,-0.2, whereas -0.2,-0.4 wouldn’t work).
publishingCountry – [str] The 2-letter country code (as per ISO-3166-1) of the country of the organization that published the occurrence record.
elevation – [int/str] Elevation in meters above sea level. Supports range queries, smaller,larger (e.g., 5,30, whereas 30,5 wouldn’t work)
depth – [int/str] Depth in meters relative to elevation. For example 10 meters below a lake surface with given elevation. Supports range queries, smaller,larger (e.g., 5,30, whereas 30,5 wouldn’t work)
geometry – [str] Searches for occurrences inside a polygon described in Well Known Text (WKT) format. A WKT shape written as either POINT, LINESTRING, LINEARRING, POLYGON, or MULTIPOLYGON. Example of a polygon: ((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1)) would be queried as http://bit.ly/1BzNwDq. Polygons must have counter-clockwise ordering of points.
hasGeospatialIssue – [bool] Includes/excludes occurrence records which contain spatial issues (as determined in our record interpretation), i.e. hasGeospatialIssue=TRUE returns only those records with spatial issues while hasGeospatialIssue=FALSE includes only records without spatial issues. The absence of this parameter returns any record with or without spatial issues.
issue – [str] One or more of many possible issues with each occurrence record. See Details. Issues passed to this parameter filter results by the issue.
hasCoordinate – [bool] Return only occurrence records with lat/long data (True) or all records (False, default).
typeStatus – [str] Type status of the specimen. One of many options. See ?typestatus
recordNumber – [int] Number recorded by collector of the data, different from GBIF record number. See http://rs.tdwg.org/dwc/terms/#recordNumber for more info
lastInterpreted – [date] Date the record was last modified in GBIF, in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Supports range queries, smaller,larger (e.g., 1990,1991, whereas 1991,1990 wouldn’t work)
continent – [str] Continent. One of africa, antarctica, asia, europe, north_america (North America includes the Caribbean and reaches down to and includes Panama), oceania, or south_america
fields – [str] Default (all) returns all fields. minimal returns just taxon name, key, latitude, and longitude. Or specify each field you want returned by name, e.g. fields = ['name','latitude','elevation'].
mediatype – [str] Media type. Default is None, so no filtering on media type. Options: MovingImage, Sound, and StillImage
limit – [int] Number of results to return. Default: 300
offset – [int] Record to start at. Default: 0
facet – [str] One or more field names to facet on, given as a string or a list of strings
establishmentMeans – [str] EstablishmentMeans, possible values include: INTRODUCED, INVASIVE, MANAGED, NATIVE, NATURALISED, UNCERTAIN
facetMincount – [int] minimum number of records to be included in the faceting results
facetMultiselect – [bool] Set to True to still return counts for values that are not currently filtered. See examples. Default: False
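Several parameters above accept range queries written as smaller,larger, and a reversed range does not work. As a rough illustration, the hypothetical helper below (range_query is not part of pygbif) builds such a string and rejects reversed bounds before any request is made.

```python
def range_query(lower, upper):
    """Format a 'smaller,larger' range string, rejecting reversed bounds."""
    if lower > upper:
        raise ValueError(f"lower bound {lower!r} exceeds upper bound {upper!r}")
    return f"{lower},{upper}"


year_range = range_query(1999, 2000)   # "1999,2000"
lat_range = range_query(29.59, 29.6)   # "29.59,29.6"

# Possible use (requires pygbif and network access):
# from pygbif import occurrences
# occurrences.search(year=year_range, limit=20)
# occurrences.search(decimalLatitude=lat_range)
```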

Returns:

A dictionary
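Because each call returns one page of records, collecting more than limit records means stepping offset forward. The sketch below shows one way to do that. It assumes the returned dictionary carries a results list (as used in the usage examples) and an endOfRecords flag; the latter is an assumption about the response shape. The function iter_occurrences is hypothetical, and the search function is passed in so the loop can be tried without network access.

```python
def iter_occurrences(search, page_size=300, max_records=1000, **params):
    """Yield records page by page until the end flag or max_records is reached."""
    offset = 0
    while offset < max_records:
        limit = min(page_size, max_records - offset)
        page = search(limit=limit, offset=offset, **params)
        results = page.get("results", [])
        yield from results
        # Stop when the service reports the last page or returns nothing.
        if page.get("endOfRecords", True) or not results:
            break
        offset += len(results)


# Possible use (requires pygbif and network access):
# from pygbif import occurrences
# records = list(iter_occurrences(occurrences.search, taxonKey=3329049, max_records=600))
```

The max_records cap is a deliberate safety limit so that a broad query does not page indefinitely.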

Usage:

from pygbif import occurrences
occurrences.search(taxonKey = 3329049)

# Return 2 results (the default limit is 300)
occurrences.search(taxonKey=3329049, limit=2)

# Instead of getting a taxon key first, you can search for a name directly.
# Note, however, that `scientificName="..."` also returns synonyms. The results
# for the `scientificName` and `taxonKey` parameters are the same in this case,
# but they may differ for some names.
occurrences.search(scientificName = 'Ursus americanus')
from pygbif import species
key = species.name_backbone(name = 'Ursus americanus', rank='species')['usageKey']
occurrences.search(taxonKey = key)

# Search by dataset key
occurrences.search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a', limit=20)

# Search by catalog number
occurrences.search(catalogNumber="49366", limit=20)
# occurrences.search(catalogNumber=["49366","Bird.27847588"], limit=20)

# Use paging parameters (limit and offset) to page. Note the different results
# for the two queries below.
occurrences.search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a', offset=10, limit=5)
occurrences.search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a', offset=20, limit=5)

# Many dataset keys
# occurrences.search(datasetKey=["50c9509d-22c7-4a22-a47d-8c48425ef4a7", "7b5d6a48-f762-11e1-a439-00145eb45e9a"], limit=20)

# Search by collector name
res = occurrences.search(recordedBy="smith", limit=20)
[ x['recordedBy'] for x in res['results'] ]

# Many collector names
# occurrences.search(recordedBy=["smith","BJ Stacey"], limit=20)

# recordedByID
occurrences.search(recordedByID="https://orcid.org/0000-0003-1691-239X", limit = 3)

# identifiedByID
occurrences.search(identifiedByID="https://orcid.org/0000-0003-1691-239X", limit = 3)

# Search for many species
splist = ['Cyanocitta stelleri', 'Junco hyemalis', 'Aix sponsa']
keys = [ species.name_suggest(x)[0]['key'] for x in splist ]
out = [ occurrences.search(taxonKey = x, limit=1) for x in keys ]
[ x['results'][0]['speciesKey'] for x in out ]

# Search - q parameter
occurrences.search(q = "kingfisher", limit=20)
## spell check - only works with the `q` parameter
### spelled correctly - same result as above call
occurrences.search(q = "kingfisher", limit=20, spellCheck = True)
### spelled incorrectly - stops with suggested spelling
occurrences.search(q = "kajsdkla", limit=20, spellCheck = True)
### spelled incorrectly - stops with many suggested spellings
###   and number of results for each
occurrences.search(q = "helir", limit=20, spellCheck = True)

# Search on latitude and longitude
occurrences.search(decimalLatitude=50, decimalLongitude=10, limit=2)

# Search on a bounding box
## in well known text format
occurrences.search(geometry='POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))', limit=20)
from pygbif import species
key = species.name_suggest(q='Aesculus hippocastanum')[0]['key']
occurrences.search(taxonKey=key, geometry='POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))', limit=20)
## multipolygon
wkt = 'MULTIPOLYGON(((-123 38, -123 43, -116 43, -116 38, -123 38)),((-97 41, -97 45, -93 45, -93 41, -97 41)))'
occurrences.search(geometry = wkt, limit = 20)

# Search on country
occurrences.search(country='US', limit=20)
occurrences.search(country='FR', limit=20)
occurrences.search(country='DE', limit=20)

# Get only occurrences with lat/long data
occurrences.search(taxonKey=key, hasCoordinate=True, limit=20)

# Get only occurrences that were recorded as living specimens
occurrences.search(taxonKey=key, basisOfRecord="LIVING_SPECIMEN", hasCoordinate=True, limit=20)

# Get occurrences for a particular eventDate
occurrences.search(taxonKey=key, eventDate="2013", limit=20)
occurrences.search(taxonKey=key, year="2013", limit=20)
occurrences.search(taxonKey=key, month="6", limit=20)

# Get occurrences based on depth
key = species.name_backbone(name='Salmo salar', kingdom='animals')['usageKey']
occurrences.search(taxonKey=key, depth="5", limit=20)

# Get occurrences based on elevation
key = species.name_backbone(name='Puma concolor', kingdom='animals')['usageKey']
occurrences.search(taxonKey=key, elevation=50, hasCoordinate=True, limit=20)

# Get occurrences based on institutionCode
occurrences.search(institutionCode="TLMF", limit=20)

# Get occurrences based on collectionCode
occurrences.search(collectionCode="Floristic Databases MV - Higher Plants", limit=20)

# Get only those occurrences with spatial issues
occurrences.search(taxonKey=key, hasGeospatialIssue=True, limit=20)

# Search using a query string
occurrences.search(q="kingfisher", limit=20)

# Range queries
## See Detail for parameters that support range queries
### this is a range depth, with lower/upper limits in character string
occurrences.search(depth='50,100')

## Range search with year
occurrences.search(year='1999,2000', limit=20)

## Range search with latitude
occurrences.search(decimalLatitude='29.59,29.6')

# Search by specimen type status
## Look for possible values of the typeStatus parameter looking at the typestatus dataset
occurrences.search(typeStatus = 'allotype')

# Search by specimen record number
## This is the record number of the person/group that submitted the data, not GBIF's numbers
## You can see that many different groups have record number 1, so not super helpful
occurrences.search(recordNumber = 1)

# Search by last time interpreted: Date the record was last modified in GBIF
## The lastInterpreted parameter accepts ISO 8601 format dates, including
## yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Range queries are accepted for lastInterpreted
occurrences.search(lastInterpreted = '2014-04-01')

# Search by continent
## One of africa, antarctica, asia, europe, north_america, oceania, or south_america
occurrences.search(continent = 'south_america')
occurrences.search(continent = 'africa')
occurrences.search(continent = 'oceania')
occurrences.search(continent = 'antarctica')

# Search for occurrences with images
occurrences.search(mediatype = 'StillImage')
occurrences.search(mediatype = 'MovingImage')
x = occurrences.search(mediatype = 'Sound')
[z['media'] for z in x['results']]

# Query based on issues
occurrences.search(taxonKey=1, issue='DEPTH_UNLIKELY')
occurrences.search(taxonKey=1, issue=['DEPTH_UNLIKELY','COORDINATE_ROUNDED'])
# Show all records in the Arizona State Lichen Collection that can't be matched to the GBIF
# backbone properly:
occurrences.search(datasetKey='84c0e1a0-f762-11e1-a439-00145eb45e9a', issue=['TAXON_MATCH_NONE','TAXON_MATCH_HIGHERRANK'])

# If you pass in an invalid polygon you get hopefully informative errors
### the WKT string is fine, but GBIF says bad polygon
wkt = (
    'POLYGON((-178.59375 64.83258989321493,-165.9375 59.24622380205539,'
    '-147.3046875 59.065977905449806,-130.78125 51.04484764446178,-125.859375 36.70806354647625,'
    '-112.1484375 23.367471303759686,-105.1171875 16.093320185359257,-86.8359375 9.23767076398516,'
    '-82.96875 2.9485268155066175,-82.6171875 -14.812060061226388,-74.8828125 -18.849111862023985,'
    '-77.34375 -47.661687803329166,-84.375 -49.975955187343295,174.7265625 -50.649460483096114,'
    '179.296875 -42.19189902447192,-176.8359375 -35.634976650677295,176.8359375 -31.835565983656227,'
    '163.4765625 -6.528187613695323,152.578125 1.894796132058301,135.703125 4.702353722559447,'
    '127.96875 15.077427674847987,127.96875 23.689804541429606,139.921875 32.06861069132688,'
    '149.4140625 42.65416193033991,159.2578125 48.3160811030533,168.3984375 57.019804336633165,'
    '178.2421875 59.95776046458139,-179.6484375 61.16708631440347,-178.59375 64.83258989321493))'
)
occurrences.search(geometry = wkt)

# Faceting
## return no occurrence records with limit=0
x = occurrences.search(facet = "country", limit = 0)
x['facets']

## also return occurrence records
x = occurrences.search(facet = "establishmentMeans", limit = 10)
x['facets']
x['results']

## multiple facet variables
x = occurrences.search(facet = ["country", "basisOfRecord"], limit = 10)
x['results']
x['facets']
x['facets']['country']
x['facets']['basisOfRecord']
x['facets']['basisOfRecord']['count']

## set a minimum facet count
x = occurrences.search(facet = "country", facetMincount = 30000000, limit = 0)
x['facets']

## paging per each faceted variable
### do so by passing in variables like "country" + "_facetLimit" = "country_facetLimit"
### or "country" + "_facetOffset" = "country_facetOffset"
x = occurrences.search(
  facet = ["country", "basisOfRecord", "hasCoordinate"],
  country_facetLimit = 3,
  basisOfRecord_facetLimit = 6,
  limit = 0
)
x['facets']

# requests package options
## There's an acceptable set of requests options (['timeout', 'cookies', 'auth',
## 'allow_redirects', 'proxies', 'verify', 'stream', 'cert']) you can pass
## in via **kwargs, e.g., set a timeout
x = occurrences.search(timeout = 1)
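The geometry parameter of search() requires polygons with counter-clockwise point order. As a hedged sketch, the winding of a ring can be checked with the shoelace formula before building the WKT string. The helpers signed_area, ensure_ccw and polygon_wkt are hypothetical and not part of pygbif. They treat longitude/latitude as planar x/y, which is an assumption that does not hold for rings crossing the antimeridian.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )


def ensure_ccw(ring):
    """Return the ring in counter-clockwise order, reversing it if needed."""
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("a ring needs at least four points and must be closed")
    return list(ring) if signed_area(ring) > 0 else list(reversed(ring))


def polygon_wkt(ring):
    """Build a WKT POLYGON string from (x, y) pairs, fixing the winding first."""
    coords = ", ".join(f"{x} {y}" for x, y in ensure_ccw(ring))
    return f"POLYGON(({coords}))"


ring = [(30.1, 10.1), (10, 20), (20, 40), (40, 40), (30.1, 10.1)]
wkt = polygon_wkt(ring)
# This ring is wound clockwise (negative signed area), so it is reversed:
# wkt == "POLYGON((30.1 10.1, 40 40, 20 40, 10 20, 30.1 10.1))"

# Possible use (requires pygbif and network access):
# occurrences.search(geometry=wkt, limit=20)
```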

occurrences.get(**kwargs)

Gets details for a single, interpreted occurrence

Parameters: key – [int] A GBIF occurrence key
Returns: A dictionary of results

Usage:

from pygbif import occurrences
occurrences.get(key = 1258202889)
occurrences.get(key = 1227768771)
occurrences.get(key = 1227769518)

occurrences.get_verbatim(**kwargs)

Gets a verbatim occurrence record without any interpretation

Parameters: key – [int] A GBIF occurrence key
Returns: A dictionary of results

Usage:

from pygbif import occurrences
occurrences.get_verbatim(key = 1258202889)
occurrences.get_verbatim(key = 1227768771)
occurrences.get_verbatim(key = 1227769518)

occurrences.get_fragment(**kwargs)

Get a single occurrence fragment in its raw form (xml or json)

Parameters: key – [int] A GBIF occurrence key
Returns: A dictionary of results

Usage:

from pygbif import occurrences
occurrences.get_fragment(key = 1052909293)
occurrences.get_fragment(key = 1227768771)
occurrences.get_fragment(key = 1227769518)

occurrences.count(basisOfRecord=None, country=None, isGeoreferenced=None, datasetKey=None, publishingCountry=None, typeStatus=None, issue=None, year=None, **kwargs)

Returns occurrence counts for a predefined set of dimensions

For all parameters below, only one value allowed per function call. See search() for passing more than one value per parameter.

Parameters:

taxonKey – [int] A GBIF taxon key
basisOfRecord – [str] Basis of record, e.g. OBSERVATION. See search() for acceptable values
country – [str] A two letter country code (as per ISO-3166-1)
isGeoreferenced – [bool] Whether the occurrence is georeferenced
datasetKey – [str] The occurrence dataset key (a uuid)
publishingCountry – [str] The two letter country code of the publishing country
typeStatus – [str] Type status of the specimen
issue – [str] An issue flagged on the occurrence record
year – [int] The 4 digit year

Returns:

dict

Usage:

from pygbif import occurrences
occurrences.count(taxonKey = 3329049)
occurrences.count(country = 'CA')
occurrences.count(isGeoreferenced = True)
occurrences.count(basisOfRecord = 'OBSERVATION')
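Since count() accepts only one value per parameter and call, comparing several values means one call per value. A small sketch of that loop follows. The helper count_each is hypothetical, and the counting function is passed in so that the logic can be exercised without network access.

```python
def count_each(count, parameter, values):
    """Call count(parameter=value) once per value and collect the results."""
    return {value: count(**{parameter: value}) for value in values}


# Possible use (requires pygbif and network access):
# from pygbif import occurrences
# count_each(occurrences.count, "country", ["CA", "DE", "FR"])
```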

occurrences.count_basisofrecord()

Lists occurrence counts by basis of record.

Returns: dict

Usage:

from pygbif import occurrences
occurrences.count_basisofrecord()

occurrences.count_year(**kwargs)

Lists occurrence counts by year

Parameters: year – [str] Year range, e.g., 1990,2000. Does not support ranges like *,2010
Returns: dict

Usage:

from pygbif import occurrences
occurrences.count_year(year = '1990,2000')

occurrences.count_datasets(country=None, **kwargs)

Lists occurrence counts for datasets that cover a given taxon or country

Parameters:

taxonKey – [int] Taxon key
country – [str] A country, two letter code

Returns:

dict

Usage:

from pygbif import occurrences
occurrences.count_datasets(country = "DE")

occurrences.count_countries(**kwargs)

Lists occurrence counts for all countries covered by the data published by the given country

Parameters: publishingCountry – [str] A two letter country code
Returns: dict

Usage:

from pygbif import occurrences
occurrences.count_countries(publishingCountry = "DE")

occurrences.count_schema()

Lists the metrics supported by the service

Returns: dict

Usage:

from pygbif import occurrences
occurrences.count_schema()

occurrences.count_publishingcountries(**kwargs)

Lists occurrence counts for all countries that publish data about the given country

Parameters: country – [str] A country, two letter code
Returns: dict

Usage:

from pygbif import occurrences
occurrences.count_publishingcountries(country = "DE")

occurrences.download(format='SIMPLE_CSV', user=None, pwd=None, email=None, pred_type='and')

Spin up a download request for GBIF occurrence data.

Parameters:

queries (str, list or dictionary) – One or more query arguments to kick off a download job. See Details.
format – [str] One of the GBIF accepted download formats https://techdocs.gbif.org/en/openapi/v1/occurrence#/Occurrence%20download%20formats
pred_type – [str] One of equals (=), and (&), or (|), lessThan (<), lessThanOrEquals (<=), greaterThan (>), greaterThanOrEquals (>=), in, within, not (!), like
user – [str] User name within GBIF’s website. Required. Set in your env vars with the option GBIF_USER
pwd – [str] User password within GBIF’s website. Required. Set in your env vars with the option GBIF_PWD
email – [str] Email address to which the download-complete notice is sent. Required. Set in your env vars with the option GBIF_EMAIL

Arguments must be passed as strings (e.g., 'country = US'), with a space between the key (country), the operator (=), and the value (US). See the pred_type parameter for the possible operators. The string is parsed internally.
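To make the key, operator, value convention concrete, here is an illustrative sketch of how such a string could be turned into a predicate dictionary of the kind shown in the usage section. This is not pygbif's internal parser: the names OPERATORS, to_predicate_key, parse_query and combine are hypothetical, and only the comparison operators are covered.

```python
import re

# Operator symbols and the predicate type names listed under pred_type.
OPERATORS = {
    "=": "equals",
    ">": "greaterThan",
    ">=": "greaterThanOrEquals",
    "<": "lessThan",
    "<=": "lessThanOrEquals",
}


def to_predicate_key(name):
    """camelCase -> UPPER_SNAKE_CASE, e.g. decimalLatitude -> DECIMAL_LATITUDE.

    Irregular keys (mediatype -> MEDIA_TYPE) would need an explicit lookup.
    """
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).upper()


def parse_query(query):
    """Split 'key operator value' on the first two spaces and build a predicate."""
    key, operator, value = query.split(" ", 2)
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return {"type": OPERATORS[operator], "key": to_predicate_key(key), "value": value}


def combine(queries, pred_type="and"):
    """Wrap several parsed queries in a compound predicate."""
    return {"type": pred_type, "predicates": [parse_query(q) for q in queries]}


predicate = combine(["taxonKey = 7264332", "hasCoordinate = TRUE"])
```

Splitting on only the first two spaces keeps values that themselves contain spaces intact.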

Acceptable arguments to queries are:

taxonKey = TAXON_KEY

scientificName = SCIENTIFIC_NAME

country = COUNTRY

publishingCountry = PUBLISHING_COUNTRY

hasCoordinate = HAS_COORDINATE

hasGeospatialIssue = HAS_GEOSPATIAL_ISSUE

typeStatus = TYPE_STATUS

recordNumber = RECORD_NUMBER

lastInterpreted = LAST_INTERPRETED

continent = CONTINENT

geometry = GEOMETRY

basisOfRecord = BASIS_OF_RECORD

datasetKey = DATASET_KEY

eventDate = EVENT_DATE

catalogNumber = CATALOG_NUMBER

year = YEAR

month = MONTH

decimalLatitude = DECIMAL_LATITUDE

decimalLongitude = DECIMAL_LONGITUDE

elevation = ELEVATION

depth = DEPTH

institutionCode = INSTITUTION_CODE

collectionCode = COLLECTION_CODE

issue = ISSUE

mediatype = MEDIA_TYPE

recordedBy = RECORDED_BY

repatriated = REPATRIATED

classKey = CLASS_KEY

coordinateUncertaintyInMeters = COORDINATE_UNCERTAINTY_IN_METERS

crawlId = CRAWL_ID

datasetId = DATASET_ID

datasetName = DATASET_NAME

distanceFromCentroidInMeters = DISTANCE_FROM_CENTROID_IN_METERS

establishmentMeans = ESTABLISHMENT_MEANS

eventId = EVENT_ID

familyKey = FAMILY_KEY

format = FORMAT

fromDate = FROM_DATE

genusKey = GENUS_KEY

geoDistance = GEO_DISTANCE

identifiedBy = IDENTIFIED_BY

identifiedByID = IDENTIFIED_BY_ID

kingdomKey = KINGDOM_KEY

license = LICENSE

locality = LOCALITY

modified = MODIFIED

networkKey = NETWORK_KEY

occurrenceId = OCCURRENCE_ID

occurrenceStatus = OCCURRENCE_STATUS

orderKey = ORDER_KEY

organismId = ORGANISM_ID

organismQuantity = ORGANISM_QUANTITY

organismQuantityType = ORGANISM_QUANTITY_TYPE

otherCatalogNumbers = OTHER_CATALOG_NUMBERS

phylumKey = PHYLUM_KEY

preparations = PREPARATIONS

programme = PROGRAMME

projectId = PROJECT_ID

protocol = PROTOCOL

publishingOrg = PUBLISHING_ORG

publishingOrgKey = PUBLISHING_ORG_KEY

recordedByID = RECORDED_BY_ID

relativeOrganismQuantity = RELATIVE_ORGANISM_QUANTITY

sampleSizeUnit = SAMPLE_SIZE_UNIT

sampleSizeValue = SAMPLE_SIZE_VALUE

samplingProtocol = SAMPLING_PROTOCOL

speciesKey = SPECIES_KEY

stateProvince = STATE_PROVINCE

subgenusKey = SUBGENUS_KEY

taxonId = TAXON_ID

toDate = TO_DATE

userCountry = USER_COUNTRY

verbatimScientificName = VERBATIM_SCIENTIFIC_NAME

waterBody = WATER_BODY

See the API docs http://www.gbif.org/developer/occurrence#download and the predicates docs http://www.gbif.org/developer/occurrence#predicates for more info.

GBIF has a limit of 100,000 predicates and 10,000 points (in within predicates) for download queries – so if your download request is particularly complex, you may need to split it into multiple requests by one factor or another.
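One way to split a large request is to cut a long list of values into several smaller in predicates and submit one download per chunk. The sketch below assumes the in predicate layout (type, key, values) used in the nested query example of the usage section. The helper in_predicates is hypothetical, and the chunk size is for the caller to choose.

```python
def in_predicates(key, values, chunk_size):
    """Split values into 'in' predicates holding at most chunk_size values each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    values = list(values)
    return [
        {"type": "in", "key": key, "values": values[i:i + chunk_size]}
        for i in range(0, len(values), chunk_size)
    ]


chunks = in_predicates("TAXON_KEY", ["2387246", "2399391", "2364604"], 2)
# Each element of chunks can then be placed in the "predicates" list of its
# own query dictionary and submitted as a separate download request.
```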

Returns: A dictionary of results

Usage:

from pygbif import occurrences as occ

occ.download('basisOfRecord = PRESERVED_SPECIMEN')
occ.download('taxonKey = 3119195')
occ.download('decimalLatitude > 50')
occ.download('elevation >= 9000')
occ.download('decimalLatitude >= 65')
occ.download('country = US')
occ.download('institutionCode = TLMF')
occ.download('catalogNumber = Bird.27847588')

res = occ.download(['taxonKey = 7264332', 'hasCoordinate = TRUE'])

# pass output to download_meta for more information
occ.download_meta(occ.download('decimalLatitude > 75'))

# multiple queries
gg = occ.download(['decimalLatitude >= 65',
                  'decimalLatitude <= -65'], pred_type ='or')
gg = occ.download(['depth = 80', 'taxonKey = 2343454'],
                  pred_type ='or')

# repatriated data for Costa Rica
occ.download(['country = CR', 'repatriated = true'])

# turn off logging
import logging
logger = logging.getLogger()
logger.disabled = True
z = occ.download('elevation >= 95000')
logger.disabled = False
w = occ.download('elevation >= 10000')

# nested and complex queries with multiple predicates
## For more complex queries, it may be advantageous to format the query in JSON format. It must follow the predicate format described in the API documentation (https://www.gbif.org/developer/occurrence#download):
query = { "type": "and",
  "predicates": [
    {  "type": "in",
        "key": "TAXON_KEY",
        "values": ["2387246","2399391","2364604"]},
    {   "type": "isNotNull",
        "parameter": "YEAR"},
    {  "type": "not",
       "predicate": {  "type": "in",
                                "key": "ISSUE",
                                "values": ["RECORDED_DATE_INVALID",
                                                 "TAXON_MATCH_FUZZY",
                                                 "TAXON_MATCH_HIGHERRANK"] }} ]}
occ.download(query)

# The same query can also be applied in the occ.download function (including download format specified):
occ.download(['taxonKey in ["2387246", "2399391","2364604"]', 'year !Null', "issue !in ['RECORDED_DATE_INVALID', 'TAXON_MATCH_FUZZY', 'TAXON_MATCH_HIGHERRANK']"], "DWCA")

occurrences.download_meta(**kwargs)

Retrieves the occurrence download metadata by its unique key. Further named arguments passed on to requests.get can be included as additional arguments

Parameters: key – [str] A key generated from a request, like that from download

Usage:

from pygbif import occurrences as occ
occ.download_meta(key = "0003970-140910143529206")
occ.download_meta(key = "0000099-140929101555934")
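Download jobs take time to prepare, so the metadata is often polled until the job finishes. The following is a sketch under assumptions: it supposes the metadata dictionary has a status field with values such as SUCCEEDED or FAILED, which this document does not confirm. The function wait_for_download is hypothetical, and both the metadata function and the sleep function are injected so the loop can be tested offline.

```python
import time

# Assumed terminal states of a download job.
FINISHED = {"SUCCEEDED", "FAILED", "KILLED", "CANCELLED"}


def wait_for_download(get_meta, key, poll_seconds=60, max_polls=30, sleep=time.sleep):
    """Poll get_meta(key=...) until a terminal status is seen or polls run out."""
    for _ in range(max_polls):
        status = get_meta(key=key).get("status")
        if status in FINISHED:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"download {key} did not finish after {max_polls} polls")


# Possible use (requires pygbif and network access):
# from pygbif import occurrences as occ
# wait_for_download(occ.download_meta, "0000099-140929101555934")
```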

occurrences.download_list(pwd=None, limit=20, offset=0)

Lists the downloads created by a user.

Parameters:

user – [str] A user name, will look at env var GBIF_USER first.
pwd – [str] Your password, will look at env var GBIF_PWD first.
limit – [int] Number of records to return. Default: 20
offset – [int] Record number to start at. Default: 0

Usage:

from pygbif import occurrences as occ
occ.download_list(user = "sckott")
occ.download_list(user = "sckott", limit = 5)
occ.download_list(user = "sckott", offset = 21)

occurrences.download_get(path='.', **kwargs)

Get a download from GBIF.

Parameters:

key – [str] A key generated from a request, like that from download
path – [str] Path to write zip file to. Default: ".", with a .zip appended to the end.
kwargs – Further named arguments passed on to requests.get

Downloads the zip file to a directory you specify on your machine. The speed of this function is of course proportional to the size of the file to download, and affected by your internet connection speed.

This function only downloads the file. To open and read it, see https://github.com/BelgianBiodiversityPlatform/python-dwca-reader
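For a quick look at what a downloaded archive contains, the standard library is enough. This sketch lists the member files and their sizes. The helper archive_members is hypothetical, and the path in the usage comment is only an assumed example of where download_get might have written the file.

```python
import zipfile


def archive_members(zip_path):
    """Return (file name, uncompressed size in bytes) for each archive member."""
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Possible use, assuming the zip file was written to the current directory:
# for name, size in archive_members("0000066-140928181241064.zip"):
#     print(name, size)
```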

Usage:

from pygbif import occurrences as occ
x=occ.download_get("0000066-140928181241064")
occ.download_get("0003983-140910143529206")

# turn off logging
import logging
logger = logging.getLogger()
logger.disabled = True
x = occ.download_get("0000066-140928181241064")

# turn back on
logger.disabled = False
x = occ.download_get("0000066-140928181241064")

occurrences.download_sql(format='SQL_TSV_ZIP', user=None, pwd=None, email=None)

Download data using a SQL query.

This is an experimental feature, and the implementation may change throughout 2024. The feature is currently only available for preview by invited users. Contact helpdesk@gbif.org to request access.

Parameters:

sql – [str] A SQL query
format – [str] The format to download the data in. Only SQL_TSV_ZIP is currently supported.
user – [str] A user name, will look at env var GBIF_USER first.
pwd – [str] Your password, will look at env var GBIF_PWD first.
email – [str] Your email, will look at env var GBIF_EMAIL first.

Returns:

A string, the request id

Usage:

from pygbif import occurrences as occ

occ.download_sql("SELECT gbifid,publishingCountry FROM occurrence WHERE publishingCountry='GB'")

occurrences.download_describe(**kwargs)

Get a description of the download format. This is useful for understanding what fields are available in a given download format without having to run a download.

Parameters:

format – [str] A format to describe. One of “simpleCsv”, “simpleParquet”, “dwca”, “speciesList”, “simpleAvro”, “sql”
kwargs – Further named arguments passed on to requests.get

Returns:

A dictionary, of results

Usage:

from pygbif import occurrences as occ

occ.download_describe("dwca")
occ.download_describe("simpleCsv")
occ.download_describe("simpleParquet")
occ.download_describe("speciesList")
occ.download_describe("simpleAvro")
occ.download_describe("sql")

occurrences.download_citation()

Get citation from a download key

Parameters: key – [str] A GBIF download key
Returns: A dictionary of results

Usage:

from pygbif import occurrences
occurrences.download_citation("0235283-220831081235567")