species module

species module API:

  • name_backbone
  • name_suggest
  • name_usage
  • name_lookup
  • name_parser

Example usage:

from pygbif import species
species.name_suggest(q='Puma concolor')

species API

species.name_backbone(name, rank=None, kingdom=None, phylum=None, clazz=None, order=None, family=None, genus=None, strict=False, verbose=False, offset=None, limit=100, **kwargs)

Lookup names in the GBIF backbone taxonomy.

Parameters:
  • name – [str] Full scientific name potentially with authorship (required)
  • rank – [str] The taxonomic rank, given as a value of the GBIF rank enum. (optional)
  • kingdom – [str] If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)
  • phylum – [str] If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)
  • clazz – [str] If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)
  • order – [str] If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)
  • family – [str] If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)
  • genus – [str] If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)
  • strict – [bool] If True it (fuzzy) matches only the given name, but never a taxon in the upper classification (optional)
  • verbose – [bool] If True, also show the alternative matches that were considered and rejected. (optional)
  • offset – [int] Record to start at. Default: 0
  • limit – [int] Number of results to return. Default: 100

Returns:

A dictionary describing a single taxon (with verbose=False, the default). With verbose=True, the result also includes the alternative name suggestions from fuzzy matching that were considered.

If there is no match, GBIF returns a result with three fields: synonym, confidence, and matchType='NONE'.
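The no-match case described above can be detected by inspecting matchType. The following is a minimal sketch, assuming the result is a plain dictionary and that a successful match carries usageKey and confidence fields (these field names are assumptions based on the GBIF match response, not stated on this page). summarize_match is a hypothetical helper, not part of pygbif.

```python
def summarize_match(result):
    """Return (usageKey, confidence) for a match, or None when GBIF found no match."""
    # A missing matchType is treated like 'NONE' (an assumption of this sketch).
    if result.get("matchType", "NONE") == "NONE":
        return None
    return result.get("usageKey"), result.get("confidence")


# Hand-written sample dictionaries (illustrative values, not real GBIF output).
no_match = {"synonym": False, "confidence": 100, "matchType": "NONE"}
exact = {"usageKey": 12345, "confidence": 98, "matchType": "EXACT"}

print(summarize_match(no_match))
print(summarize_match(exact))
```

With pygbif installed, the same helper could be applied to the dictionary returned by species.name_backbone(name='Aso').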

reference: http://www.gbif.org/developer/species#searching

Usage:

from pygbif import species
species.name_backbone(name='Helianthus annuus', kingdom='plants')
species.name_backbone(name='Helianthus', rank='genus', kingdom='plants')
species.name_backbone(name='Poa', rank='genus', family='Poaceae')

# Verbose - gives back alternatives
species.name_backbone(name='Helianthus annuus', kingdom='plants', verbose=True)

# Strictness
species.name_backbone(name='Poa', kingdom='plants', verbose=True, strict=False)
species.name_backbone(name='Helianthus annuus', kingdom='plants', verbose=True, strict=True)

# Non-existent name
species.name_backbone(name='Aso')

# Multiple equal matches
species.name_backbone(name='Oenante')

species.name_suggest(q=None, datasetKey=None, rank=None, limit=100, offset=None, **kwargs)

A quick and simple autocomplete service that returns up to 20 name usages by doing prefix matching against the scientific name. Results are ordered by relevance.

Parameters:
  • q – [str] Simple search parameter. The value for this parameter can be a simple word or a phrase. Wildcards can be added to the simple word parameters only, e.g. q=*puma* (Required)
  • datasetKey – [str] Filters by the checklist dataset key (a uuid, see examples)
  • rank – [str] A taxonomic rank. One of class, cultivar, cultivar_group, domain, family, form, genus, informal, infrageneric_name, infraorder, infraspecific_name, infrasubspecific_name, kingdom, order, phylum, section, series, species, strain, subclass, subfamily, subform, subgenus, subkingdom, suborder, subphylum, subsection, subseries, subspecies, subtribe, subvariety, superclass, superfamily, superorder, superphylum, suprageneric_name, tribe, unranked, or variety.
  • limit – [int] Number of records to return. Maximum: 1000. (optional)
  • offset – [int] Record number to start at. (optional)

Returns:

A dictionary

References: http://www.gbif.org/developer/species#searching

Usage:

from pygbif import species

species.name_suggest(q='Puma concolor')
x = species.name_suggest(q='Puma')
species.name_suggest(q='Puma', rank="genus")
species.name_suggest(q='Puma', rank="subspecies")
species.name_suggest(q='Puma', rank="species")
species.name_suggest(q='Puma', rank="infraspecific_name")
species.name_suggest(q='Puma', limit=2)

species.name_lookup(q=None, rank=None, higherTaxonKey=None, status=None, isExtinct=None, habitat=None, nameType=None, datasetKey=None, nomenclaturalStatus=None, limit=100, offset=None, facet=False, facetMincount=None, facetMultiselect=None, type=None, hl=False, verbose=False, **kwargs)

Lookup names in all taxonomies in GBIF.

This service uses fuzzy lookup so that you can put in partial names and you should get back those things that match. See examples below.

Parameters:
  • q – [str] Query term(s) for full text search (optional)
  • rank – [str] CLASS, CULTIVAR, CULTIVAR_GROUP, DOMAIN, FAMILY, FORM, GENUS, INFORMAL, INFRAGENERIC_NAME, INFRAORDER, INFRASPECIFIC_NAME, INFRASUBSPECIFIC_NAME, KINGDOM, ORDER, PHYLUM, SECTION, SERIES, SPECIES, STRAIN, SUBCLASS, SUBFAMILY, SUBFORM, SUBGENUS, SUBKINGDOM, SUBORDER, SUBPHYLUM, SUBSECTION, SUBSERIES, SUBSPECIES, SUBTRIBE, SUBVARIETY, SUPERCLASS, SUPERFAMILY, SUPERORDER, SUPERPHYLUM, SUPRAGENERIC_NAME, TRIBE, UNRANKED, VARIETY (optional)
  • verbose – [bool] If True show alternative matches considered which had been rejected.
  • higherTaxonKey – [str] Filters by any of the higher Linnean rank keys. Note this is within the respective checklist and not searching nub keys across all checklists (optional)
  • status

    [str] (optional) Filters by the taxonomic status as one of:

    • ACCEPTED
    • DETERMINATION_SYNONYM Used for unknown child taxa referred to via spec, ssp, ...
    • DOUBTFUL Treated as accepted, but doubtful whether this is correct.
    • HETEROTYPIC_SYNONYM More specific subclass of SYNONYM.
    • HOMOTYPIC_SYNONYM More specific subclass of SYNONYM.
    • INTERMEDIATE_RANK_SYNONYM Used in nub only.
    • MISAPPLIED More specific subclass of SYNONYM.
    • PROPARTE_SYNONYM More specific subclass of SYNONYM.
    • SYNONYM A general synonym, the exact type is unknown.
  • isExtinct – [bool] Filters by extinction status (e.g. isExtinct=True)
  • habitat – [str] Filters by habitat. One of: marine, freshwater, or terrestrial (optional)
  • nameType

    [str] (optional) Filters by the name type as one of:

    • BLACKLISTED surely not a scientific name.
    • CANDIDATUS Candidatus is a component of the taxonomic name for a bacterium that cannot be maintained in a Bacteriology Culture Collection.
    • CULTIVAR a cultivated plant name.
    • DOUBTFUL doubtful whether this is a scientific name at all.
    • HYBRID a hybrid formula (not a hybrid name).
    • INFORMAL a scientific name with some informal addition like “cf.” or indetermined like Abies spec.
    • SCINAME a scientific name which is not well formed.
    • VIRUS a virus name.
    • WELLFORMED a well formed scientific name according to present nomenclatural rules.
  • datasetKey – [str] Filters by the dataset’s key (a uuid) (optional)
  • nomenclaturalStatus – [str] Not yet implemented, but will eventually allow for filtering by a nomenclatural status enum
  • limit – [int] Number of records to return. Maximum: 1000. (optional)
  • offset – [int] Record number to start at. (optional)
  • facet – [str] A list of facet names used to retrieve the 100 most frequent values for a field. Allowed facets are: datasetKey, higherTaxonKey, rank, status, isExtinct, habitat, and nameType. Additionally threat and nomenclaturalStatus are legal values but not yet implemented, so data will not yet be returned for them. (optional)
  • facetMincount – [str] Used in combination with the facet parameter. Set facetMincount={#} to exclude facets with a count less than {#}, e.g. http://bit.ly/1bMdByP only shows the type value ACCEPTED because the other statuses have counts less than 7,000,000 (optional)
  • facetMultiselect – [bool] Used in combination with the facet parameter. Set facetMultiselect=True to still return counts for values that are not currently filtered, e.g. http://bit.ly/19YLXPO still shows all status values even though status is being filtered by status=ACCEPTED (optional)
  • type – [str] Type of name. One of occurrence, checklist, or metadata. (optional)
  • hl – [bool] Set hl=True to highlight terms matching the query when in fulltext search fields. The highlight will be an emphasis tag of class gbifH1 e.g. q='plant', hl=True. Fulltext search fields include: title, keyword, country, publishing country, publishing organization title, hosting organization title, and description. One additional full text field is searched which includes information from metadata documents, but the text of this field is not returned in the response. (optional)
Returns:

A dictionary
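Because limit is capped at 1000, larger result sets have to be collected by advancing offset. The following is a minimal sketch of that loop, assuming the returned dictionary has a results list and an endOfRecords flag (field names assumed from the GBIF paging response, not stated on this page). fetch_all is a hypothetical helper, not part of pygbif; it takes any callable that accepts limit and offset, so it can be tried without network access.

```python
def fetch_all(fetch, page_size=100, max_records=1000):
    """Collect up to max_records records by calling fetch(limit=..., offset=...) repeatedly."""
    records = []
    offset = 0
    while len(records) < max_records:
        page = fetch(limit=page_size, offset=offset)
        results = page.get("results", [])
        records.extend(results)
        # Stop when the service reports the last page or returns nothing.
        if page.get("endOfRecords", True) or not results:
            break
        offset += page_size
    return records[:max_records]


# With pygbif installed, this could be used as (not executed here):
#   from functools import partial
#   from pygbif import species
#   fetch_all(partial(species.name_lookup, q='mammalia'), page_size=1000, max_records=2000)
```

Passing the lookup as a callable keeps the paging logic independent of which query parameters are used.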

References:

http://www.gbif.org/developer/species#searching
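When the facet parameter is used, the counts are usually easier to work with as a nested dictionary. The following is a minimal sketch, assuming the response carries a facets list whose entries have a field name and a counts list of name/count pairs (this layout is an assumption about the GBIF response, not stated on this page). facet_counts is a hypothetical helper, not part of pygbif.

```python
def facet_counts(response):
    """Map each facet field to a {value: count} dictionary."""
    return {
        facet["field"]: {c["name"]: c["count"] for c in facet.get("counts", [])}
        for facet in response.get("facets", [])
    }


# Hand-written sample response (illustrative values, not real GBIF output).
sample = {
    "facets": [
        {"field": "STATUS", "counts": [{"name": "ACCEPTED", "count": 3},
                                       {"name": "SYNONYM", "count": 1}]}
    ]
}

print(facet_counts(sample))
```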

Usage:

from pygbif import species

# Look up names like mammalia
species.name_lookup(q='mammalia')

# Paging
species.name_lookup(q='mammalia', limit=1)
species.name_lookup(q='mammalia', limit=1, offset=2)

# large requests, use offset parameter
first = species.name_lookup(q='mammalia', limit=1000)
second = species.name_lookup(q='mammalia', limit=1000, offset=1000)

# Verbose output: also show alternative matches that were considered and rejected
species.name_lookup('Helianthus annuus', rank="species", verbose=True)

# Get all data and parse it, removing descriptions field which can be quite long
out = species.name_lookup('Helianthus annuus', rank="species")
res = out['results']
for z in res:
    z.pop('descriptions', None)
res

# Fuzzy searching
species.name_lookup(q='Heli', rank="genus")

# Limit records to certain number
species.name_lookup('Helianthus annuus', rank="species", limit=2)

# Query by habitat
species.name_lookup(habitat = "terrestrial", limit=2)
species.name_lookup(habitat = "marine", limit=2)
species.name_lookup(habitat = "freshwater", limit=2)

# Using faceting
species.name_lookup(facet='status', limit=0, facetMincount='70000')
species.name_lookup(facet=['status', 'higherTaxonKey'], limit=0, facetMincount='700000')

species.name_lookup(facet='nameType', limit=0)
species.name_lookup(facet='habitat', limit=0)
species.name_lookup(facet='datasetKey', limit=0)
species.name_lookup(facet='rank', limit=0)
species.name_lookup(facet='isExtinct', limit=0)

# text highlighting
species.name_lookup(q='plant', hl=True, limit=30)

# Lookup by datasetKey
species.name_lookup(datasetKey='3f8a1297-3259-4700-91fc-acc4170b27ce')

species.name_usage(key=None, name=None, data='all', language=None, datasetKey=None, uuid=None, sourceId=None, rank=None, shortname=None, limit=100, offset=None, **kwargs)

Lookup details for specific names in all taxonomies in GBIF.

Parameters:
  • key – [int] A GBIF key for a taxon
  • name – [str] Filters by a case-insensitive, canonical name string, e.g. ‘Puma concolor’
  • data – [str] The type of data to get. Default: all. Options: all, verbatim, name, parents, children, related, synonyms, descriptions, distributions, media, references, speciesProfiles, vernacularNames, typeSpecimens, root
  • language – [str] Language. Default: English
  • datasetKey – [str] Filters by the dataset’s key (a uuid)
  • uuid – [str] A uuid for a dataset. Should give exact same results as datasetKey.
  • sourceId – [int] Filters by the source identifier.
  • rank – [str] Taxonomic rank. Filters by taxonomic rank as one of: CLASS, CULTIVAR, CULTIVAR_GROUP, DOMAIN, FAMILY, FORM, GENUS, INFORMAL, INFRAGENERIC_NAME, INFRAORDER, INFRASPECIFIC_NAME, INFRASUBSPECIFIC_NAME, KINGDOM, ORDER, PHYLUM, SECTION, SERIES, SPECIES, STRAIN, SUBCLASS, SUBFAMILY, SUBFORM, SUBGENUS, SUBKINGDOM, SUBORDER, SUBPHYLUM, SUBSECTION, SUBSERIES, SUBSPECIES, SUBTRIBE, SUBVARIETY, SUPERCLASS, SUPERFAMILY, SUPERORDER, SUPERPHYLUM, SUPRAGENERIC_NAME, TRIBE, UNRANKED, VARIETY
  • shortname – [str] A short name (exact semantics undocumented).
  • limit – [int] Number of records to return. Default: 100. Maximum: 1000. (optional)
  • offset – [int] Record number to start at. (optional)

References: http://www.gbif.org/developer/species#nameUsages
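Results requested with data='vernacularNames' can be grouped by language after the call. The following is a minimal sketch, assuming each record in the results list has vernacularName and language keys (key names are assumptions about the GBIF response, not stated on this page). names_by_language is a hypothetical helper, not part of pygbif.

```python
from collections import defaultdict


def names_by_language(results):
    """Group vernacular names by their language code, skipping records without a name."""
    grouped = defaultdict(list)
    for record in results:
        name = record.get("vernacularName")
        if name:
            grouped[record.get("language", "")].append(name)
    return dict(grouped)


# Hand-written sample records (illustrative values, not real GBIF output).
sample = [
    {"vernacularName": "sunflower", "language": "eng"},
    {"vernacularName": "tournesol", "language": "fra"},
    {"vernacularName": "common sunflower", "language": "eng"},
    {"language": "deu"},
]

print(names_by_language(sample))
```

With pygbif installed, the input would typically be the results entry of the dictionary returned by species.name_usage(key=3119195, data='vernacularNames').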

Usage:

from pygbif import species

species.name_usage(key=1)

# Name usage for a taxonomic name
species.name_usage(name='Puma', rank="GENUS")

# All name usages
species.name_usage()

# References for a name usage
species.name_usage(key=2435099, data='references')

# Species profiles, descriptions
species.name_usage(key=3119195, data='speciesProfiles')
species.name_usage(key=3119195, data='descriptions')
species.name_usage(key=2435099, data='children')

# Vernacular names for a name usage
species.name_usage(key=3119195, data='vernacularNames')

# Limit number of results returned
species.name_usage(key=3119195, data='vernacularNames', limit=3)

# Search for names by dataset with datasetKey parameter
species.name_usage(datasetKey="d7dddbf4-2cf0-4f39-9b2a-bb099caae36c")

# Search for a particular language
species.name_usage(key=3119195, language="FRENCH", data='vernacularNames')

species.name_parser(name, **kwargs)

Parse taxon names using the GBIF name parser

Parameters:
  • name – [str or list] A scientific name, or a list of scientific names. (required)

reference: http://www.gbif.org/developer/species#parser

Usage:

from pygbif import species
species.name_parser('x Agropogon littoralis')
species.name_parser(['Arrhenatherum elatius var. elatius',
  'Secale cereale subsp. cereale', 'Secale cereale ssp. cereale',
  'Vanessa atalanta (Linnaeus, 1758)'])