Usage

After installing the package (for example, with pip install afscgap), the short tutorial-style examples below show how to use afscgap.

Full API documentation is also available. It is generated by pdoc as part of CI / CD.



Visual analytics

Visualization tools are available to help both programmers and non-programmers start their investigation through a UI built on top of the other functionality provided by this project. The tool is available online at https://app.pyafscgap.org. It can generate both CSV (spreadsheet) exports and Python query code to move investigations to their next steps. To self-host or run this tool locally, see the visualization readme.


Basic queries

The afscgap.Query object is the main entry point for using the library from Python. Calls can be written manually or generated in the visual analytics tool. For example, this requests all records of Pasiphaea pacifica in 2021 from the Gulf of Alaska and reports the median bottom temperature at which they were observed:

import statistics

import afscgap

# Build query
query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')
results = query.execute()

# Get temperatures in Celsius
temperatures = [record.get_bottom_temperature(units='c') for record in results]

# Take the median
print(statistics.median(temperatures))

Note that afscgap.Query.execute returns a Cursor. Iterating over this Cursor yields Record objects. This can be done with list comprehensions, maps, and so on, or with a plain for loop as in this example, which builds a histogram of haul temperatures:

# Mapping from temperature in Celsius to count
count_by_temperature_c = {}

# Build query
query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')
results = query.execute()

# Iterate through results and count
for record in results:
    temp = record.get_bottom_temperature(units='c')
    temp_rounded = round(temp)
    count = count_by_temperature_c.get(temp_rounded, 0) + 1
    count_by_temperature_c[temp_rounded] = count

# Print the result
print(count_by_temperature_c)

See the data structure section for more on these objects. When using an iterator, the library negotiates pagination behind the scenes, so this operation may issue multiple HTTP requests while the iterator runs.


Enable absence data

One of the major limitations of the official API is that it only provides presence data. However, this library can optionally infer absence or "zero catch" records using a separate static file produced by NOAA AFSC GAP. The algorithm and details of absence inference are discussed further below.

Inference of absence data / "zero catch" records can be turned on by passing False to set_presence_only on Query. To demonstrate, this example finds total area swept and total weight for Gadus macrocephalus from the Gulf of Alaska in 2021:

import afscgap

query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Gadus macrocephalus')
query.set_presence_only(False)
results = query.execute()

total_area = 0
total_weight = 0

for record in results:
    total_area += record.get_area_swept(units='ha')
    total_weight += record.get_weight(units='kg')

template = '%.2f kg / hectare swept (%.1f kg, %.1f hectares)'
weight_per_area = total_weight / total_area
message = template % (weight_per_area, total_weight, total_area)

print(message)

For more details on the zero catch record feature, please see below.
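As a rough mental model only, and not the library's actual algorithm, zero catch inference can be pictured as follows: given every haul in a survey and the presence records reported for it, any haul and species pair without a presence record is treated as a catch of zero. The function name `infer_zero_catch`, its parameters, and the row layout below are assumptions made for illustration.

```python
def infer_zero_catch(haul_ids, species, presence):
    """Fill in zero-weight rows for hauls where a species was not recorded.

    haul_ids: iterable of all haul identifiers in the survey.
    species: iterable of species names of interest.
    presence: dict mapping (haul_id, species_name) to weight in kg.
    """
    rows = []
    for haul_id in haul_ids:
        for name in species:
            # Missing pairs are interpreted as "hauled but not caught".
            weight = presence.get((haul_id, name), 0.0)
            rows.append({'haul': haul_id, 'species': name, 'weight_kg': weight})
    return rows


# Hypothetical data: three hauls, with the species caught in only one of them.
hauls = [1, 2, 3]
presence = {(2, 'Gadus macrocephalus'): 12.5}
rows = infer_zero_catch(hauls, ['Gadus macrocephalus'], presence)
zero_rows = [row for row in rows if row['weight_kg'] == 0.0]
print(len(rows), len(zero_rows))
```

Including the zero rows matters for calculations like the weight per area example above, since hauls that caught nothing still contribute swept area.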


Chaining

It is possible to use the Query object for method chaining.

import statistics

import afscgap

# Build query
results = afscgap.Query() \
    .filter_year(eq=2021) \
    .filter_srvy(eq='GOA') \
    .filter_scientific_name(eq='Pasiphaea pacifica') \
    .execute()

# Get temperatures in Celsius
temperatures = [record.get_bottom_temperature(units='c') for record in results]

# Take the median
print(statistics.median(temperatures))

Each filter and set method on Query returns the same query object.
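The general pattern behind this kind of chaining is that each method returns self. The sketch below uses a hypothetical `TinyQuery` class, which is a stand-in for illustration and not afscgap.Query itself.

```python
class TinyQuery:
    """Hypothetical stand-in for a chainable query builder."""

    def __init__(self):
        self._filters = {}

    def filter_year(self, eq):
        self._filters['year'] = eq
        return self  # Returning self is what enables chaining

    def filter_srvy(self, eq):
        self._filters['srvy'] = eq
        return self

    def describe(self):
        # Return a copy so callers cannot mutate the internal state.
        return dict(self._filters)


query = TinyQuery()
same_query = query.filter_year(eq=2021).filter_srvy(eq='GOA')
print(same_query is query)
print(query.describe())
```

Because the chained calls all return the same object, the chained and statement-by-statement styles are interchangeable.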


Builder operations

Note that Query is a builder, so it may be used to execute a search and then execute another search with slightly modified parameters:

import statistics

import afscgap

# Build query
query = afscgap.Query()
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')

# Get temperatures in Celsius for 2021
query.filter_year(eq=2021)
results = query.execute()
temperatures = [record.get_bottom_temperature(units='c') for record in results]
print(statistics.median(temperatures))

# Get temperatures in Celsius for 2019
query.filter_year(eq=2019)
results = query.execute()
temperatures = [record.get_bottom_temperature(units='c') for record in results]
print(statistics.median(temperatures))

When a filter method is called, all prior filters on the query object for that field are overwritten.
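The reuse pattern above can be wrapped in a small helper. In this sketch, `median_temperature_by_year` is a hypothetical function that relies only on the calls shown in this section (filter_year, execute, and get_bottom_temperature). `StubQuery` and `StubRecord` are made-up stand-ins so that the example runs without network access; in practice an afscgap.Query would presumably be passed instead.

```python
import statistics


def median_temperature_by_year(query, years):
    """Re-run one query per year, overwriting only the year filter each time."""
    medians = {}
    for year in years:
        query.filter_year(eq=year)  # Replaces the previous year filter
        results = query.execute()
        temperatures = [r.get_bottom_temperature(units='c') for r in results]
        medians[year] = statistics.median(temperatures)
    return medians


class StubRecord:
    """Hypothetical stand-in for a record; returns a fixed temperature."""

    def __init__(self, temperature_c):
        self._temperature_c = temperature_c

    def get_bottom_temperature(self, units='c'):
        return self._temperature_c


class StubQuery:
    """Hypothetical stand-in for a query so the sketch runs offline."""

    def __init__(self, temperatures_by_year):
        self._temperatures_by_year = temperatures_by_year
        self._year = None

    def filter_year(self, eq):
        self._year = eq
        return self

    def execute(self):
        return [StubRecord(t) for t in self._temperatures_by_year[self._year]]


stub = StubQuery({2019: [4.0, 5.0, 6.0], 2021: [3.0, 5.0]})
medians = median_temperature_by_year(stub, [2019, 2021])
print(medians)
```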


Serialization

Users may request a dictionary representation:

import afscgap

# Create a query
query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')
results = query.execute()

# Get dictionary from individual record
for record in results:
    dict_representation = record.to_dict()
    print(dict_representation['bottom_temperature_c'])

# Execute again
results = query.execute()

# Get dictionary for all records
results_dicts = results.to_dicts()

for record in results_dicts:
    print(record['bottom_temperature_c'])

Note that to_dicts returns an iterator by default, but it can be realized as a full list by passing it to list().
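The practical difference between an iterator and a list is that a Python iterator can generally be consumed only once. The sketch below illustrates this with a hypothetical generator, `to_dicts_sketch`, standing in for an iterator of record dictionaries. It does not call the library, and the assumption is simply that the returned iterator follows normal Python iterator semantics.

```python
def to_dicts_sketch(records):
    """Hypothetical generator standing in for an iterator of record dicts."""
    for record in records:
        yield dict(record)


raw = [{'bottom_temperature_c': 4.2}, {'bottom_temperature_c': 5.1}]

lazy = to_dicts_sketch(raw)
first_pass = [row['bottom_temperature_c'] for row in lazy]
second_pass = [row['bottom_temperature_c'] for row in lazy]  # Already exhausted

realized = list(to_dicts_sketch(raw))  # A list can be traversed repeatedly
print(first_pass, second_pass, len(realized))
```

Realizing the list is convenient for repeated passes over the data, at the cost of holding every record in memory at once.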


Pandas

The dictionary form of the data can be used to create a Pandas dataframe:

import pandas

import afscgap

query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')
results = query.execute()

pandas.DataFrame(results.to_dicts())

Note that Pandas is not required to use this library.


Advanced filtering

Range queries are also supported and translate to either ORDS filters or Python-emulated filters. For example, the following requests data up to and including 2021:

import afscgap

# Build query
query = afscgap.Query()
query.filter_year(max_val=2021)  # Note max_val
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')
results = query.execute()

# Sum weight
weights = map(lambda x: x.get_weight(units='kg'), results)
total_weight = sum(weights)
print(total_weight)

The following requests data from 2021 onward (including 2021):

import afscgap

# Build query
query = afscgap.Query()
query.filter_year(min_val=2021)  # Note min_val
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')
results = query.execute()

# Sum weight
weights = map(lambda x: x.get_weight(units='kg'), results)
total_weight = sum(weights)
print(total_weight)

Finally, the following requests data between 2015 and 2019 (includes 2015 and 2019):

import afscgap

# Build query
query = afscgap.Query()
query.filter_year(min_val=2015, max_val=2019)   # Note min/max_val
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Pasiphaea pacifica')
results = query.execute()

# Sum weight
weights = map(lambda x: x.get_weight(units='kg'), results)
total_weight = sum(weights)
print(total_weight)

For more advanced filters, please see manual filtering below.
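To make the idea of translating range arguments more concrete, here is a minimal sketch under stated assumptions. `to_ords_filter` and `matches` are hypothetical helpers, not functions of this library. The first maps eq / min_val / max_val onto ORDS-style operators ($between, $gte, $lte), and the second performs the equivalent inclusive check in plain Python.

```python
def to_ords_filter(eq=None, min_val=None, max_val=None):
    """Hypothetical translation of range arguments into an ORDS-style filter."""
    if eq is not None:
        return eq
    if min_val is not None and max_val is not None:
        return {'$between': [min_val, max_val]}
    if min_val is not None:
        return {'$gte': min_val}
    if max_val is not None:
        return {'$lte': max_val}
    return None  # No constraint on this field


def matches(value, eq=None, min_val=None, max_val=None):
    """Python-side emulation of the same inclusive range check."""
    if eq is not None:
        return value == eq
    if min_val is not None and value < min_val:
        return False
    if max_val is not None and value > max_val:
        return False
    return True


print(to_ords_filter(min_val=2015, max_val=2019))
print(matches(2019, min_val=2015, max_val=2019))
print(matches(2020, max_val=2019))
```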


Manual filtering

Users may provide advanced queries using Oracle's REST API (ORDS) query parameters. For example, this queries for 2021 records with hauls in a specific geographic area, expressed as latitude and longitude ranges:

import afscgap

# Query with ORDS syntax
query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_latitude({'$between': [56, 57]})
query.filter_longitude({'$between': [-161, -160]})
results = query.execute()

# Summarize
count_by_common_name = {}

for record in results:
    common_name = record.get_common_name()
    new_count = record.get_count()
    count = count_by_common_name.get(common_name, 0) + new_count
    count_by_common_name[common_name] = count

# Print
print(count_by_common_name['walleye pollock'])

For more info about the options available, consider the Oracle docs or a helpful unaffiliated getting started tutorial from Jeff Smith.
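For context, ORDS filters are typically sent as a JSON document in the q query parameter of the request URL. The sketch below shows how such a parameter could be encoded using only the standard library. The endpoint URL and the field names (`year`, `latitude`, `longitude`) are placeholders assumed for illustration and may not match the real service or what this library sends.

```python
import json
import urllib.parse

# Hypothetical endpoint used only for illustration; not the real service URL.
BASE_URL = 'https://example.com/ords/survey/haul'

# Hypothetical field names; the ORDS operators themselves ($between) are real.
ords_query = {
    'year': 2021,
    'latitude': {'$between': [56, 57]},
    'longitude': {'$between': [-161, -160]},
}

# Serialize the filter to JSON and URL-encode it as the q parameter.
params = urllib.parse.urlencode({'q': json.dumps(ords_query)})
url = BASE_URL + '?' + params
print(url)

# Round trip to confirm the filter survives URL encoding.
query_string = urllib.parse.urlsplit(url).query
decoded = json.loads(urllib.parse.parse_qs(query_string)['q'][0])
print(decoded == ords_query)
```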


Manual pagination

By default, the library will iterate through all results and handle pagination behind the scenes. However, one can also request an individual page:

import afscgap

query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Gadus macrocephalus')
results = query.execute()

results_for_page = results.get_page(offset=20, limit=53)
print(len(results_for_page))

Client code can also change the pagination behavior used when iterating:

import afscgap

query = afscgap.Query()
query.filter_year(eq=2021)
query.filter_srvy(eq='GOA')
query.filter_scientific_name(eq='Gadus macrocephalus')
query.set_start_offset(10)
query.set_limit(200)
query.set_filter_incomplete(True)
results = query.execute()

for record in results:
    print(record.get_bottom_temperature(units='c'))

Note that each page of records is requested only once during iteration, and only after the prior page has been fully returned through the iterator ("lazy" loading).
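The lazy loading behavior described above can be sketched with a generator. This is an illustration of the general technique rather than the library's implementation: `fetch_page` is a hypothetical stand-in for a single HTTP request, and `iterate_lazily` asks for the next page only once the current one has been fully yielded.

```python
def fetch_page(data, offset, limit, log):
    """Hypothetical stand-in for one HTTP request returning a page of rows."""
    log.append((offset, limit))
    return data[offset:offset + limit]


def iterate_lazily(data, start_offset=0, limit=3, log=None):
    """Yield rows one at a time, requesting the next page only when needed."""
    log = [] if log is None else log
    offset = start_offset
    while True:
        page = fetch_page(data, offset, limit, log)
        if not page:
            return  # An empty page signals the end of the results
        yield from page
        offset += limit


requests_made = []
rows = iterate_lazily(list(range(10)), start_offset=2, limit=3, log=requests_made)

first = next(rows)
requests_after_first = len(requests_made)  # Only one page requested so far

remaining = list(rows)
print(first, requests_after_first, remaining, requests_made)
```

In this sketch the final request returns an empty page, which is how the generator learns that no results remain. A real service may signal the end of results differently.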