Foursight-core API Documentation

abstract_connection

class foursight_core.abstract_connection.AbstractConnection

AbstractConnection is an ‘abstract’ representation of the methods a connection subclass should implement. There will be others that are specific to the type of connection but this collection should be consistent across all connection types. For all intents and purposes this is an interface.

delete_keys(key_list): Deletes the given keys in key_list from this connection

get_all_objects(): Returns an array of the data values stored on this connection.

get_object(key): Generic get operation. Key is the filename, returns the data object that is stored on this connection.

get_size(): Returns the number of items stored on this connection

get_size_bytes(): Returns number of bytes stored on this connection

list_all_keys(): Lists all the keys stored on this connection.

list_all_keys_w_prefix(prefix): Given a prefix, return all keys that have that prefix.

put_object(key, value): Generic put operation. Key is typically the filename, value is the actual data to be stored.

test_connection(): Tests that this connection is reachable

app_utils

buckets

class foursight_core.buckets.Buckets: create and configure buckets for foursight

check_schema

check_utils

class foursight_core.check_utils.CheckHandler(foursight_prefix, check_package_name='foursight_core', check_setup_dir='/home/docs/checkouts/readthedocs.org/user_builds/foursight-core/checkouts/latest/foursight_core')

Class CheckHandler is a collection of utils related to checks

classmethod check_method_deco(method, decorator): See if the given method has the given decorator. Returns True if so, False if not.

get_action_strings(specific_action=None): Basically the same thing as get_check_strings, but for actions…

get_check_results(connection, checks=None, use_latest=False): Initialize check results for each desired check and get results stored in s3, sorted by status and then alphabetically by title. May provide a list of string check names as checks; otherwise get all checks by default. By default, gets the ‘primary’ results. If use_latest is True, get the ‘latest’ results instead.

get_check_schedule(schedule_name, conditions=None)

Go through CHECK_SETUP and return all the info required to run a given schedule for any environment.

If a list of conditions is provided, filter the schedule to only include checks that match ALL of the conditions.

Returns a dictionary keyed by environ. The check running info is the standard format of: [<check_mod/check_str>, <kwargs>, <dependencies>]
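As an illustration of the return shape described above, the sketch below builds a hand-written dictionary in that format and walks it. The environment name, check strings, kwargs, and dependency values are all invented for the example, as is the helper `checks_without_dependencies`. Only the `[<check_mod/check_str>, <kwargs>, <dependencies>]` layout comes from the description.

```python
# Hypothetical schedule in the documented shape: keyed by environ, each entry
# is [<check_mod/check_str>, <kwargs>, <dependencies>]. All values are made up.
schedule = {
    "data": [
        ["system_checks/my_check", {"primary": True}, []],
        ["system_checks/other_check", {"primary": True}, ["my_check"]],
    ],
}


def checks_without_dependencies(schedule):
    """Return {environ: [check_str, ...]} for entries whose dependency list is empty."""
    ready = {}
    for environ, entries in schedule.items():
        ready[environ] = [
            check_str for check_str, _kwargs, deps in entries if not deps
        ]
    return ready
```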

get_check_strings(specific_check=None)

Return a list of all formatted check strings (<module>/<check_name>) in system. By default runs on all checks (specific_check == None), but can be used to get the check string of a certain check name as well.

IMPORTANT: any checks in test_checks module are excluded.

get_check_title_from_setup(check_name): Return the title of a check from CHECK_SETUP. If not found, just return check_name.

get_checks_within_schedule(schedule_name): Simply return a list of string check names within the given schedule

get_grouped_check_results(connection): Return a group-centric view of the information from get_check_results for the given connection (i.e. fs environment). Returns a list of dicts, each containing dicts of check results keyed by title, counts of result statuses, and the group name. All groups are returned.

classmethod get_methods_by_deco(mod, decorator): Returns all methods in module with decorator as a list; the decorator is set in check_function()
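One common way to make decorators discoverable is to have the decorator tag the wrapped function with an attribute that can be inspected later. The sketch below shows that pattern. The attribute name `check_decorator`, the `tagging_decorator` helper, and the sample functions are assumptions made for illustration and may not match how `check_function()` marks methods in foursight_core.

```python
import functools
import types


def tagging_decorator(tag):
    """Hypothetical decorator factory that records `tag` on the wrapped function."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        wrapper.check_decorator = tag  # assumed attribute name
        return wrapper
    return decorate


def has_decorator(method, tag):
    """True if `method` was wrapped by tagging_decorator(tag), False otherwise."""
    return getattr(method, "check_decorator", None) == tag


def methods_by_decorator(mod, tag):
    """Return all callables in module `mod` carrying the given tag, as a list."""
    return [obj for obj in vars(mod).values()
            if callable(obj) and has_decorator(obj, tag)]


# Build a throwaway module holding one check, one action, and one plain function.
demo = types.ModuleType("demo_checks")
demo.my_check = tagging_decorator("check")(lambda: "ok")
demo.my_action = tagging_decorator("action")(lambda: "done")
demo.helper = lambda: None
```

With this layout, `methods_by_decorator(demo, "check")` picks up only `demo.my_check`, and the untagged `demo.helper` is ignored.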

get_schedule_names(): Simply return a list of all valid schedule names, as defined in CHECK_SETUP

init_check_or_action_res(connection, check): Use in cases where a string is provided that could be a check or an action. Tries checks first, then actions. If successful, returns a CheckResult or ActionResult. Returns None if neither is valid.

run_check_or_action(connection, check_str, check_kwargs)

Validates the provided check_str, its module, and kwargs. Determines by decorator whether the method is a check or an action, then runs it. All errors are handled within the running of the check/action.

Takes an FS_connection object, a check string formatted as <str check module/name>, and a dictionary of check arguments. For example:

check_str: ‘system_checks/my_check’
check_kwargs: ‘{“foo”:123}’

Fetches the check function and runs it (returning whatever it returns). Returns a string for failed results, and a CheckResult/ActionResult object otherwise.
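The input validation described above can be sketched as follows. This is not the foursight_core implementation: the helper name `parse_check_str` and the error messages are hypothetical. The sketch only covers splitting the `<module>/<name>` string and checking that the kwargs are a dictionary.

```python
def parse_check_str(check_str, check_kwargs):
    """Split '<module>/<name>' and sanity-check kwargs.

    Returns (module, name) on success, or an error string on failure,
    following the 'string for failed results' convention described above.
    """
    if not isinstance(check_str, str) or check_str.count("/") != 1:
        return "ERROR: check string must look like <module>/<name>"
    module, name = check_str.split("/")
    if not module or not name:
        return "ERROR: module and check name must both be non-empty"
    if not isinstance(check_kwargs, dict):
        return "ERROR: check_kwargs must be a dictionary"
    return module, name
```

Returning an error string instead of raising keeps the failure path uniform for the caller, which only needs to test whether the result is a string.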

validate_check_setup(check_setup)

Go through the check_setup json that was read in and make sure everything is properly formatted. Since scheduled kwargs and dependencies are optional, add those in at this point.

Also takes care of ensuring that multiple checks were not written with the same name and adds check module information to the check setup. Accordingly, verifies that each check in the check_setup is a real check.

decorators

deploy

Generate gitignored .chalice/config.json for deploy and then run deploy. Takes one parameter for now: stage (either “dev” or “prod”).

environment

es_connection

class foursight_core.es_connection.ESConnection(index=None, doc_type='result', host=None)

ESConnection is a handle to a remote ElasticSearch instance on AWS. All Foursight connections make use of the same ES instance but have separate indices for each one, such as ‘foursight-dev-cgap’, ‘foursight-dev-data’ etc

ESConnection is intended to work with only a single index.

Implements the AbstractConnection ‘interface’

create_index(name): Creates an ES index called name. Returns True on success.

delete_index(name): Deletes the given index name from this es

delete_keys(key_list): Deletes all uuids in key_list from es. If key_list is large this will be a slow operation, but probably still not as slow as s3

get_all_objects(): Calls list_all_keys with full=True to get all the objects Only gets ES_SEARCH_SIZE number of results, most recent first.

get_main_page_checks(checks=None, primary=True): Gets all checks for the main page. If primary is true then all checks will be primary, otherwise we use latest. Only gets ES_SEARCH_SIZE number of results, most recent first.

get_object(key): Gets object with uuid=key from es. Returns None if not found or no index has been specified.

get_result_history(prefix, start, limit): ES handle to implement the get_result_history functionality of RunResult

get_size(): Returns the number of items indexed on this es instance. Returns -1 in failure.

get_size_bytes(): Returns number of bytes stored on this es instance

index_exists(name): Checks if the given index name exists

list_all_keys(): Generic search on es that will return all ids of indexed items Only gets ES_SEARCH_SIZE number of results, most recent first.

list_all_keys_w_prefix(prefix): Lists all id’s in this ES that have the given prefix. Only gets ES_SEARCH_SIZE number of results, most recent first.

classmethod load_json(rel, fname): Loads json file fname from rel/fname

load_mapping(fname='mapping.json'): Loads ES mapping from ‘mapping.json’ or another relative path from this file location.

put_object(key, value): Index a new item into es. Returns true in success

refresh_index(): Refreshes the index, then waits 3 seconds

search(search, key='_source'): Inner function that passes doc as a search parameter to ES. Based on the execute_search method in Fourfront

test_connection(): Hits health route on es to verify that it is up

exception foursight_core.es_connection.ElasticsearchException(message=None): Generic exception for an elasticsearch failure

exceptions

exception foursight_core.exceptions.BadCheckOrAction(message=None): Generic exception for a badly written check or library. __init__ takes some string error message

exception foursight_core.exceptions.BadCheckSetup(message=None): Generic exception for an issue with a check setup. __init__ takes some string error message

exception foursight_core.exceptions.MissingFoursightPrefixException(message=None): Generic exception for an issue with foursight prefix not defined or initialized before using a method that requires it. __init__ takes some string error message

fs_connection

class foursight_core.fs_connection.FSConnection(fs_environ, fs_environ_info, test=False, use_es=True, host=None)

Contains the foursight (FS) and fourfront (FF) connections needed to communicate with both services. Contains fields that link to the FF keys and s3 connection, as well as the FS s3_connection. They are:

fs_env: string FS environment (such as ‘data’ or ‘webdev’)
ff_server: string server name of the linked FF
ff_env: string EB environment name of FF (such as ‘fourfront-webprod’). This is kept up-to-date for data and staging. (COMPATIBILITY NOTE: This argument is misnamed. It isn’t really an ff_env but rather an s3 bucket key, so ‘fourfront-webprod’ still names the bucket used by environments ‘fourfront-blue’ and ‘fourfront-green’.)
ff_s3: s3Utils connection to the FF environment (see dcicutils.s3_utils)
ff_keys: FF keys for the environment with ‘key’, ‘secret’ and ‘server’
ff_es: string server of the elasticsearch for the FF
s3_connection: S3Connection object that is the s3 connection for FS

If param test=True, then do not actually attempt to initiate the FF connections.

get_object(key): Queries ES for key - checks S3 if it doesn’t find it

put_object(key, value): Puts an object onto both ES and S3

run_result

class foursight_core.run_result.ActionResult(connections, name)

Inherits from RunResult and is meant to be used with actions

get_associated_check_result(kwargs): Leverage the required ‘check_name’ and ‘called_by’ kwargs to return the check result from the associated check of this action. This will throw a KeyError if the kwargs are missing, but that’s okay, since we want to enforce the new associated check/action model. Must pass in the dict kwargs.

validate(): Validates this action result against the elasticsearch mapping. Multiple errors are possible - the ‘latest’ wrt when it is checked below is reported.

class foursight_core.run_result.CheckResult(connections, name, init_uuid=None)

Inherits from RunResult and is meant to be used with checks.

Usage:

check = CheckResult(connection, <name>)
check.status = …
check.description = …
check.store_result()

validate(): Validates this CheckResult against the elasticsearch mapping. Multiple errors are possible - the ‘latest’ wrt when it is checked below is reported.

class foursight_core.run_result.RunResult(connections, name)

Generic class for CheckResult and ActionResult. Contains methods common to both.

delete_results(prior_date=None, primary=True, custom_filter=None, timeout=None, es_only=False): Goes through all check files, deleting all non-primary checks by default. If a prior_date (datetime) is given, then all results prior to the given time will be deleted (including primaries). If primary is False, then primary results will be cleaned as well. If a custom filter is given, that filter will be applied as well, prior to the above filters. Note that this argument can be a function or a lambda. If es_only is specified, only delete the check from ES. Returns a pair of the number of results deleted from s3 and es respectively.

filename_to_datetime(key): Utility function. Key might look like sync_google_analytics_data/2018-10-15T19:08:32.734656.json. We presume that timezone info is not important, which allows us to use strptime.
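A minimal sketch of this kind of key parsing is shown below, assuming keys of the form `<check_name>/<timestamp>.json` as in the example above. The function name `key_to_datetime` is hypothetical and the real method may handle malformed keys differently.

```python
from datetime import datetime


def key_to_datetime(key):
    """Parse the timestamp out of a key such as '<check_name>/<timestamp>.json'."""
    filename = key.split("/")[-1]  # drop the check-name prefix
    if filename.endswith(".json"):
        filename = filename[: -len(".json")]
    # No timezone component is expected, so a naive datetime is returned.
    return datetime.strptime(filename, "%Y-%m-%dT%H:%M:%S.%f")
```

Keys that do not hold a timestamp (for example a key ending in `latest.json`) would raise `ValueError` in this sketch.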

get_all_results(): Return all results for this check. Should use with care

get_closest_result(diff_hours=0, diff_mins=0, override_date=None)

Returns check result that is closest to the current time minus diff_hours and diff_mins (both integers).

If override_date is provided, ignore other arguments and use the given date as the metric for finding the check. This MUST be a datetime obj.

TODO: Add some way to control which results are returned by kwargs? For example, you might only want primary results.

get_es_object(key): Grabs an object from ES. Returns none if not present, json otherwise

get_latest_result(): Returns the latest result (the last check run)

get_object(key): Gets an object given a key from the data store.

get_primary_result(): Returns the most recent primary result run (with ‘primary’=True in kwargs)

get_result_by_uuid(uuid): Returns result if it can be found by its uuid, otherwise None.

get_result_history(start, limit, after_date=None): Used to get the uuid, status, and kwargs for a specific check. Results are ordered by uuid (timestamped) and sliced from start to limit. Probably only called from app_utils.get_foursight_history. after_date is an optional datetime object; if provided, only the history results after that point will be returned. Returns a list of lists (inner lists: [status, kwargs])

get_s3_object(key): Returns None if not present, otherwise returns a JSON parsed res.

list_keys(records_only=True, prefix=None): Lists all keys. If given a prefix only keys with that prefix will be returned.

put_object(key, value): Puts an object into the data stores

record_run_info(): Add a record of the completed check to the runs bucket with name equal to the dependency id. The object itself is only the status of the run. Returns True on success, False otherwise

store_formatted_result(uuid, formatted, primary)

Store the result in s3/ES. Always makes an entry with key equal to the uuid timestamp. Will also store under (i.e. overwrite) the ‘latest’ key. If primary is True, will also overwrite the ‘primary’ key.

NOTE: id_alias is an alias of the _id field, which is not searchable in ES >5, which breaks the main page.
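The three-key write pattern described above can be sketched with a dict-like store. The `<name>/<suffix>.json` key layout and the helper name `store_result` are assumptions for illustration. The real method writes through the S3 and ES connections and also sets fields such as id_alias.

```python
def store_result(store, name, uuid, formatted, primary):
    """Write `formatted` under the uuid key, the 'latest' key and, optionally, 'primary'.

    `store` is any dict-like object. The '<name>/<suffix>.json' key layout is an
    assumption made for this sketch.
    """
    keys = [f"{name}/{uuid}.json", f"{name}/latest.json"]
    if primary:
        keys.append(f"{name}/primary.json")
    for key in keys:
        store[key] = formatted  # overwrites any earlier 'latest'/'primary' entry
    return keys
```

After a primary run followed by a non-primary run, 'latest' points at the second result while 'primary' still holds the first.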

abstract validate(): Validation method that must be implemented by the subclass

foursight_core.run_result.get_closest(items, pivot): Return the item in the list of items closest to the given pivot. Items should be given in tuple form (ID, value (to compare)). Intended primarily for use with datetime objects. See: S.O. 32237862
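A closest-item lookup over `(ID, value)` tuples can be sketched with the usual min-by-absolute-difference idiom. The function name `closest_item` is hypothetical, and the library's own implementation may break ties or handle empty input differently.

```python
from datetime import datetime


def closest_item(items, pivot):
    """Return the (ID, value) tuple whose value is nearest to `pivot`."""
    return min(items, key=lambda item: abs(item[1] - pivot))


# Example data: three timestamped IDs and a pivot one hour after "b".
runs = [
    ("a", datetime(2018, 10, 15, 12)),
    ("b", datetime(2018, 10, 15, 18)),
    ("c", datetime(2018, 10, 16, 6)),
]
nearest = closest_item(runs, datetime(2018, 10, 15, 19))
```

Subtracting two datetimes yields a `timedelta`, which supports `abs()`, so the same function works for datetimes and plain numbers alike.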

s3_connection

class foursight_core.s3_connection.S3Connection(bucket_name)

delete_keys(key_list): Deletes the given keys in key_list from this connection

get_all_objects(): Returns an array of the data values stored on this connection.

get_object(key): Generic get operation. Key is the filename, returns the data object that is stored on this connection.

get_size(): Gets the number of keys stored on this s3 connection. This is a very slow operation since it has to enumerate all keys.

get_size_bytes(): Uses CloudWatch client to get the bucket size in bytes of this bucket. Start and EndTime represent the window on which the bucket size will be calculated. An average is taken across the entire window (Period=86400) Useful for checks - may need further configuration

list_all_keys(): Lists all the keys stored on this connection.

list_all_keys_w_prefix(prefix, records_only=False)

List all s3 keys with the given prefix (should look like ‘<prefix>/’). If records_only == True, then add ‘20’ to the end of the prefix to only find records that are in timestamp form (will exclude ‘latest’ and ‘primary’.) s3 only returns up to 1000 results at once, hence the need for the for loop. NextContinuationToken shows if there are more results to return.

Returns the list of keys.

Also see list_all_keys()
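The pagination described above can be sketched as follows. The function takes the S3 client as a parameter so that it can be exercised without AWS access. `list_objects_v2` with `ContinuationToken` is the standard boto3 call, but the function name `list_keys_with_prefix`, the `bucket` argument, and the use of a `while` loop are choices made for this sketch, not the library's code.

```python
def list_keys_with_prefix(client, bucket, prefix, records_only=False):
    """Collect every key under `prefix`, following continuation tokens.

    `client` is expected to behave like a boto3 S3 client (list_objects_v2).
    """
    if not prefix.endswith("/"):
        prefix += "/"
    if records_only:
        # Timestamped records start with a year such as 2018, so appending
        # '20' excludes the 'latest' and 'primary' keys.
        prefix += "20"
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            break
        # More pages remain: pass the token back to fetch the next batch.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return keys
```

In real use, `client` would be something like `boto3.client("s3")`.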

put_object(key, value): Generic put operation. Key is typically the filename, value is the actual data to be stored.

test_connection(): Tests that this connection is reachable

sqs_utils

class foursight_core.sqs_utils.SQS(foursight_prefix)

class SQS is a collection of utils related to Foursight queues

delete_message_and_propogate(runner_input, receipt, propogate=True)

Delete the message with given receipt from sqs queue and invoke the next lambda runner.

Args:
runner_input (dict): runner info, should minimally have ‘sqs_url’
receipt (str): SQS message receipt
propogate (bool): if True (default), invoke another check runner lambda

Returns: None

classmethod get_sqs_attributes(sqs_url): Returns a dict of the desired attributes from the queue with the given url

get_sqs_queue(): Returns boto3 sqs resource

invoke_check_runner(runner_input): Simple function to invoke the next check_runner lambda with runner_input (dict containing {‘sqs_url’: <str>})

recover_message_and_propogate(runner_input, receipt, propogate=True)

Recover the message with given receipt to sqs queue and invoke the next lambda runner.

Changing message VisibilityTimeout to 15 seconds means the message will be available to the queue in that much time. This is a slight lag to allow dependencies to process. NOTE: VisibilityTimeout should be less than WaitTimeSeconds in run_check_runner

Args:
runner_input (dict): runner info, should minimally have ‘sqs_url’
receipt (str): SQS message receipt
propogate (bool): if True (default), invoke another check runner lambda

Returns: None
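The visibility-timeout step described above can be sketched as follows. The SQS client is passed in so that the example does not need AWS credentials. `change_message_visibility` is the standard boto3 SQS client call, while the function name `recover_message` and the `timeout_seconds` parameter are hypothetical. Invoking the next check runner is left out.

```python
def recover_message(sqs_client, runner_input, receipt, timeout_seconds=15):
    """Make an in-flight message visible on the queue again after `timeout_seconds`.

    `sqs_client` is expected to behave like a boto3 SQS client.
    """
    sqs_client.change_message_visibility(
        QueueUrl=runner_input["sqs_url"],
        ReceiptHandle=receipt,
        VisibilityTimeout=timeout_seconds,
    )
```

In real use, `sqs_client` would be something like `boto3.client("sqs")`, and the 15-second default matches the lag mentioned above for letting dependencies process.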

classmethod send_sqs_messages(queue, environ, check_vals, uuid=None)

Send messages to SQS queue. Check_vals are entries within a check_group. Optionally, provide a uuid that will be queued as the uuid for the run; if not provided, datetime.utcnow is used

Args:
queue: boto3 sqs resource (from get_sqs_queue)
environ (str): foursight environment name
check_vals (list): list of formatted check vals, like those from check_utils.CheckHandler().get_check_schedule
uuid (str): optional string uuid

Returns:
str: uuid of queued messages