Dependency Injection in Galaxy


Learning Questions

  • What is dependency injection?

  • Why does Galaxy use dependency injection?

  • How do I use DI in controllers and tasks?

Learning Objectives

  • Understand the problems with the app god object

  • Learn about type-based dependency injection

  • Use DI in controllers and tasks

  • Understand the benefits of typing

Big Interconnected App Python 2

A God object

“a God object is an object that knows too much or does too much. The God object is an example of an anti-pattern and a code smell.”

https://en.wikipedia.org/wiki/God_object

Not only does app know and do too much, it is also used in far too many places. Every interesting component, every controller, the web transaction, etc. has a reference to app.

Big Interconnected App Python 3 - no right

Problematic Dependency Graph

When managers depend directly on UniverseApplication:

class DatasetCollectionManager:

    def __init__(self, app: UniverseApplication):
        self.type_registry = DATASET_COLLECTION_TYPES_REGISTRY
        self.collection_type_descriptions = COLLECTION_TYPE_DESCRIPTION_FACTORY
        self.model = app.model
        self.security = app.security

        self.hda_manager = hdas.HDAManager(app)
        self.history_manager = histories.HistoryManager(app)
        self.tag_handler = tags.GalaxyTagHandler(app.model.context)
        self.ldda_manager = lddas.LDDAManager(app)

UniverseApplication creates a DatasetCollectionManager for the application and DatasetCollectionManager imports and annotates the UniverseApplication as a requirement. This creates an unfortunate dependency loop.

Dependencies should form a DAG (directed acyclic graph).

Why an Interface?

By using StructuredApp interface instead of UniverseApplication:

class DatasetCollectionManager:

    def __init__(self, app: StructuredApp):
        self.type_registry = DATASET_COLLECTION_TYPES_REGISTRY
        self.collection_type_descriptions = COLLECTION_TYPE_DESCRIPTION_FACTORY
        self.model = app.model
        self.security = app.security

        self.hda_manager = hdas.HDAManager(app)
        self.history_manager = histories.HistoryManager(app)
        self.tag_handler = tags.GalaxyTagHandler(app.model.context)
        self.ldda_manager = lddas.LDDAManager(app)

Dependencies are now closer to a DAG: DatasetCollectionManager is no longer annotated with the type UniverseApplication, and imports are cleaner.
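A minimal sketch of why an interface helps, assuming the interface is expressed with typing.Protocol (Galaxy's real StructuredApp may be defined differently). The attribute values and the trimmed-down classes here are hypothetical placeholders:

```python
from typing import Protocol


class StructuredApp(Protocol):
    """Hypothetical, trimmed-down interface: only what managers may rely on."""

    model: object
    security: object


class HistoryManager:
    # Annotated with the interface, so this code never needs to import
    # the concrete application class.
    def __init__(self, app: StructuredApp):
        self.model = app.model
        self.security = app.security


class UniverseApplication:
    # The concrete app satisfies the interface structurally and is the
    # only place that knows about both sides.
    def __init__(self):
        self.model = "model-mapping"  # placeholder value for illustration
        self.security = "id-encoding-helper"  # placeholder value
        self.history_manager = HistoryManager(self)


app = UniverseApplication()
```

The manager and the application now both point at the interface, so the annotation no longer forces a manager module to import the application module.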

Big Interconnected App with Python 3 Types

Benefits of Typing

  • mypy provides robust type checking

  • IDE can provide hints to make developing this class and usage of this class easier

Design Problems with Handling Dependencies Directly

Using app to construct a manager for dealing with dataset collections causes several problems:

  • DatasetCollectionManager needs to know how to construct all the other managers it is using, not just their interface

  • app has an instance of this class and app is used to construct an instance of this class - this circular dependency chain results in brittleness and complexity in how to construct app

  • app is very big, and we depend on many parts of it but only a small fraction of the whole. This makes typing less than ideal

Testing Problems with Handling Dependencies Directly

  • Difficult to unit test properly

    • What parts of app are being used?

    • How do we construct a smaller app with just those pieces?

    • How do we stub out classes cleanly when we’re creating the dependent objects internally?

Design Benefits of Injecting Dependencies

class DatasetCollectionManager:
    def __init__(
        self,
        model: GalaxyModelMapping,
        security: IdEncodingHelper,
        hda_manager: HDAManager,
        history_manager: HistoryManager,
        tag_handler: GalaxyTagHandler,
        ldda_manager: LDDAManager,
    ):
        self.type_registry = DATASET_COLLECTION_TYPES_REGISTRY
        self.collection_type_descriptions = COLLECTION_TYPE_DESCRIPTION_FACTORY
        self.model = model
        self.security = security

        self.hda_manager = hda_manager
        self.history_manager = history_manager
        self.tag_handler = tag_handler
        self.ldda_manager = ldda_manager

  • We’re no longer depending on app

  • The type signature very clearly delineates what dependencies are required

  • Unit testing can inject precise dependencies supplying only the behavior needed
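The testing benefit can be sketched as follows. This is an illustration only: the manager is trimmed to two dependencies, and StubHDAManager and dataset_name are hypothetical names that do not exist in Galaxy.

```python
class StubHDAManager:
    """Hypothetical test double exposing only the behavior the test needs."""

    def __init__(self, datasets):
        self._datasets = datasets

    def by_id(self, hda_id):
        return self._datasets[hda_id]


class DatasetCollectionManager:
    # Trimmed to two injected dependencies to keep the sketch short.
    def __init__(self, security, hda_manager):
        self.security = security
        self.hda_manager = hda_manager

    def dataset_name(self, hda_id):
        # Hypothetical method, used only to exercise the injected dependency.
        return self.hda_manager.by_id(hda_id)["name"]


def test_dataset_name():
    manager = DatasetCollectionManager(
        security=None,  # not used by the code path under test
        hda_manager=StubHDAManager({1: {"name": "reads.fastq"}}),
    )
    assert manager.dataset_name(1) == "reads.fastq"


test_dataset_name()
```

No app object is built at all; the test supplies exactly the collaborators that the code path touches.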

Constructing the Object Is Still Brittle

DatasetCollectionManager(
    self.model,
    self.security,
    HDAManager(self),
    HistoryManager(self),
    GalaxyTagHandler(self.model.context),
    LDDAManager(self)
)

  • The order in which app must construct its components is still hard to get right

  • The code constructing this object still needs to know how to construct each of the object's dependencies

  • The code constructing this object needs to explicitly import all the types

What is Type-based Dependency Injection?

A dependency injection container keeps track of singletons or recipes for how to construct each type. By default, when the container constructs an object, it resolves each dependency by looking at the type signature of the class being constructed.

If an object declares it consumes a dependency of type X (e.g. HDAManager), just query the container recursively for an object of type X.
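The recursive lookup can be illustrated with a toy container. This is a sketch of the general technique, not Lagom's implementation; the Container class below is hypothetical, caches every object it builds as a singleton, and only injects constructor parameters that carry a type annotation.

```python
from typing import get_type_hints


class Container:
    """Toy type-based DI container (illustration only)."""

    def __init__(self):
        self._instances = {}

    def __setitem__(self, dep_type, instance):
        # Register a pre-built singleton for a type.
        self._instances[dep_type] = instance

    def __getitem__(self, dep_type):
        if dep_type not in self._instances:
            # Read the constructor's annotations and resolve each one
            # recursively through this same container.
            hints = get_type_hints(dep_type.__init__)
            hints.pop("return", None)
            kwargs = {name: self[hint] for name, hint in hints.items()}
            self._instances[dep_type] = dep_type(**kwargs)
        return self._instances[dep_type]


class GalaxyModelMapping:
    pass


class HDAManager:
    def __init__(self, model: GalaxyModelMapping):
        self.model = model


class DatasetCollectionManager:
    def __init__(self, model: GalaxyModelMapping, hda_manager: HDAManager):
        self.model = model
        self.hda_manager = hda_manager


container = Container()
container[GalaxyModelMapping] = GalaxyModelMapping()
dcm = container[DatasetCollectionManager]
```

Asking for DatasetCollectionManager triggers the construction of HDAManager, which in turn receives the registered GalaxyModelMapping singleton.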

Object Construction Simplification

Once all the dependencies have been type annotated properly and the needed singletons have been configured, constructing the object becomes a single container lookup.

Before:

dcm = DatasetCollectionManager(
    self.model,
    self.security,
    HDAManager(self),
    HistoryManager(self),
    GalaxyTagHandler(self.model.context),
    LDDAManager(self)
)

After:

dcm = container[DatasetCollectionManager]

Picking a Library

Many of the existing DI libraries for Python predate widespread Python 3 adoption and don’t readily infer things based on types. Typing and DI each enhance the benefits of the other, so it was important to pick a library that could do type-based injection.

We went with Lagom, but we’ve built abstractions that would make it very easy to switch.

Lagom

Lagom Website

https://lagom-di.readthedocs.io/en/latest/

Tips for Designing New Galaxy Backend Components

  • Consume only the related components you need, avoiding app when possible

  • Annotate inputs to the component with Python types

  • Use interface types to shield consumers from implementation details

  • Rely on Galaxy’s dependency injection to construct the component and provide it to consumers

DI in FastAPI Controllers

Old FastAPI Pattern

def get_tags_manager() -> TagsManager:
    return TagsManager()


@cbv(router)
class FastAPITags:
    manager: TagsManager = Depends(get_tags_manager)
    ...

FastAPI’s dependency injection allows for type checking but doesn’t use type inference (it requires factory functions, etc.)

https://fastapi.tiangolo.com/tutorial/dependencies/

DI and Controllers - FastAPI Limitations

Also, we have two different controller styles, and only the new FastAPI controllers allowed dependency injection.

def get_tags_manager() -> TagsManager:
    return TagsManager()


@cbv(router)
class FastAPITags:
    manager: TagsManager = Depends(get_tags_manager)
    ...

class TagsController(BaseAPIController):

    def __init__(self, app):
        super().__init__(app)
        self.manager = TagsManager()

DI and Controllers - Unified Approach

-def get_tags_manager() -> TagsManager:
-    return TagsManager()
-
-
 @cbv(router)
 class FastAPITags:
-    manager: TagsManager = Depends(get_tags_manager)
+    manager: TagsManager = depends(TagsManager)

     @router.put(
         '/api/tags',
@@ -58,11 +54,8 @@ def update(
      self.manager.update(trans, payload)


-class TagsController(BaseAPIController):
-
-    def __init__(self, app):
-        super().__init__(app)
-        self.manager = TagsManager()
+class TagsController(BaseGalaxyAPIController):
+    manager: TagsManager = depends(TagsManager)

Building dependency injection into our application and not relying on FastAPI allows for dependency injection that is less verbose, is available uniformly across the application, and works identically for the legacy controllers.

DI in Celery Tasks

Framework Setup

From lib/galaxy/celery/tasks.py:

from lagom import magic_bind_to_container
...

def galaxy_task(func):
    CELERY_TASKS.append(func.__name__)
    app = get_galaxy_app()
    if app:
        return magic_bind_to_container(app)(func)
    return func

magic_bind_to_container binds function parameters to a specified Lagom DI container automatically.
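To make the idea of binding function parameters to a container concrete, here is a rough sketch of such a decorator using only the standard library. It is not Lagom's code: bind_to_container is a hypothetical stand-in, the container is assumed to be a plain dict mapping types to instances, and callers are assumed to pass the remaining arguments by keyword.

```python
import functools
from typing import get_type_hints


def bind_to_container(container):
    """Hypothetical stand-in for the behavior described above."""

    def decorator(func):
        hints = get_type_hints(func)
        hints.pop("return", None)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Fill in every annotated parameter the caller did not supply
            # and whose type is registered in the container.
            for name, dep_type in hints.items():
                if name not in kwargs and dep_type in container:
                    kwargs[name] = container[dep_type]
            return func(*args, **kwargs)

        return wrapper

    return decorator


class HDAManager:
    # Simplified placeholder, not Galaxy's HDAManager.
    def __init__(self):
        self.purged = []

    def purge(self, hda_id):
        self.purged.append(hda_id)


container = {HDAManager: HDAManager()}


@bind_to_container(container)
def purge_hda(hda_manager: HDAManager, hda_id):
    hda_manager.purge(hda_id)


# The caller supplies only the plain data; the manager is injected.
purge_hda(hda_id=42)
```

The caller never mentions HDAManager, which is what lets a task queue serialize only simple values such as IDs.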

DI in Celery Tasks - Examples

Simple Task

@celery_app.task(ignore_result=True)
@galaxy_task
def purge_hda(hda_manager: HDAManager, hda_id):
    hda = hda_manager.by_id(hda_id)
    hda_manager._purge(hda)

Task with Multiple Dependencies

@celery_app.task
@galaxy_task
def set_metadata(
    hda_manager: HDAManager,
    ldda_manager: LDDAManager,
    dataset_id,
    model_class='HistoryDatasetAssociation'
):
    if model_class == 'HistoryDatasetAssociation':
        dataset = hda_manager.by_id(dataset_id)
    elif model_class == 'LibraryDatasetDatasetAssociation':
        dataset = ldda_manager.by_id(dataset_id)
    dataset.datatype.set_meta(dataset)

Dependencies are automatically injected based on type annotations!

Decomposed App

Key Takeaways

  • app was a god object that knew/did too much

  • Interfaces break circular dependencies

  • Type-based DI with Lagom simplifies construction

  • DI works uniformly across FastAPI, WSGI controllers, and tasks

  • Dependencies should form a DAG