Support external authentication source #129

Merged
christian-monch merged 14 commits from external-auth-source into master 2025-09-17 13:55:57 +00:00
14 changed files with 910 additions and 152 deletions

README.md

@ -57,6 +57,7 @@ The following command line parameters are supported:
The parameter can be repeated to define secondary, tertiary, etc. sorting fields.
If a given field is not present in the record, the record will be sorted behind all records that possess the field.
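The sort-behind rule for missing fields can be sketched with a Python key function (a sketch only; `sort_records` and the field names are illustrative, not the service's implementation):

```python
def sort_records(records, fields):
    """Sort by several fields; records missing a field sort behind those that have it."""
    def key(record):
        # For each field, emit (missing?, value): records missing the field
        # compare greater and therefore end up behind all records that have it.
        return tuple(
            (field not in record, record.get(field, ''))
            for field in fields
        )
    return sorted(records, key=key)

# primary sort on 'name', secondary on 'date'
records = [{'name': 'b'}, {'name': 'a', 'date': '2024'}, {'name': 'a'}]
sort_records(records, ['name', 'date'])
# → [{'name': 'a', 'date': '2024'}, {'name': 'a'}, {'name': 'b'}]
```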
### Configuration file
The service is configured via a configuration file that defines collections, paths for incoming and curated data for each collection, as well as token properties.
@ -131,7 +132,7 @@ tokens:
# A token and collection-specific label, that defines "zones" in which incoming
# records are stored. Multiple tokens can share the same zone, for example if
# many clients with individual tokens work together to build a collection.
# (Since this token does not allow write access, "incoming_label" is ignored and
# left empty here (TODO: it should not be required in this case)).
incoming_label: ''
@ -186,7 +187,7 @@ tokens:
The service currently supports the following backends for storing records:
- `record_dir`: this backend stores records as YAML-files in a directory structure that is defined [here](https://concepts.datalad.org/dump-things-storage-v0/). It reads the backend configuration from a "record collection configuration file" as described [here](https://concepts.datalad.org/dump-things-storage-v0/).
- `sqlite`: this backend stores records in a SQLite database. There is an individual database file, named `.sqlite-records.db`, for each curated area and incoming area.
- `record_dir+stl`: here `stl` stands for "schema-type-layer".
This backend stores records in the same format as `record_dir`, but adds special treatment for the `schema_type` attribute in records.
@ -198,10 +199,9 @@ The service currently supports the following backends for storing records:
Backends can be defined per collection in the configuration file.
The backend will be used for the curated area and for the incoming areas of the collection.
If no backend is defined for a collection, the `record_dir+stl`-backend is used by default.
The `+stl`-backends can be useful if an endpoint returns records of multiple classes, because it allows clients to determine the class of each result record.
The service guarantees that backends of all types can co-exist independently in the same directory, i.e., there are no name collisions in files that are used for different backends (as long as no class name starts with `.` or `_`).
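What the schema-type layer contributes can be sketched minimally (the helper and the record are illustrative; in the service this is done by the `SchemaTypeLayer` backend wrapper):

```python
def with_schema_type(record: dict, class_name: str) -> dict:
    """Return a copy of the record that carries its class in `schema_type`."""
    return {**record, 'schema_type': class_name}

record = {'pid': 'ex:0001', 'name': 'example'}
with_schema_type(record, 'Person')
# → {'pid': 'ex:0001', 'name': 'example', 'schema_type': 'Person'}
```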
The following configuration snippet shows how to define a backend for a collection:
@ -209,12 +209,69 @@ The following configuration snippet shows how to define a backend for a collecti
...
collections:
collection_with_default_record_dir+stl_backend:
# This is a collection with the default backend, i.e. `record_dir+stl` and
# the default authentication, i.e. config-based authentication.
default_token: anon_read
curated: collection_1/curated
collection_with_forgejo_authentication_source:
# This is a collection with the default backend, i.e. `record_dir+stl` and
# a forgejo-based authentication source. That means it will use a forgejo
# instance to determine the permissions of a token for this collection.
# The instance is also used to determine the user-id and the incoming label.
# In the case of forgejo, the user-id and the incoming label are the
# forgejo login associated with the token.
# We still need the name of a default token. If the token is defined in this
# config file, its properties will be determined by the
# config file. If the token is not defined in the config file, its
# properties will be determined by the authentication sources. In this
# example by the forgejo-instance at `https://forgejo.example.com`.
# If there is more than one authentication source, they will be tried
# in the order they are defined in the config file.
default_token: anon_read # We still need a default token
curated: collection_2/curated
# Token permissions, user-ids (for record annotations), and incoming
# label can be determined by multiple authentication sources.
# If no source is defined, `config` will be used, which reads token
# information from the config file.
# This example explicitly defines `config` and a second authentication
# source, a `forgejo` authentication source.
auth_sources:
- type: forgejo # requires `user`-read and `organization`-read permissions on token
# The API-URL of the forgejo instance that should be used
url: https://forgejo.example.com/api/v1
# An organization
organization: data_handling
# A team in the organization. The authorization of the team
# determines the permissions of the token
team: data_entry_personal
# `label_type` determines how an incoming label is created for
# a Forgejo token. If `label_type` is `team`, the incoming label
# will be `forgejo-team-<organization>-<team>`. If `label_type`
# is `user`, the incoming label will be
# `forgejo-user-<user-login>`
label_type: team
# An optional repository. The token will only be authorized
# if the team has access to the repository. Note: if `repo`
# is set, the token must have at least repository read
# permissions.
repo: reference-repository
# Fallback to the config file.
- type: config # check tokens from the configuration file
# Multiple authorization sources are allowed. They will be tried in the
# order defined in the config file. If an authorization source returns
# permissions for a token, those permissions will be used and no other
# authorization sources will be queried.
# The default authorization source is `config`, which reads the token
# permissions, user-id, and incoming label from the configuration file.
collection_with_explicit_record_dir+stl_backend:
default_token: anon_read
curated: collection_3/curated
backend:
# The record_dir-backend is identified by the
# type: "record_dir". No more attributes are
@ -223,7 +280,7 @@ collections:
collection_with_sqlite_backend:
default_token: anon_read
curated: collection_4/curated
backend:
# The sqlite-backend is identified by the
# type: "sqlite". It requires a schema attribute
@ -233,6 +290,138 @@ collections:
schema: https://concepts.inm7.de/s/flat-data/unreleased.yaml
```
#### Authentication and authorization
To authenticate and authorize a user based on tokens, the service uses
authentication sources. There are currently two authentication sources: the
configuration file and a Forgejo-based authentication source.
Authentication sources can be defined individually for each collection.
The collection-level key `auth_sources` should contain a list of authentication source configurations.
Authentication sources are tried in order until a token is successfully authenticated.
If no authentication source authenticates the token, the token will be rejected.
If no authentication source is defined, the configuration file will be used to authenticate tokens.
If an authentication source is defined multiple times, the first instance will be queried; all other instances will be ignored.
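The try-in-order behavior can be sketched as follows (`authenticate` and the source objects are illustrative; `AuthenticationError` matches the exception name used by the service):

```python
class AuthenticationError(Exception):
    """Raised when a source cannot authenticate the token."""

def authenticate(token, auth_sources):
    """Try each configured authentication source in order; the first success wins."""
    for source in auth_sources:
        try:
            return source.authenticate(token)
        except AuthenticationError:
            continue  # fall through to the next configured source
    raise AuthenticationError('token rejected by all configured sources')
```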
These authentication sources are available:
- config: use the configuration file to authenticate tokens
- forgejo: use a Forgejo-instance to authenticate tokens
All authentication source configurations contain the key `type`.
Additional keys are authentication source type-specific.
The following configuration snippet contains an example of an authentication source configuration:
```yaml
collections:
collection_with_config_and_forgejo_auth_sources:
# Token permissions, user-ids (for record annotations), and incoming
# label can be determined by multiple authentication sources.
# If no source is defined, `config` will be used, which reads token
# information from the config file.
# This example explicitly defines `config` and a second authentication
# source, a `forgejo` authentication source.
auth_sources:
- type: forgejo # requires `user`-read and `organization`-read permissions on token
# The API-URL of the forgejo instance that should be used
url: https://forgejo.example.com/api/v1
# An organization
organization: data_handling
# A team in the organization. The authorization of the team
# determines the permissions of the token
team: data_entry_personal
# `label_type` determines how an incoming label is created for
# a Forgejo token. If `label_type` is `team`, the incoming label
# will be `forgejo-team-<organization>-<team>`. If `label_type`
# is `user`, the incoming label will be
# `forgejo-user-<user-login>`
label_type: team
# An optional repository. The token will only be authorized
# if the team has access to the repository. Note: if `repo`
# is set, the token must have at least repository read
# permissions.
repo: reference-repository
# Fallback to the config file.
- type: config # check tokens from the configuration file
# Multiple authorization sources are allowed. They will be tried in the
# order defined in `auth_sources`. If an authorization source returns
# permissions for a token, those permissions will be used and no other
# authorization sources will be queried.
# The default authorization source is `config`, which reads the token
# permissions, user-id, and incoming label from the configuration file.
...
```
##### Config-based authentication
```yaml
collections:
collection_with_config_authentication:
default_token: anon_read
curated: collection_5/curated
auth_sources:
- type: <must be 'config'> # check tokens from the configuration file
...
```
The configuration file will be used to authenticate tokens.
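A minimal sketch of the config-based lookup (the shape of the `tokens` table and the token values are hypothetical simplifications of the service's configuration):

```python
class InvalidTokenError(Exception):
    """Raised when a token is not defined for the collection."""

def authenticate_from_config(tokens, collection, token):
    """Look the token up in the per-collection token table read from the config file."""
    token_info = tokens.get(collection, {}).get(token)
    if token_info is None:
        raise InvalidTokenError(f'Token not valid for collection `{collection}`')
    return token_info

# hypothetical token table, as it might be read from the configuration file
tokens = {
    'collection_5': {
        'secret-token': {'user_id': 'alice', 'incoming_label': 'alice_zone'},
    },
}
```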
##### Forgejo-based authentication
```yaml
collections:
collection_with_forgejo_authentication:
default_token: anon_read
curated: collection_5/curated
auth_sources:
- type: <must be 'forgejo'>
url: <Forgejo API-URL>
organization: <organization name>
team: <team_name>
label_type: <'team' or 'user'>
repository: <repository name> # Optional
...
```
The defined Forgejo-instance will be used to authenticate a token.
The user ID is the email of the user.
If `label_type` is set to `team`, the incoming label is `forgejo-team-<organization-name>-<team-name>`.
If `label_type` is set to `user`, the incoming label is `forgejo-user-<user-login>`.
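The two label rules can be sketched directly (`incoming_label` is an illustrative helper; the argument values below reuse the names from the configuration example):

```python
def incoming_label(label_type, organization, team, user_login):
    """Build the incoming label for a Forgejo token according to `label_type`."""
    if label_type == 'team':
        return f'forgejo-team-{organization}-{team}'
    return f'forgejo-user-{user_login}'

incoming_label('team', 'data_handling', 'data_entry_personal', 'alice')
# → 'forgejo-team-data_handling-data_entry_personal'
```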
The permissions will be fetched from the unit `repo.code` of the team definition.
The following mapping is used:
| `repo.code` | curated_read | incoming_read | incoming_write |
| ------------- | ------------- | ------------- | ------------- |
| `none` | `False` | `False` | `False` |
| `read` | `True` | `True` | `False` |
| `write` | `True` | `True` | `True` |
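The table above can be expressed directly in code; a sketch that returns a plain `dict` instead of the service's `TokenPermission` model:

```python
def permissions_from_repo_code(code_permission):
    """Map the team's `repo.code` unit to the three token permissions (table above)."""
    read = code_permission in ('read', 'write')
    write = code_permission == 'write'
    return {
        'curated_read': read,
        'incoming_read': read,
        'incoming_write': write,
    }
```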
A Forgejo authentication source can authenticate Forgejo-tokens that have at least the following `Read`-permissions:
- User: this is required to determine user-related information, i.e. user-email and user login name.
- Organization: this is required to determine a user's membership in a team of an organization.
- Repository (only if `repository` is set in the configuration): required to determine a team's access to the repository.
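The information above is fetched from the Forgejo REST API with token authorization; a sketch of how such a request could be assembled (`forgejo_request_args` is a hypothetical helper, the URL an example; the actual HTTP call would use `requests.get` with these arguments):

```python
def forgejo_request_args(api_url, endpoint, token):
    """Build URL and headers for an authenticated Forgejo API request."""
    return {
        'url': f"{api_url.rstrip('/')}/{endpoint}",
        'headers': {
            'Accept': 'application/json',
            # Forgejo accepts token-based authorization headers of this form
            'Authorization': f'token {token}',
        },
    }

# e.g.: requests.get(**forgejo_request_args(url, 'user/teams', token), timeout=10)
```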
### Command line parameters
The service supports the following command line parameters:


@ -6,6 +6,7 @@ from typing import (
from starlette.status import (
HTTP_200_OK,
HTTP_300_MULTIPLE_CHOICES,
HTTP_400_BAD_REQUEST,
HTTP_401_UNAUTHORIZED,
HTTP_403_FORBIDDEN,
@ -17,6 +18,7 @@ from starlette.status import (
__all__ = [
'Format',
'HTTP_200_OK',
'HTTP_300_MULTIPLE_CHOICES',
'HTTP_400_BAD_REQUEST',
'HTTP_401_UNAUTHORIZED',
'HTTP_403_FORBIDDEN',


@ -0,0 +1,49 @@
"""Token-based authentication handlers
The authentication handlers are used to authenticate a token and to
determine:
- the permissions associated with the token
- the user id associated with the token
- the incoming_label to be used with the token
"""
from __future__ import annotations
import abc
import dataclasses
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from dump_things_service.token import TokenPermission
class AuthenticationError(Exception):
"""Exception for dumpthings authentication errors."""
class InvalidTokenError(AuthenticationError):
"""Exception for invalid token errors."""
@dataclasses.dataclass
class AuthenticationInfo:
token_permission: TokenPermission
user_id: str
incoming_label: str | None
class AuthenticationSource(metaclass=abc.ABCMeta):
@abc.abstractmethod
def authenticate(
self,
token: str,
) -> AuthenticationInfo:
"""
Authenticate a user based on the provided token and collection.
:param token: The authentication token.
:return: AuthenticationInfo
:raises AuthenticationError: If authentication fails.
"""
raise NotImplementedError


@ -0,0 +1,38 @@
"""Use configuration information to fetch token permissions, ids, and incoming_label."""
from dump_things_service.auth import (
AuthenticationInfo,
AuthenticationSource,
InvalidTokenError,
)
from dump_things_service.config import (
InstanceConfig,
)
missing = {}
class ConfigAuthenticationSource(AuthenticationSource):
def __init__(
self,
instance_config: InstanceConfig,
collection: str,
):
self.instance_config = instance_config
self.collection = collection
def authenticate(
self,
token: str,
) -> AuthenticationInfo:
token_info = self.instance_config.tokens.get(self.collection, {}).get(token, missing)
if token_info is missing:
msg = f'Token not valid for collection `{self.collection}`'
raise InvalidTokenError(msg)
return AuthenticationInfo(
token_permission=token_info['permissions'],
user_id=token_info['user_id'],
incoming_label=token_info['incoming_label'],
)


@ -0,0 +1,235 @@
"""Use Forgejo instance to fetch token permissions, ids, and incoming_label
Note: for some reason, the request:
/api/v1/repos/{owner}/{repo}
does not require a token. If the owner and the repo are known, the request
will emit a complete repository-record including the complete owner-record.
"""
from __future__ import annotations
import time
from functools import wraps
from typing import Callable
import requests
from requests.exceptions import Timeout
from dump_things_service import (
HTTP_300_MULTIPLE_CHOICES,
HTTP_401_UNAUTHORIZED,
)
from dump_things_service.auth import (
AuthenticationError,
AuthenticationInfo,
AuthenticationSource,
InvalidTokenError,
)
from dump_things_service.config import TokenPermission
# Timeout for requests
_timeout = 10
_cached_data = {}
# Base class for classes that use method-level caching. The cache lives in
# the class instance and will be deleted when the instance is deleted.
class MethodCache:
def __init__(self):
self.__cached_data = {}
@staticmethod
def cache_temporary(
duration: int = 3600,
) -> Callable:
""" Cache results for a given time (default: 3600 seconds) """
def decorator(func: Callable) -> Callable:
@wraps(func)
def wrapper(*args, **kwargs):
self = args[0]
key = (func.__qualname__, *(args[1:]), *(kwargs.items()))
cached_data = self.__cached_data.get(key)
if cached_data is None or time.time() - cached_data[0] > duration:
self.__cached_data[key] = (time.time(), func(*args, **kwargs))
return self.__cached_data[key][1]
return wrapper
return decorator
class RemoteAuthenticationError(AuthenticationError):
"""Exception for remote authentication errors."""
def __init__(self, status: int, message: str):
self.status = status
self.message = message
super().__init__(f'Authentication failed with status {status}: {message}')
class ForgejoAuthenticationSource(AuthenticationSource, MethodCache):
def __init__(
self,
api_url: str,
organization: str,
team: str,
label_type: str,
repository: str | None = None,
):
"""
Create a Forgejo authentication source.
A token will be authorized if the associated user exists, is part of
team `team`, and if the repository is accessible by the team `team`.
The token permissions are taken from the unit mapping `repo.code` in the
team definition.
:param api_url: Forgejo API URL
:param organization: The name of the organization that defines the team
:param team: The name of the team
:param label_type: 'team' or 'user', determines how the incoming label
is created.
:param repository: Optional repository. If this is provided, access
will only be granted if the team has access to the repository.
"""
super().__init__()
self.api_url = api_url[:-1] if api_url[-1] == '/' else api_url
self.organization = organization
self.team = team
self.label_type = label_type
self.repository = repository
def _get_json_from_endpoint(
self,
endpoint: str,
token: str,
):
try:
r = requests.get(
url=f'{self.api_url}/{endpoint}',
headers={
'Accept': 'application/json',
'Authorization': f'token {token}',
},
timeout=_timeout,
)
except Timeout as e:
msg = f'timeout in request to {self.api_url}'
raise RemoteAuthenticationError(
status=HTTP_401_UNAUTHORIZED,
message=msg,
) from e
if r.status_code >= HTTP_300_MULTIPLE_CHOICES:
msg = f'invalid token: ({r.status_code}): {r.text}'
raise InvalidTokenError(msg)
return r.json()
def _get_user(
self,
token: str,
) -> dict:
return self._get_json_from_endpoint('user', token)
@MethodCache.cache_temporary()
def _get_organization(self, token: str) -> dict:
return self._get_json_from_endpoint(
f'orgs/{self.organization}',
token,
)
def _get_teams_for_user(self, token: str) -> dict:
r = self._get_json_from_endpoint('user/teams', token)
return {team['name']: team for team in r}
@MethodCache.cache_temporary()
def _get_teams_for_organization(
self,
token: str,
organization: str,
):
r = self._get_json_from_endpoint(
f'orgs/{organization}/teams',
token,
)
return {team['name']: team for team in r}
@MethodCache.cache_temporary()
def _get_teams_for_repo(
self,
token: str,
organization: str,
repository: str,
):
r = self._get_json_from_endpoint(
f'repos/{organization}/{repository}/teams',
token,
)
return {team['name']: team for team in r}
@staticmethod
def _get_permissions(code_permission: str) -> TokenPermission:
read = code_permission in ('read', 'write')
write = code_permission == 'write'
return TokenPermission(
curated_read=read,
incoming_read=read,
incoming_write=write,
)
def authenticate(
self,
token: str,
) -> AuthenticationInfo:
user_teams = self._get_teams_for_user(token)
if self.team not in user_teams:
msg = f'token user is not member of team `{self.team}`'
raise RemoteAuthenticationError(
status=HTTP_401_UNAUTHORIZED,
message=msg,
)
organization = self._get_organization(token)
user_info = self._get_user(token)
if self.repository is not None:
organization_teams = self._get_teams_for_repo(
token,
self.organization,
self.repository,
)
else:
organization_teams = self._get_teams_for_organization(
token,
self.organization,
)
# Check that the configured team exists
team = organization_teams.get(self.team)
if not team:
if self.repository is not None:
msg = f'team `{self.team}` has no access to repository `{self.repository}`'
else:
msg = f'organization `{self.organization}` has no team `{self.team}`'
raise RemoteAuthenticationError(
status=HTTP_401_UNAUTHORIZED,
message=msg,
)
# Get the repo.code permissions from the team definition
code_permissions = team['units_map'].get('repo.code')
if not code_permissions:
msg = f'no `repo.code`-unit defined for team `{self.team}` in organization {self.organization}'
raise RemoteAuthenticationError(
status=HTTP_401_UNAUTHORIZED,
message=msg,
)
return AuthenticationInfo(
token_permission=self._get_permissions(code_permissions),
user_id=user_info['email'],
incoming_label=
f'forgejo-team-{organization["name"]}-{team["name"]}'
if self.label_type == 'team'
else f'forgejo-user-{user_info["login"]}',
)


@ -3,6 +3,7 @@ from __future__ import annotations
import dataclasses
import enum
import hashlib
import logging
from functools import partial
from pathlib import Path
from typing import (
@ -21,25 +22,25 @@ from pydantic import (
from yaml.scanner import ScannerError
from dump_things_service import (
HTTP_400_BAD_REQUEST,
HTTP_401_UNAUTHORIZED,
HTTP_404_NOT_FOUND,
)
from dump_things_service.auth import AuthenticationError
from dump_things_service.backends.record_dir import RecordDirStore
from dump_things_service.backends.schema_type_layer import SchemaTypeLayer
from dump_things_service.backends.sqlite import (
SQLiteBackend,
)
from dump_things_service.backends.sqlite import SQLiteBackend
from dump_things_service.backends.sqlite import (
record_file_name as sqlite_record_file_name,
)
from dump_things_service.converter import get_conversion_objects
from dump_things_service.model import get_model_for_schema
from dump_things_service.store.model_store import ModelStore
from dump_things_service.token import TokenPermission
if TYPE_CHECKING:
import types
logger = logging.getLogger('uvicorn')
config_file_name = '.dumpthings.yaml'
token_config_file_name = '.token_config.yaml' # noqa: S105
@ -68,12 +69,6 @@ class CollectionDirConfig(BaseModel):
idfx: MappingMethod
class TokenPermission(BaseModel):
curated_read: bool = False
incoming_read: bool = False
incoming_write: bool = False
class TokenModes(enum.Enum):
READ_CURATED = 'READ_CURATED'
READ_COLLECTION = 'READ_COLLECTION'
@ -104,11 +99,25 @@ class BackendConfigSQLite(BaseModel):
schema: str
class ForgejoAuthConfig(BaseModel):
type: Literal['forgejo']
url: str
organization: str
team: str
label_type: Literal['team', 'user']
repository: str | None = None
class ConfigAuthConfig(BaseModel):
type: Literal['config'] = 'config'
class CollectionConfig(BaseModel):
default_token: str
curated: Path
incoming: Path | None = None
backend: BackendConfigRecordDir | BackendConfigSQLite | None = None
auth_sources: list[ForgejoAuthConfig | ConfigAuthConfig] = [ConfigAuthConfig()]
class GlobalConfig(BaseModel):
@ -126,11 +135,14 @@ class InstanceConfig:
curated_stores: dict = dataclasses.field(default_factory=dict)
incoming: dict = dataclasses.field(default_factory=dict)
zones: dict = dataclasses.field(default_factory=dict)
permissions: dict = dataclasses.field(default_factory=dict)
model_info: dict = dataclasses.field(default_factory=dict)
token_stores: dict = dataclasses.field(default_factory=dict)
schemas: dict = dataclasses.field(default_factory=dict)
conversion_objects: dict = dataclasses.field(default_factory=dict)
backend: dict = dataclasses.field(default_factory=dict)
auth_providers: dict = dataclasses.field(default_factory=dict)
tokens: dict = dataclasses.field(default_factory=dict)
mode_mapping = {
@ -234,6 +246,9 @@ class Config:
file_name: str = config_file_name,
) -> CollectionDirConfig:
config_path = path / file_name
if not config_path.exists():
msg = f'Config file does not exist: {config_path}'
raise ConfigError(msg)
try:
return CollectionDirConfig(
**yaml.load(config_path.read_text(), Loader=yaml.SafeLoader)
@ -267,12 +282,51 @@ def process_config_object(
order_by: list[str],
globals_dict: dict[str, Any],
):
from dump_things_service.auth.config import ConfigAuthenticationSource
from dump_things_service.auth.forgejo import ForgejoAuthenticationSource
instance_config = InstanceConfig(store_path=store_path)
instance_config.collections = config_object.collections
# Create a `ModelStore` (with currently fixed backend `RecordDirStore`) for
# the `curated`-dir in each collection.
for collection_name, collection_info in config_object.collections.items():
# Create the authentication providers
instance_config.auth_providers[collection_name] = []
auth_provider_list = []
# Check for multiple providers
for auth_provider in collection_info.auth_sources:
if auth_provider.type == 'config':
key = ('config',)
elif auth_provider.type == 'forgejo':
key = (
'forgejo',
auth_provider.url,
auth_provider.organization,
auth_provider.team,
auth_provider.label_type,
auth_provider.repository,
)
else:
msg = f'Unknown authentication provider type: {auth_provider.type}'
raise ConfigError(msg)
if key in auth_provider_list:
logger.warning(f'Ignoring duplicate authentication provider: {key}')
continue
auth_provider_list.append(key)
for auth_provider in auth_provider_list:
if auth_provider[0] == 'config':
instance_config.auth_providers[collection_name].append(
ConfigAuthenticationSource(
instance_config=instance_config,
collection=collection_name,
)
)
else:
instance_config.auth_providers[collection_name].append(
ForgejoAuthenticationSource(*auth_provider[1:])
)
# Set the default backend if not specified
backend = collection_info.backend or BackendConfigRecordDir(
type='record_dir+stl'
@ -297,6 +351,7 @@ def process_config_object(
instance_config.model_info[collection_name] = model, classes, model_var_name
globals_dict[model_var_name] = model
# Generate the curated stores
if backend_name == 'record_dir':
curated_store_backend = RecordDirStore(
root=store_path / collection_info.curated,
@ -333,24 +388,29 @@ def process_config_object(
if schema not in instance_config.conversion_objects:
instance_config.conversion_objects[schema] = get_conversion_objects(schema)
# Create a `ModelStore` for each token dir and fetch the permissions
for token_name, token_info in config_object.tokens.items():
entry = {'user_id': token_info.user_id, 'collections': {}}
instance_config.token_stores[token_name] = entry
for collection_name, token_collection_info in token_info.collections.items():
backend = instance_config.backend[collection_name]
entry['collections'][collection_name] = {}
# We do not create stores for tokens here, but leave it to the token
# authentication routine.
instance_config.token_stores[collection_name] = dict()
# Read info for tokens from the configuration
for token_name, token_info in config_object.tokens.items():
for collection_name, token_collection_info in token_info.collections.items():
if collection_name not in instance_config.tokens:
instance_config.tokens[collection_name] = dict()
permissions = get_permissions(token_collection_info.mode)
instance_config.tokens[collection_name][token_name] = {
'permissions': permissions,
'user_id': token_info.user_id,
'incoming_label': token_collection_info.incoming_label,
}
# There is only a token store if the token has incoming read- or
# incoming write-permissions. If a token store exists, we ensure
# that an incoming path is set and an incoming label exists.
if permissions.incoming_read or permissions.incoming_write:
# A token might be a pure curated read token, i.e., have the mode
# `READ_COLLECTION`. In this case, there might be no incoming store.
if (
collection_name in instance_config.incoming
and token_collection_info.mode
not in (
TokenModes.READ_CURATED,
TokenModes.NOTHING,
)
):
# Check that the incoming label is set for a token that has
# access rights to incoming records.
if not token_collection_info.incoming_label:
@ -358,105 +418,190 @@ def process_config_object(
raise ConfigError(msg)
if any(c in token_collection_info.incoming_label for c in ('\\', '/')):
msg = f'Incoming label for token `{token_name}` must not contain slashes or backslashes: `{token_collection_info.incoming_label}`'
msg = (
f'Incoming label for token `...` on collection '
f'`{collection_name}` must not contain slashes or '
f'backslashes: `{token_collection_info.incoming_label}`'
)
raise ConfigError(msg)
if collection_name not in instance_config.zones:
instance_config.zones[collection_name] = {}
instance_config.zones[collection_name][token_name] = (
token_collection_info.incoming_label
)
# Ensure that the store directory exists
store_dir = (
store_path
/ instance_config.incoming[collection_name]
/ token_collection_info.incoming_label
)
store_dir.mkdir(parents=True, exist_ok=True)
backend_name, extension = get_backend_and_extension(backend.type)
if backend_name == 'record_dir':
mapping_function = instance_config.curated_stores[
collection_name
].backend.pid_mapping_function
token_store_backend = RecordDirStore(
root=store_dir,
pid_mapping_function=mapping_function,
suffix=instance_config.curated_stores[
collection_name
].backend.suffix,
order_by=order_by,
if collection_name not in instance_config.incoming:
msg = (
'Incoming location not defined for collection '
f'`{collection_name}`, which has at least one token '
f'with write access'
)
token_store_backend.build_index_if_needed(
schema=instance_config.schemas[collection_name]
)
elif backend_name == 'sqlite':
token_store_backend = SQLiteBackend(
db_path=store_dir / sqlite_record_file_name,
)
else:
msg = f'Unsupported backend `{collection_info.backend.type}` for collection `{collection_name}`.'
raise ConfigError(msg)
if extension == 'stl':
token_store_backend = SchemaTypeLayer(
backend=token_store_backend,
schema=instance_config.schemas[collection_name],
)
# Check that default tokens are defined
for collection_name, collection_info in config_object.collections.items():
if collection_info.default_token not in instance_config.tokens[collection_name]:
msg = f'Unknown default token: `{collection_info.default_token}`'
raise ConfigError(msg)
token_store = ModelStore(
schema=instance_config.schemas[collection_name],
backend=token_store_backend,
)
entry['collections'][collection_name]['store'] = token_store
entry['collections'][collection_name]['permissions'] = get_permissions(
token_collection_info.mode
)
return instance_config
def create_token_store(
instance_config: InstanceConfig,
collection_name: str,
store_dir: Path,
) -> ModelStore:
schema_uri = instance_config.schemas[collection_name]
# We get the backend information from the curated store
backend_type = instance_config.backend[collection_name].type
backend_name, extension = get_backend_and_extension(backend_type)
backend = instance_config.curated_stores[collection_name].backend
if backend_name == 'record_dir':
# The configuration routines have read the backend configuration of the
# curated store from disk and stored it in `instance_config`. We fetch
# it from there.
if extension == 'stl':
backend = backend.backend
token_store = create_record_dir_token_store(
store_dir=store_dir,
order_by=backend.order_by,
schema_uri=instance_config.schemas[collection_name],
mapping_function=backend.pid_mapping_function,
suffix=backend.suffix,
)
elif backend_name == 'sqlite':
token_store = create_sqlite_token_store(
store_dir=store_dir,
order_by=backend.order_by,
)
else:
# This should not happen because we base our decision on already
# existing backends.
msg = f'Unsupported backend type: `{backend_type}`.'
raise ConfigError(msg)
if extension == 'stl':
token_store = SchemaTypeLayer(backend=token_store, schema=schema_uri)
return ModelStore(backend=token_store, schema=schema_uri)
def create_record_dir_token_store(
store_dir: Path,
order_by: list[str],
schema_uri: str,
mapping_function: Callable,
suffix: str,
) -> RecordDirStore:
store_backend = RecordDirStore(
root=store_dir,
pid_mapping_function=mapping_function,
suffix=suffix,
order_by=order_by,
)
store_backend.build_index_if_needed(schema=schema_uri)
return store_backend
def create_sqlite_token_store(
store_dir: Path,
order_by: list[str],
) -> SQLiteBackend:
return SQLiteBackend(
db_path=store_dir / sqlite_record_file_name,
order_by=order_by,
)
def check_collection(
instance_config: InstanceConfig,
collection: str,
):
if collection not in instance_config.collections:
raise HTTPException(
status_code=HTTP_404_NOT_FOUND,
detail=f'No such collection: "{collection}".',
)

def get_backend_and_extension(backend_type: str) -> tuple[str, str]:
    elements = backend_type.split('+')
    return (elements[0], elements[1]) if len(elements) > 1 else (elements[0], '')

def get_token_store(
    instance_config: InstanceConfig,
    collection_name: str,
    token: str,
) -> tuple[ModelStore, TokenPermission] | tuple[None, None]:
    check_collection(instance_config, collection_name)
    # Check whether a store for this collection and token already exists.
    store_info = instance_config.token_stores[collection_name].get(token)
    if store_info:
        return store_info
    # Try to authenticate the token with the authentication providers that
    # are associated with the collection.
    auth_info = None
    for auth_provider in instance_config.auth_providers[collection_name]:
        try:
            auth_info = auth_provider.authenticate(token)
            break
        except AuthenticationError:
            logger.debug(
                'Authentication provider %s could not authenticate token %s.',
                auth_provider,
                token,
            )
            continue
    if not auth_info:
        raise HTTPException(
            status_code=HTTP_401_UNAUTHORIZED,
            detail='Invalid token for collection ' + collection_name,
        )
    permissions = auth_info.token_permission
    # If the token has no incoming-read or incoming-write permissions, we do
    # not need to create a store.
    if not permissions.incoming_read and not permissions.incoming_write:
        instance_config.token_stores[collection_name][token] = None
        return None, permissions
    # Check whether the collection has an incoming definition.
    incoming = instance_config.incoming.get(collection_name)
    if not incoming:
        raise HTTPException(
            status_code=HTTP_401_UNAUTHORIZED,
            detail='No incoming area for collection ' + collection_name,
        )
    store_dir = instance_config.store_path / incoming / auth_info.incoming_label
    store_dir.mkdir(parents=True, exist_ok=True)
    token_store = create_token_store(
        instance_config=instance_config,
        collection_name=collection_name,
        store_dir=store_dir,
    )
    instance_config.token_stores[collection_name][token] = (
        token_store,
        permissions,
    )
    return token_store, permissions
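The provider loop above implements a first-match-wins chain: every authentication source associated with the collection gets a chance to authenticate the token, and the first success stops the iteration. A minimal standalone sketch of that chain, with hypothetical stub provider classes in place of the service's real config- and Forgejo-backed sources:

```python
class AuthenticationError(Exception):
    """Raised by a provider that cannot authenticate a token."""


class StaticProvider:
    # Hypothetical provider that knows a fixed token-to-user mapping.
    def __init__(self, known: dict[str, str]):
        self.known = known

    def authenticate(self, token: str) -> str:
        if token not in self.known:
            raise AuthenticationError(token)
        return self.known[token]


def authenticate_first(providers, token):
    # Return the first successful authentication result, or None if
    # every provider in the chain rejects the token.
    for provider in providers:
        try:
            return provider.authenticate(token)
        except AuthenticationError:
            continue
    return None


providers = [StaticProvider({'aaa': 'alice'}), StaticProvider({'bbb': 'bob'})]
result = authenticate_first(providers, 'bbb')
unknown = authenticate_first(providers, 'zzz')
```

Ordering matters in such a chain: a token known to several providers is resolved by the first one listed, which is why provider order in the configuration is significant.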

def get_default_token_name(
    instance_config: InstanceConfig,
    collection: str,
) -> str:
    check_collection(instance_config, collection)
    return instance_config.collections[collection].default_token
@@ -466,9 +611,7 @@ def join_default_token_permissions(
    collection: str,
) -> TokenPermission:
    default_token_name = instance_config.collections[collection].default_token
    default_token_permissions = instance_config.tokens[collection][
        default_token_name
    ]['permissions']
    result = TokenPermission()
    result.curated_read = (
        permissions.curated_read | default_token_permissions.curated_read
@@ -496,7 +639,7 @@ def get_zone(
    if token not in instance_config.zones[collection]:
        raise HTTPException(
            status_code=HTTP_404_NOT_FOUND,
            detail=f'Missing incoming_label for given token in collection: {collection}',
        )
    return instance_config.zones[collection][token]
@@ -506,11 +649,7 @@ def get_conversion_objects_for_collection(
    collection_name: str,
) -> dict:
    """Get the conversion objects for the given collection."""
    check_collection(instance_config, collection_name)
    return instance_config.conversion_objects[instance_config.schemas[collection_name]]
@@ -518,9 +657,5 @@ def get_model_info_for_collection(
    instance_config: InstanceConfig,
    collection_name: str,
) -> tuple[types.ModuleType, dict[str, Any], str]:
    check_collection(instance_config, collection_name)
    return instance_config.model_info[collection_name]

View file

@@ -58,7 +58,6 @@ from dump_things_service.config import (
    InstanceConfig,
    get_default_token_name,
    get_token_store,
    join_default_token_permissions,
    process_config,
)
@@ -159,7 +158,7 @@ try:
        globals_dict=globals(),
    )
except ConfigError:
    uvicorn_logger.exception(
        'ERROR: invalid configuration file at: `%s`',
        config_path,
    )
@@ -274,7 +273,7 @@ def store_record(
    stored_records = store.store_object(
        obj=record,
        submitter=g_instance_config.tokens[collection][token]['user_id'],
    )
    if input_format == Format.ttl:

View file

@@ -3,6 +3,8 @@ from __future__ import annotations

import dataclasses  # noqa: F401 -- used by generated code
import importlib
import logging
import random
import string
import subprocess
import tempfile
from itertools import count
@@ -11,6 +13,7 @@ from typing import (
    TYPE_CHECKING,
    Any,
)
from urllib.parse import urlparse

import annotated_types  # noqa: F401 -- used by generated code
import pydantic  # noqa: F401 -- used by generated code
@@ -41,8 +44,15 @@ _schema_view_cache = {}

def build_model(
    source_url: str,
) -> Any:
    parse_result = urlparse(source_url)
    schema_name = Path(parse_result.path).stem
    with tempfile.TemporaryDirectory() as temp_dir:
        random_suffix = ''.join(
            random.choices(string.ascii_letters + string.digits, k=10)
        )
        module_name = f'model_{next(serial_number)}_{schema_name}_{random_suffix}'
        definition_file = Path(temp_dir) / 'definition.yaml'
        definition_file.write_text(read_url(source_url))
        subprocess.run(
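The new module-name scheme combines a serial number, the schema file's stem, and a random suffix, making generated model modules both human-readable and collision-resistant. A sketch of just the naming logic, isolated from the model generation itself:

```python
import random
import string
from itertools import count
from pathlib import Path
from urllib.parse import urlparse

serial_number = count()


def make_module_name(source_url: str) -> str:
    # Derive the module name from the schema URL: a monotonically
    # increasing serial number, the schema file stem for readability,
    # and a 10-character random suffix to avoid name collisions.
    schema_name = Path(urlparse(source_url).path).stem
    random_suffix = ''.join(
        random.choices(string.ascii_letters + string.digits, k=10)
    )
    return f'model_{next(serial_number)}_{schema_name}_{random_suffix}'


name = make_module_name('https://concepts.datalad.org/s/flat-social/unreleased.yaml')
```

The random suffix guards against clashes when several service instances in the same process generate modules concurrently, which a serial number alone would not prevent.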

View file

@@ -25,7 +25,7 @@ schema_path = Path(__file__).parent / 'testschema.yaml'

# The global configuration file, all collections and
# staging areas share the same directories. All tokens
# of the same collection share an "incoming_label".
global_config_text = f"""
type: collections
@@ -35,28 +35,59 @@ collections:
    default_token: basic_access
    curated: {curated}/collection_1
    incoming: {incoming}/collection_1
    backend:
      type: record_dir+stl
      schema: {schema_path}
      idfx: digest_md5
    auth_sources:
      - type: config
  collection_2:
    default_token: basic_access
    curated: {curated}/collection_2
    incoming: {incoming}/collection_2
    backend:
      type: record_dir+stl
      schema: {schema_path}
      idfx: digest_md5
  collection_3:
    default_token: basic_access
    curated: {curated}/collection_3
    backend:
      type: record_dir+stl
      schema: {schema_path}
      idfx: digest_md5
  collection_4:
    default_token: basic_access
    curated: {curated}/collection_4
    backend:
      type: record_dir+stl
      schema: {schema_path}
      idfx: digest_md5
  collection_5:
    default_token: basic_access
    curated: {curated}/collection_5
    backend:
      type: record_dir+stl
      schema: {schema_path}
      idfx: digest_md5
  collection_6:
    default_token: basic_access
    curated: {curated}/collection_6
    backend:
      type: record_dir+stl
      schema: {schema_path}
      idfx: digest_md5
  collection_7:
    default_token: basic_access
    curated: {curated}/collection_7
    backend:
      type: record_dir+stl
      schema: {schema_path}
      idfx: digest_md5
  collection_8:
    default_token: basic_access
    curated: {curated}/collection_8
    incoming: incoming_8
    backend:
      type: sqlite
      schema: {schema_path}
@@ -64,6 +95,10 @@ collections:
    default_token: basic_access
    curated: {curated}/collection_dlflatsocial-1
    incoming: {incoming}/collection_dlflatsocial-1
    backend:
      type: record_dir+stl
      schema: https://concepts.datalad.org/s/flat-social/unreleased.yaml
      idfx: digest_md5
  collection_dlflatsocial-2:
    default_token: basic_access
    curated: {curated}/collection_dlflatsocial-2
@@ -183,7 +218,7 @@ tokens:
    collections:
      collection_8:
        mode: WRITE_COLLECTION
        incoming_label: test_user_8
"""

View file

@@ -0,0 +1,75 @@
from __future__ import annotations

import json

import pytest

from dump_things_service.auth.forgejo import ForgejoAuthenticationSource
from dump_things_service.token import TokenPermission

user_1 = {
    'id': 1,
    'login': 'user_1',
    'email': 'user_1@example.com',
    'username': 'user_1',
    '@type': 'user',
}

org_1 = {
    'id': 1,
    'name': 'org_1',
    '@type': 'org',
}

repo_1 = {
    'id': 3,
    'owner': user_1,
    'name': 'repo_1',
}

team_template = """{{
    "id": {id},
    "name": "team_{id}",
    "units_map": {{
        "repo.code": "read"
    }},
    "@type": "team"
}}
"""

team_1 = json.loads(team_template.format(id=1))
team_2 = json.loads(team_template.format(id=2))


def setup_http_server(http_server) -> None:
    http_server.expect_request('/api/v1/user').respond_with_json(user_1)
    http_server.expect_request('/api/v1/user/teams').respond_with_json([team_1])
    http_server.expect_request('/api/v1/orgs/org_1').respond_with_json(org_1)
    http_server.expect_request('/api/v1/orgs/org_1/teams').respond_with_json(
        [team_1, team_2]
    )
    http_server.expect_request('/api/v1/repos/org_1/repo_1/teams').respond_with_json(
        [team_1, team_2]
    )


@pytest.mark.parametrize('repository', ['repo_1', None])
@pytest.mark.parametrize('label_type', ['user', 'team'])
def test_forgejo_auth_team(httpserver, label_type, repository):
    setup_http_server(httpserver)
    forgejo_auth_source = ForgejoAuthenticationSource(
        api_url=httpserver.url_for('/api/v1'),
        organization='org_1',
        team='team_1',
        label_type=label_type,
        repository=repository,
    )
    r = forgejo_auth_source.authenticate(token='something')
    if label_type == 'team':
        assert r.incoming_label == 'forgejo-team-org_1-team_1'
    else:
        assert r.incoming_label == 'forgejo-user-user_1'
    assert r.token_permission == TokenPermission(
        curated_read=True,
        incoming_read=True,
        incoming_write=False,
    )
    assert r.user_id == 'user_1@example.com'
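The expected labels in this test encode the Forgejo identity into a zone name. A sketch of the labeling rule the assertions imply (a hypothetical helper following the `forgejo-<kind>-...` convention visible in the test, not the service's actual implementation):

```python
def make_incoming_label(label_type: str, user: str, organization: str, team: str) -> str:
    # Team-scoped labels pool records from all team members into one
    # zone; user-scoped labels give every authenticated user their own.
    if label_type == 'team':
        return f'forgejo-team-{organization}-{team}'
    return f'forgejo-user-{user}'


team_label = make_incoming_label('team', 'user_1', 'org_1', 'team_1')
user_label = make_incoming_label('user', 'user_1', 'org_1', 'team_1')
```

This matches the README's note that an `incoming_label` defines a zone: multiple tokens resolving to the same team share one zone, while user-typed labels keep submissions separate.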

View file

@@ -1,20 +1,15 @@
from pathlib import Path

import pytest
import yaml
from pydantic import ValidationError
from yaml.scanner import ScannerError

from dump_things_service.config import (
    ConfigError,
    GlobalConfig,
    process_config,
    process_config_object,
)


def test_scanner_error_detection(tmp_path):
@@ -36,7 +31,6 @@ def test_structure_error_detection(tmp_path):

def test_missing_incoming_detection(tmp_path):
    config_object = GlobalConfig(
        **yaml.load(
            """
@@ -59,17 +53,6 @@ tokens:
        )
    )
    global_dict = {}
    with pytest.raises(ConfigError):
        process_config_object(tmp_path, config_object, [], global_dict)

View file

@@ -6,7 +6,7 @@ from copy import copy
from pathlib import Path
from typing import TYPE_CHECKING

import pytest  # noqa: F401

from dump_things_service import HTTP_200_OK
from dump_things_service.patches import (
View file

@@ -0,0 +1,7 @@
from pydantic import BaseModel


class TokenPermission(BaseModel):
    curated_read: bool = False
    incoming_read: bool = False
    incoming_write: bool = False
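`TokenPermission` defaults every field to `False`, so a freshly constructed instance denies all access, and `join_default_token_permissions` in the config module OR-combines a token's permissions with the collection default's. A standalone sketch of that deny-by-default plus OR-join behavior, using a plain dataclass stand-in rather than the pydantic model:

```python
from dataclasses import dataclass


@dataclass
class TokenPermission:
    # Mirrors the pydantic model: everything defaults to "no access".
    curated_read: bool = False
    incoming_read: bool = False
    incoming_write: bool = False


def join(a: TokenPermission, b: TokenPermission) -> TokenPermission:
    # A capability is granted if either the token itself or the
    # collection's default token grants it.
    return TokenPermission(
        curated_read=a.curated_read | b.curated_read,
        incoming_read=a.incoming_read | b.incoming_read,
        incoming_write=a.incoming_write | b.incoming_write,
    )


joined = join(
    TokenPermission(curated_read=True),
    TokenPermission(incoming_read=True),
)
```

Joining toward the more permissive side means the default token sets a floor on access: a configured token can only add capabilities on top of what anonymous access already allows.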

View file

@@ -104,6 +104,7 @@ extra-dependencies = [
  "httpx",
  "pytest",
  "pytest-cov",
  "pytest-httpserver",
]

[tool.hatch.envs.tests.scripts]