dlc2action.preprocessing

Preprocessing utilities for importing external features into DLC2Action.

This module provides helpers for converting raw video files into per-frame feature arrays that are compatible with dlc2action.data.input_store.LoadedFeaturesInputStore (i.e. data_type="features" projects).

Example

Extract DINOv3 features (the default encoder) from all .mp4 files in a folder and save them alongside an existing DLC2Action project::

import dlc2action

dlc2action.get_visual_features(
    video_folder="/path/to/videos",
    video_suffix=".mp4",
    output_folder="/path/to/features",
)
#
# Copyright 2020-present by A. Mathis Group and contributors. All rights reserved.
#
# This project and all its files are licensed under GNU AGPLv3 or later version.
# A copy is included in dlc2action/LICENSE.AGPL.
#
"""Preprocessing utilities for importing external features into DLC2Action.

This module provides helpers for converting raw video files into per-frame feature
arrays that are compatible with the :class:`~dlc2action.data.input_store.LoadedFeaturesInputStore`
(i.e. ``data_type="features"`` projects).

Example
-------
Extract DINOv3 features (the default encoder) from all ``.mp4`` files in a
folder and save them alongside an existing DLC2Action project::

    import dlc2action

    dlc2action.get_visual_features(
        video_folder="/path/to/videos",
        video_suffix=".mp4",
        output_folder="/path/to/features",
    )

"""

from dlc2action.preprocessing.visual_encoders import get_visual_features

__all__ = ["get_visual_features"]
def get_visual_features(video_folder: Union[str, pathlib.Path], video_suffix: str, output_folder: Union[str, pathlib.Path], encoder: str = 'dinov3', model_name: Optional[str] = None, device: Optional[str] = None, clip_id: str = 'ind0', overwrite: bool = False) -> None:
def get_visual_features(
    video_folder: Union[str, Path],
    video_suffix: str,
    output_folder: Union[str, Path],
    encoder: str = "dinov3",
    model_name: Optional[str] = None,
    device: Optional[str] = None,
    clip_id: str = "ind0",
    overwrite: bool = False,
) -> None:
    """Extract per-frame visual features from all videos in a folder.

    Searches *video_folder* for every file whose name ends with *video_suffix*,
    runs the selected visual encoder on each video, pads the output, and saves
    the result as a ``.npy`` dictionary file compatible with
    :class:`~dlc2action.data.input_store.LoadedFeaturesInputStore`.

    The padding length is **model-specific** and is read automatically from the
    encoder's :attr:`VisualEncoder.default_pad_frames` class attribute (e.g. 8
    for :class:`DinoV3Encoder`).  There is no user-facing ``pad_frames``
    parameter because the correct value is part of the encoder's definition.

    The saved dictionary has the form::

        {"ind0": np.ndarray(shape=(T + encoder.default_pad_frames, D), dtype=float32)}

    where ``T`` is the number of video frames and ``D`` is the encoder's feature
    dimension, so it can be used directly in a project with
    ``data_type="features"`` and ``feature_suffix=encoder.default_suffix``.

    Parameters
    ----------
    video_folder : str or Path
        Directory that contains the input video files.
    video_suffix : str
        File extension (or any suffix) used to identify video files, e.g.
        ``".mp4"`` or ``"_cropped.avi"``.
    output_folder : str or Path
        Directory where the ``.npy`` feature files will be saved.  Created
        automatically if it does not yet exist.
    encoder : str, default ``"dinov3"``
        Name of the registered visual encoder to use.  Call
        :func:`list_encoders` to see all available options.
    model_name : str, optional
        Override the encoder's default HuggingFace model checkpoint.
    device : str, optional
        PyTorch device string (e.g. ``"cuda:0"``).  Auto-detected when ``None``.
    clip_id : str, default ``"ind0"``
        The clip / individual identifier used as the dictionary key inside the
        saved ``.npy`` file.  Change this if you need a different agent name.
    overwrite : bool, default False
        If ``True``, re-extract features even if an output file already exists.

    Raises
    ------
    ValueError
        If *encoder* is not a registered encoder name.
    RuntimeError
        If no video files matching *video_suffix* are found in *video_folder*.

    Examples
    --------
    Extract DINOv3 features from all ``.mp4`` files::

        import dlc2action

        dlc2action.get_visual_features(
            video_folder="/path/to/videos",
            video_suffix=".mp4",
            output_folder="/path/to/features",
        )

    Use a custom device and skip already-processed files::

        dlc2action.get_visual_features(
            video_folder="/path/to/videos",
            video_suffix=".mp4",
            output_folder="/path/to/features",
            device="cuda:1",
            overwrite=False,
        )

    """
    if encoder not in _ENCODER_REGISTRY:
        raise ValueError(
            f"Unknown encoder '{encoder}'. "
            f"Available encoders: {list_encoders()}. "
            "Use dlc2action.preprocessing.list_encoders() for an up-to-date list."
        )

    video_folder = Path(video_folder)
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)

    # Collect matching video files
    video_paths = sorted(
        p for p in video_folder.iterdir() if p.name.endswith(video_suffix)
    )
    if not video_paths:
        raise RuntimeError(
            f"No files ending with '{video_suffix}' were found in '{video_folder}'."
        )

    # Instantiate the encoder once (model loading can be expensive)
    enc: VisualEncoder = _ENCODER_REGISTRY[encoder](
        model_name=model_name,
        device=device,
    )

    for video_path in tqdm(video_paths, desc="Processing videos"):
        # Derive output file name: {stem}{encoder.default_suffix}.
        # Slice by explicit length so that an empty video_suffix keeps the full
        # name (name[:-0] would yield an empty stem).
        stem = video_path.name[: len(video_path.name) - len(video_suffix)]
        out_file = output_folder / (stem + enc.default_suffix)

        if out_file.exists() and not overwrite:
            print(f"Skipping '{video_path.name}': output already exists.")
            continue

        features = enc.encode_video(video_path)  # (T, D)

        # Pad by copying the last frame enc.default_pad_frames times.
        # The padding length is model-specific and defined on the encoder class.
        if enc.default_pad_frames > 0:
            padding = np.repeat(features[-1:], enc.default_pad_frames, axis=0)
            features = np.concatenate([features, padding], axis=0)

        # Save as a dictionary {clip_id: array}, compatible with LoadedFeaturesInputStore
        out_dict = {clip_id: features}
        np.save(str(out_file), out_dict)

Extract per-frame visual features from all videos in a folder.

Searches video_folder for every file whose name ends with video_suffix, runs the selected visual encoder on each video, pads the output, and saves the result as a .npy dictionary file compatible with dlc2action.data.input_store.LoadedFeaturesInputStore.
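The file matching and output naming described above can be sketched on plain strings, without touching the file system. This is only an illustration: the file names are made up, and the value of default_suffix below is a hypothetical placeholder, since the real value is defined on the encoder class.

```python
# Illustrative inputs; none of these names come from the library.
video_suffix = "_cropped.avi"
default_suffix = "_dinov3.npy"  # hypothetical stand-in for enc.default_suffix

names = ["mouse2.avi", "mouse1_cropped.avi", "notes.txt"]

# Keep only names ending with the suffix, in sorted order.
matches = sorted(n for n in names if n.endswith(video_suffix))

# Strip the suffix from the end of each name and append the encoder suffix.
outputs = [n[: len(n) - len(video_suffix)] + default_suffix for n in matches]

print(matches)
print(outputs)
```

Because matching uses the end of the file name rather than the extension alone, a suffix such as "_cropped.avi" selects only the cropped videos in a folder that also holds the originals.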

The padding length is model-specific and is read automatically from the encoder's VisualEncoder.default_pad_frames class attribute (e.g. 8 for DinoV3Encoder). There is no user-facing pad_frames parameter because the correct value is part of the encoder's definition.
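The padding step can be reproduced in isolation with NumPy. The sketch below uses a hypothetical helper name, pad_last_frame, and a toy array; it follows the repeat-last-frame logic shown in the source listing but is not part of the library's API.

```python
import numpy as np


def pad_last_frame(features: np.ndarray, pad_frames: int) -> np.ndarray:
    """Append `pad_frames` copies of the last row of a (T, D) array."""
    if pad_frames <= 0:
        return features
    # features[-1:] keeps the 2-D shape (1, D), so repeat yields (pad_frames, D).
    padding = np.repeat(features[-1:], pad_frames, axis=0)
    return np.concatenate([features, padding], axis=0)


features = np.arange(6, dtype=np.float32).reshape(3, 2)  # T=3 frames, D=2
padded = pad_last_frame(features, pad_frames=8)
print(padded.shape)
```

With T=3 and eight padding frames, the result has 11 rows, and every appended row equals the last original frame.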

The saved dictionary has the form::

{"ind0": np.ndarray(shape=(T + encoder.default_pad_frames, D), dtype=float32)}

where T is the number of video frames and D is the encoder's feature dimension, so it can be used directly in a project with data_type="features" and feature_suffix=encoder.default_suffix.
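To inspect a saved file by hand, note that NumPy stores a dictionary as a pickled 0-d object array, so loading it requires allow_pickle=True followed by .item(). The sketch below writes and reads a stand-in file in a temporary directory; the file name and array shape are illustrative assumptions, not outputs of the library.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "video1_features.npy"  # hypothetical file name

    # Same layout as described above: {clip_id: (T + pad, D) float32 array}.
    saved = {"ind0": np.zeros((11, 4), dtype=np.float32)}
    np.save(str(out_file), saved)

    # allow_pickle=True is required for object arrays; .item() unwraps the dict.
    loaded = np.load(str(out_file), allow_pickle=True).item()

print(sorted(loaded), loaded["ind0"].shape)
```

Only load pickled .npy files from sources you trust, since unpickling can execute arbitrary code.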

Parameters

video_folder : str or Path
    Directory that contains the input video files.
video_suffix : str
    File extension (or any suffix) used to identify video files, e.g. ".mp4" or "_cropped.avi".
output_folder : str or Path
    Directory where the .npy feature files will be saved. Created automatically if it does not yet exist.
encoder : str, default "dinov3"
    Name of the registered visual encoder to use. Call list_encoders() to see all available options.
model_name : str, optional
    Override the encoder's default HuggingFace model checkpoint.
device : str, optional
    PyTorch device string (e.g. "cuda:0"). Auto-detected when None.
clip_id : str, default "ind0"
    The clip / individual identifier used as the dictionary key inside the saved .npy file. Change this if you need a different agent name.
overwrite : bool, default False
    If True, re-extract features even if an output file already exists.

Raises

ValueError
    If encoder is not a registered encoder name.
RuntimeError
    If no video files matching video_suffix are found in video_folder.

Examples

Extract DINOv3 features from all .mp4 files::

import dlc2action

dlc2action.get_visual_features(
    video_folder="/path/to/videos",
    video_suffix=".mp4",
    output_folder="/path/to/features",
)

Use a custom device and skip already-processed files::

dlc2action.get_visual_features(
    video_folder="/path/to/videos",
    video_suffix=".mp4",
    output_folder="/path/to/features",
    device="cuda:1",
    overwrite=False,
)