dlc2action.preprocessing
Preprocessing utilities for importing external features into DLC2Action.
This module provides helpers for converting raw video files into per-frame feature
arrays that are compatible with the dlc2action.data.input_store.LoadedFeaturesInputStore
(i.e. data_type="features" projects).
Example
Extract DINOv3 features (the default encoder) from all .mp4 files in a folder and save them
alongside an existing DLC2Action project:
import dlc2action
dlc2action.get_visual_features(
video_folder="/path/to/videos",
video_suffix=".mp4",
output_folder="/path/to/features",
)
#
# Copyright 2020-present by A. Mathis Group and contributors. All rights reserved.
#
# This project and all its files are licensed under GNU AGPLv3 or later version.
# A copy is included in dlc2action/LICENSE.AGPL.
#
"""Preprocessing utilities for importing external features into DLC2Action.

This module provides helpers for converting raw video files into per-frame feature
arrays that are compatible with the :class:`~dlc2action.data.input_store.LoadedFeaturesInputStore`
(i.e. ``data_type="features"`` projects).

Example
-------
Extract DINOv3 features (the default encoder) from all ``.mp4`` files in a folder and save them
alongside an existing DLC2Action project::

    import dlc2action

    dlc2action.get_visual_features(
        video_folder="/path/to/videos",
        video_suffix=".mp4",
        output_folder="/path/to/features",
    )

"""

from dlc2action.preprocessing.visual_encoders import get_visual_features

__all__ = ["get_visual_features"]
def get_visual_features(
    video_folder: Union[str, Path],
    video_suffix: str,
    output_folder: Union[str, Path],
    encoder: str = "dinov3",
    model_name: Optional[str] = None,
    device: Optional[str] = None,
    clip_id: str = "ind0",
    overwrite: bool = False,
) -> None:
    """Extract per-frame visual features from all videos in a folder.

    Searches *video_folder* for every file whose name ends with *video_suffix*,
    runs the selected visual encoder on each video, pads the output, and saves
    the result as a ``.npy`` dictionary file compatible with
    :class:`~dlc2action.data.input_store.LoadedFeaturesInputStore`.

    The padding length is **model-specific** and is read automatically from the
    encoder's :attr:`VisualEncoder.default_pad_frames` class attribute (e.g. 8
    for :class:`DinoV3Encoder`). There is no user-facing ``pad_frames``
    parameter because the correct value is part of the encoder's definition.

    The saved dictionary has the form::

        {"ind0": np.ndarray(shape=(T + encoder.default_pad_frames, D), dtype=float32)}

    where ``T`` is the number of video frames and ``D`` is the encoder's feature
    dimension, so it can be used directly in a project with
    ``data_type="features"`` and ``feature_suffix=encoder.default_suffix``.

    Parameters
    ----------
    video_folder : str or Path
        Directory that contains the input video files.
    video_suffix : str
        File extension (or any suffix) used to identify video files, e.g.
        ``".mp4"`` or ``"_cropped.avi"``.
    output_folder : str or Path
        Directory where the ``.npy`` feature files will be saved. Created
        automatically if it does not yet exist.
    encoder : str, default ``"dinov3"``
        Name of the registered visual encoder to use. Call
        :func:`list_encoders` to see all available options.
    model_name : str, optional
        Override the encoder's default HuggingFace model checkpoint.
    device : str, optional
        PyTorch device string (e.g. ``"cuda:0"``). Auto-detected when ``None``.
    clip_id : str, default ``"ind0"``
        The clip / individual identifier used as the dictionary key inside the
        saved ``.npy`` file. Change this if you need a different agent name.
    overwrite : bool, default False
        If ``True``, re-extract features even if an output file already exists.

    Raises
    ------
    ValueError
        If *encoder* is not a registered encoder name.
    RuntimeError
        If no video files matching *video_suffix* are found in *video_folder*.

    Examples
    --------
    Extract DINOv3 features from all ``.mp4`` files::

        import dlc2action

        dlc2action.get_visual_features(
            video_folder="/path/to/videos",
            video_suffix=".mp4",
            output_folder="/path/to/features",
        )

    Use a custom device and skip already-processed files::

        dlc2action.get_visual_features(
            video_folder="/path/to/videos",
            video_suffix=".mp4",
            output_folder="/path/to/features",
            device="cuda:1",
            overwrite=False,
        )

    """
    if encoder not in _ENCODER_REGISTRY:
        raise ValueError(
            f"Unknown encoder '{encoder}'. "
            f"Available encoders: {list_encoders()}. "
            "Use dlc2action.preprocessing.list_encoders() for an up-to-date list."
        )

    video_folder = Path(video_folder)
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)

    # Collect matching video files
    video_paths = sorted(
        p for p in video_folder.iterdir() if p.name.endswith(video_suffix)
    )
    if not video_paths:
        raise RuntimeError(
            f"No files ending with '{video_suffix}' were found in '{video_folder}'."
        )

    # Instantiate the encoder once (model loading can be expensive)
    enc: VisualEncoder = _ENCODER_REGISTRY[encoder](
        model_name=model_name,
        device=device,
    )

    for video_path in tqdm(video_paths, desc="Processing videos"):
        # Derive output file name: {stem}{encoder.default_suffix}
        stem = video_path.name[: -len(video_suffix)]
        out_file = output_folder / (stem + enc.default_suffix)

        if out_file.exists() and not overwrite:
            print(f"Skipping '{video_path.name}' — output already exists.")
            continue

        features = enc.encode_video(video_path)  # (T, D)

        # Pad by copying the last frame enc.default_pad_frames times.
        # The padding length is model-specific and defined on the encoder class.
        if enc.default_pad_frames > 0:
            padding = np.repeat(features[-1:], enc.default_pad_frames, axis=0)
            features = np.concatenate([features, padding], axis=0)

        # Save as a dictionary: {clip_id: array} — compatible with LoadedFeaturesInputStore
        out_dict = {clip_id: features}
        np.save(str(out_file), out_dict)
Extract per-frame visual features from all videos in a folder.
Searches video_folder for every file whose name ends with video_suffix,
runs the selected visual encoder on each video, pads the output, and saves
the result as a .npy dictionary file compatible with
dlc2action.data.input_store.LoadedFeaturesInputStore.
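The file discovery and output naming described above can be sketched in isolation. This is a minimal illustration that mirrors the suffix matching in the function source. The helper names find_videos and output_name and the feature suffix "_feat.npy" are hypothetical stand-ins and are not part of dlc2action.

```python
import tempfile
from pathlib import Path


def find_videos(video_folder, video_suffix):
    """Return sorted paths in video_folder whose file name ends with video_suffix."""
    folder = Path(video_folder)
    return sorted(p for p in folder.iterdir() if p.name.endswith(video_suffix))


def output_name(video_path, video_suffix, feature_suffix):
    """Swap video_suffix for feature_suffix at the end of the file name.

    Assumes video_suffix is non-empty; an empty suffix would slice away
    the whole name.
    """
    stem = video_path.name[: -len(video_suffix)]
    return stem + feature_suffix


with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty placeholder files; only two match the suffix.
    for name in ("b_cropped.avi", "a_cropped.avi", "a.avi", "notes.txt"):
        (Path(tmp) / name).touch()

    matches = find_videos(tmp, "_cropped.avi")
    names = [p.name for p in matches]
    # "_feat.npy" is a hypothetical stand-in for encoder.default_suffix.
    outputs = [output_name(p, "_cropped.avi", "_feat.npy") for p in matches]
```

Because the match is on the end of the file name rather than on the extension alone, a suffix such as "_cropped.avi" selects only the cropped videos and leaves "a.avi" untouched.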
The padding length is model-specific and is read automatically from the
encoder's VisualEncoder.default_pad_frames class attribute (e.g. 8
for DinoV3Encoder). There is no user-facing pad_frames
parameter because the correct value is part of the encoder's definition.
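The padding step can be reproduced with plain NumPy. The sketch below follows the np.repeat / np.concatenate approach shown in the function source. The helper name pad_with_last_frame and the toy array are illustrative assumptions, and the value 8 is the example given above for DinoV3Encoder.

```python
import numpy as np


def pad_with_last_frame(features, pad_frames):
    """Append pad_frames copies of the last row of a (T, D) array."""
    if pad_frames <= 0:
        return features
    # features[-1:] keeps the leading axis, so the result has shape (pad_frames, D).
    padding = np.repeat(features[-1:], pad_frames, axis=0)
    return np.concatenate([features, padding], axis=0)


# Toy input with T=3 frames and D=2 feature dimensions.
features = np.arange(6, dtype=np.float32).reshape(3, 2)
padded = pad_with_last_frame(features, pad_frames=8)
```

Here padded has T + 8 = 11 rows, and every appended row equals the last real frame.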
The saved dictionary is keyed by clip_id; with the default clip_id="ind0" it has the form:
{"ind0": np.ndarray(shape=(T + encoder.default_pad_frames, D), dtype=float32)}
where T is the number of video frames and D is the encoder's feature
dimension, so it can be used directly in a project with
data_type="features" and feature_suffix=encoder.default_suffix.
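To inspect a saved file by hand, note that np.save stores a Python dictionary as a pickled zero-dimensional object array, so reading it back requires allow_pickle=True followed by .item(). The round trip below is a sketch with a hypothetical file name and a dummy array; it does not call dlc2action.

```python
import tempfile
from pathlib import Path

import numpy as np

# Dummy (T + pad, D) feature array standing in for real encoder output.
features = np.zeros((5, 4), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    # "video1_feat.npy" is a hypothetical output name.
    out_file = Path(tmp) / "video1_feat.npy"
    np.save(str(out_file), {"ind0": features})

    # The dictionary comes back wrapped in a 0-d object array; .item() unwraps it.
    loaded = np.load(str(out_file), allow_pickle=True).item()

clip_ids = list(loaded)
shape = loaded["ind0"].shape
```

Only load pickled .npy files from sources you trust, since unpickling can execute arbitrary code.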
Parameters
video_folder : str or Path
Directory that contains the input video files.
video_suffix : str
File extension (or any suffix) used to identify video files, e.g.
".mp4" or "_cropped.avi".
output_folder : str or Path
Directory where the .npy feature files will be saved. Created
automatically if it does not yet exist.
encoder : str, default "dinov3"
Name of the registered visual encoder to use. Call
list_encoders() to see all available options.
model_name : str, optional
Override the encoder's default HuggingFace model checkpoint.
device : str, optional
PyTorch device string (e.g. "cuda:0"). Auto-detected when None.
clip_id : str, default "ind0"
The clip / individual identifier used as the dictionary key inside the
saved .npy file. Change this if you need a different agent name.
overwrite : bool, default False
If True, re-extract features even if an output file already exists. If False, videos whose output file already exists are skipped.
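The effect of overwrite can be summarised as a small decision function. This is a sketch of the skip rule as it appears in the function source; the helper name needs_processing and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def needs_processing(out_file, overwrite):
    """Process a video unless its output exists and overwrite is False."""
    return overwrite or not Path(out_file).exists()


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "done_feat.npy"
    existing.touch()
    missing = Path(tmp) / "new_feat.npy"

    decisions = (
        needs_processing(existing, overwrite=False),  # output exists: skipped
        needs_processing(existing, overwrite=True),   # forced re-extraction
        needs_processing(missing, overwrite=False),   # no output yet: processed
    )
```

With the default overwrite=False, an interrupted run can therefore be restarted and will continue with the videos that have no output file yet.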
Raises
ValueError
If encoder is not a registered encoder name.
RuntimeError
If no video files matching video_suffix are found in video_folder.
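The ValueError comes from a lookup in a name-to-class registry. The following is a minimal sketch of that pattern under stated assumptions: the registry, the register decorator, ToyEncoder, and get_encoder_class are all hypothetical and only imitate the check shown in the function source. The real registry lives in dlc2action.preprocessing.visual_encoders and may be organised differently.

```python
# Hypothetical registry mapping encoder names to encoder classes.
_REGISTRY = {}


def register(name):
    """Class decorator that records an encoder class under the given name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def list_names():
    """Return the registered encoder names in sorted order."""
    return sorted(_REGISTRY)


@register("toy")
class ToyEncoder:
    # Class-level defaults, analogous to default_pad_frames / default_suffix.
    default_pad_frames = 0
    default_suffix = "_toy.npy"


def get_encoder_class(name):
    """Look up an encoder class, failing early on unknown names."""
    if name not in _REGISTRY:
        raise ValueError(
            f"Unknown encoder '{name}'. Available encoders: {list_names()}."
        )
    return _REGISTRY[name]


try:
    get_encoder_class("missing")
    message = None
except ValueError as err:
    message = str(err)
```

Validating the name before any video is opened means a typo in encoder fails immediately, before the model is loaded.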
Examples
Extract DINOv3 features from all .mp4 files:
import dlc2action
dlc2action.get_visual_features(
video_folder="/path/to/videos",
video_suffix=".mp4",
output_folder="/path/to/features",
)
Use a custom device and skip already-processed files:
dlc2action.get_visual_features(
video_folder="/path/to/videos",
video_suffix=".mp4",
output_folder="/path/to/features",
device="cuda:1",
overwrite=False,
)