The Idea
Joint Embedding Predictive Architectures (JEPAs), introduced by Yann LeCun's team, learn representations by predicting in latent space rather than pixel/token space. This makes them particularly interesting for structured sensor data where raw reconstruction is less meaningful.
I've been exploring whether JEPAs can learn useful representations from geotechnical monitoring data — time series from strain gauges, piezometers, and inclinometers installed in slopes and foundations.
Why Not Standard Approaches?
Traditional geotechnical analysis relies heavily on domain expertise:
- Engineers manually set threshold alerts
- Anomaly detection uses simple statistical methods
- Cross-sensor correlations are analyzed case-by-case
Self-supervised learning could capture the latent structure in multi-sensor time series without requiring labeled anomaly datasets (which are expensive and rare in geotech).
Early Architecture
The current prototype uses a 1D JEPA with temporal masking:
```python
import torch
import torch.nn as nn

class GeoJEPA(nn.Module):
    def __init__(self, sensor_dim, embed_dim=256, depth=6):
        super().__init__()
        # TemporalTransformer is defined elsewhere in the project.
        self.context_encoder = TemporalTransformer(sensor_dim, embed_dim, depth)
        self.target_encoder = TemporalTransformer(sensor_dim, embed_dim, depth)
        # Learned mask token: the predictor's queries for the masked positions.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.predictor = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(embed_dim, nhead=8),
            num_layers=3,
        )

    def forward(self, context_windows, target_windows):
        context_repr = self.context_encoder(context_windows)
        # No gradients through the target encoder; it is updated as an
        # EMA of the context encoder between steps.
        with torch.no_grad():
            target_repr = self.target_encoder(target_windows)
        # Predict target representations from context alone: mask-token
        # queries cross-attend to the context representations. (Feeding
        # target_repr into the predictor would leak the targets.)
        queries = self.mask_token.expand(target_repr.shape)
        predicted = self.predictor(queries, context_repr)
        return predicted, target_repr
```
Open Questions
This is still very much a seedling idea. Key questions I'm working through:
- Temporal masking strategy — How much context to show vs. mask? Geotechnical data has much slower dynamics than video or audio.
- Multi-sensor fusion — Should each sensor type get its own encoder, or should we learn joint embeddings from the start?
- Evaluation — Without labels, how do we evaluate representation quality? Downstream tasks like anomaly detection and settlement prediction are candidates.
- Data scale — Do we have enough data? One monitoring project might have 50-100 sensors over 2-3 years.
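One way to make the masking question concrete is block-wise temporal masking: contiguous target blocks, long enough that the model has to predict across horizons where slow processes (pore pressure response, creep) actually evolve, rather than interpolating single missing points. A minimal sketch — the block length and mask ratio are placeholder guesses, not tuned values:

```python
import numpy as np

def block_mask(n_steps, mask_ratio=0.5, block_len=24, rng=None):
    """Sample contiguous masked blocks over a window of n_steps timesteps.

    Returns a boolean array: True = target (masked), False = context.
    Contiguous blocks force prediction over multi-hour horizons instead
    of trivial point interpolation.
    """
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros(n_steps, dtype=bool)
    while mask.mean() < mask_ratio:
        start = int(rng.integers(0, n_steps - block_len + 1))
        mask[start:start + block_len] = True
    return mask

# One month of hourly readings, 40% masked in day-long blocks.
mask = block_mask(24 * 30, mask_ratio=0.4, block_len=24,
                  rng=np.random.default_rng(0))
```

For slow-moving signals the right block length is probably days rather than hours; that is exactly the open question above.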
Next Steps
- Train the prototype on a real dataset from a slope monitoring project
- Compare learned representations against hand-crafted features
- Test transfer learning: pretrain on one site, fine-tune on another
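The first step needs surprisingly little training machinery. A self-contained sketch of one training step, using a stand-in MLP encoder so it runs on its own (the real model would use the TemporalTransformer encoders; dimensions and the EMA decay are illustrative):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in encoder so the sketch is self-contained.
context_encoder = nn.Sequential(nn.Linear(4, 32), nn.GELU(), nn.Linear(32, 32))
target_encoder = copy.deepcopy(context_encoder)
for p in target_encoder.parameters():
    p.requires_grad = False  # target encoder is never trained by gradient

predictor = nn.Linear(32, 32)
opt = torch.optim.AdamW(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

def train_step(context, target, ema=0.996):
    ctx = context_encoder(context)
    with torch.no_grad():
        tgt = target_encoder(target)
    # Latent-space regression loss: no pixel/reading reconstruction.
    loss = F.smooth_l1_loss(predictor(ctx), tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # EMA update keeps the target encoder a slow copy of the context
    # encoder, which is what guards against representation collapse.
    with torch.no_grad():
        for pt, pc in zip(target_encoder.parameters(),
                          context_encoder.parameters()):
            pt.mul_(ema).add_(pc, alpha=1 - ema)
    return loss.item()

# In practice context/target would be disjoint masked windows.
x = torch.randn(8, 4)
loss_value = train_step(x, x)
```

The EMA decay (0.996 here) is a common choice in JEPA-style training but would need tuning for this data.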
The training infrastructure uses torchwisdom for experiment management. If results look promising, the ONNX export pipeline from the TabLogs project could be adapted for edge deployment on monitoring stations.