The Idea
Joint Embedding Predictive Architectures (JEPAs), introduced by Yann LeCun's team, learn representations by predicting in latent space rather than pixel/token space. This makes them particularly interesting for structured sensor data where raw reconstruction is less meaningful.
I've been exploring whether JEPAs can learn useful representations from geotechnical monitoring data — time series from strain gauges, piezometers, and inclinometers installed in slopes and foundations.
Why Not Standard Approaches?
Traditional geotechnical analysis relies heavily on domain expertise:
- Engineers manually set threshold alerts
- Anomaly detection uses simple statistical methods
- Cross-sensor correlations are analyzed case-by-case
Self-supervised learning could capture the latent structure in multi-sensor time series without requiring labeled anomaly datasets (which are expensive and rare in geotech).
Early Architecture
The current prototype uses a 1D JEPA with temporal masking:
```python
import torch
import torch.nn as nn

class GeoJEPA(nn.Module):
    def __init__(self, sensor_dim, embed_dim=256, depth=6):
        super().__init__()
        # TemporalTransformer is defined elsewhere in the project.
        self.context_encoder = TemporalTransformer(sensor_dim, embed_dim, depth)
        self.target_encoder = TemporalTransformer(sensor_dim, embed_dim, depth)
        # Learned mask token: the predictor's queries for the masked positions.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.predictor = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(embed_dim, nhead=8),
            num_layers=3,
        )

    def forward(self, context_windows, target_windows):
        context_repr = self.context_encoder(context_windows)
        # No gradients through the target encoder; it is updated as an
        # EMA of the context encoder between steps.
        with torch.no_grad():
            target_repr = self.target_encoder(target_windows)
        # Predict target representations from context alone: mask-token
        # queries cross-attend to the context representations. (Feeding
        # target_repr into the predictor would leak the targets.)
        queries = self.mask_token.expand(target_repr.shape)
        predicted = self.predictor(queries, context_repr)
        return predicted, target_repr
```
Open Questions
This is still very much a seedling idea. Key questions I'm working through:
- Temporal masking strategy — How much context to show vs. mask? Geotechnical data has much slower dynamics than video or audio.
- Multi-sensor fusion — Should each sensor type get its own encoder, or should we learn joint embeddings from the start?
- Evaluation — Without labels, how do we evaluate representation quality? Downstream tasks like anomaly detection and settlement prediction are candidates.
- Data scale — Do we have enough data? One monitoring project might have 50-100 sensors over 2-3 years.
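One way to make the masking question concrete is block-wise temporal masking: contiguous target blocks, long enough that the model has to predict across horizons where slow processes (pore pressure response, creep) actually evolve, rather than interpolating single missing points. A minimal sketch — the block length and mask ratio are placeholder guesses, not tuned values:

```python
import numpy as np

def block_mask(n_steps, mask_ratio=0.5, block_len=24, rng=None):
    """Sample contiguous masked blocks over a window of n_steps timesteps.

    Returns a boolean array: True = target (masked), False = context.
    Contiguous blocks force prediction over multi-hour horizons instead
    of trivial point interpolation.
    """
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros(n_steps, dtype=bool)
    while mask.mean() < mask_ratio:
        start = int(rng.integers(0, n_steps - block_len + 1))
        mask[start:start + block_len] = True
    return mask

# One month of hourly readings, 40% masked in day-long blocks.
mask = block_mask(24 * 30, mask_ratio=0.4, block_len=24,
                  rng=np.random.default_rng(0))
```

For slow-moving signals the right block length is probably days rather than hours; that is exactly the open question above.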
Next Steps
- Train the prototype on a real dataset from a slope monitoring project
- Compare learned representations against hand-crafted features
- Test transfer learning: pretrain on one site, fine-tune on another
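The first step needs surprisingly little training machinery. A self-contained sketch of one training step, using a stand-in MLP encoder so it runs on its own (the real model would use the TemporalTransformer encoders; dimensions and the EMA decay are illustrative):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in encoder so the sketch is self-contained.
context_encoder = nn.Sequential(nn.Linear(4, 32), nn.GELU(), nn.Linear(32, 32))
target_encoder = copy.deepcopy(context_encoder)
for p in target_encoder.parameters():
    p.requires_grad = False  # target encoder is never trained by gradient

predictor = nn.Linear(32, 32)
opt = torch.optim.AdamW(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

def train_step(context, target, ema=0.996):
    ctx = context_encoder(context)
    with torch.no_grad():
        tgt = target_encoder(target)
    # Latent-space regression loss: no pixel/reading reconstruction.
    loss = F.smooth_l1_loss(predictor(ctx), tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # EMA update keeps the target encoder a slow copy of the context
    # encoder, which is what guards against representation collapse.
    with torch.no_grad():
        for pt, pc in zip(target_encoder.parameters(),
                          context_encoder.parameters()):
            pt.mul_(ema).add_(pc, alpha=1 - ema)
    return loss.item()

# In practice context/target would be disjoint masked windows.
x = torch.randn(8, 4)
loss_value = train_step(x, x)
```

The EMA decay (0.996 here) is a common choice in JEPA-style training but would need tuning for this data.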
The training infrastructure uses torchwisdom for experiment management. If results look promising, the ONNX export pipeline from the TabLogs project could be adapted for edge deployment on monitoring stations.