Authors
Chitti Srinivasa Phani, Sedar Olmez and Koichi Yokota, Antifragility Research Group, United Kingdom
Abstract
User-authored data-processing scripts are widely used for ETL workflows because of their flexibility and ease of development, but they frequently encode implicit assumptions about local file systems, execution order, and runtime context. These assumptions make such scripts fragile when migrated to cloud environments, where differences in storage semantics, resource constraints, and execution models can lead to silent failures or incorrect behaviour. Existing migration approaches typically rely on manual refactoring or narrowly scoped tooling, which limits their scalability and reliability. This paper presents Platform Code Converter (PCC), a compiler-inspired middleware for transforming unstructured ETL scripts into portable, cloud-ready artefacts. PCC recovers the operational semantics of a script using a language-neutral Intent Grammar and a lightweight Intermediate Representation (IR), from which it derives backend-specific execution policies for chunking, batching, parallelism, and storage access. A tiered transformation engine applies deterministic, correctness-preserving rewrites, falls back to LLM-assisted refinement (itself constrained to preserve correctness) when deterministic rewriting is insufficient, and conservatively leaves code unchanged under semantic uncertainty. We evaluate PCC on forty heterogeneous ETL workloads spanning synthetic patterns and real production scripts; across these, PCC applies 110 deterministic transformations, migrates over 300 file paths to cloud-safe representations, and validates all generated artefacts at execution time. These results demonstrate that intent-guided compilation provides a practical and reliable foundation for the cloud migration of unstructured ETL code.
Keywords
ETL migration, cloud computing, code transformation, intermediate representation
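To make the pipeline summarised in the abstract more concrete, the following is a minimal, illustrative sketch of its three ideas: an IR node carrying a recovered intent, derivation of a chunking/parallelism/storage policy from that node, and a tiered rewrite that tries deterministic rules first, accepts an assisted (e.g. LLM-produced) candidate only if it validates, and otherwise preserves the original line. It is not PCC's actual Intent Grammar, IR, or engine. The names IntentNode, Policy, migrate_path, derive_policy, read_csv_path_rule and tiered_rewrite, the s3:// URI scheme, and the bucket name are all hypothetical assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, List, Optional, Tuple

# Hypothetical minimal IR: one node per recovered operational intent.
@dataclass(frozen=True)
class IntentNode:
    intent: str                 # e.g. "read", "transform", "write"
    path: Optional[str] = None  # local path found in the script, if any
    est_rows: int = 0           # estimated input size; 0 means unknown

@dataclass(frozen=True)
class Policy:
    chunk_rows: int
    parallelism: int
    storage_uri: Optional[str]

def migrate_path(path: str, bucket: str) -> str:
    """Map an absolute POSIX path to an object-store URI (illustrative s3:// scheme)."""
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        # Under uncertainty, refuse to guess rather than emit a wrong location.
        raise ValueError(f"ambiguous path left for review: {path}")
    return f"s3://{bucket}/" + "/".join(p.parts[1:])

def derive_policy(node: IntentNode, mem_rows: int, max_workers: int, bucket: str) -> Policy:
    """Derive chunk size, parallelism and storage target from one IR node."""
    if node.est_rows <= 0:
        chunk, workers = mem_rows, 1           # unknown size: sequential, memory-sized chunks
    else:
        chunk = min(node.est_rows, mem_rows)   # never exceed the backend's memory budget
        n_chunks = -(-node.est_rows // chunk)  # ceiling division
        workers = max(1, min(max_workers, n_chunks))
    uri = None
    if node.path is not None:
        try:
            uri = migrate_path(node.path, bucket)
        except ValueError:
            uri = None  # keep the original access path; flagged for manual review
    return Policy(chunk, workers, uri)

Rule = Callable[[str], Optional[str]]

def read_csv_path_rule(bucket: str) -> Rule:
    """Deterministic rule: rewrite read_csv("/abs/path") to use an object-store URI."""
    pat = re.compile(r"""read_csv\((["'])(/[^"']+)\1""")

    def rule(line: str) -> Optional[str]:
        m = pat.search(line)
        if m is None:
            return None
        try:
            uri = migrate_path(m.group(2), bucket)
        except ValueError:
            return None  # rule declines; a later tier decides
        q = m.group(1)
        return line[:m.start()] + f"read_csv({q}{uri}{q}" + line[m.end():]
    return rule

def tiered_rewrite(
    line: str,
    rules: List[Rule],
    refine: Optional[Callable[[str], Optional[str]]] = None,
    validate: Callable[[str], bool] = lambda s: False,
) -> Tuple[str, str]:
    # Tier 1: deterministic rules; the first rule that fires wins.
    for rule in rules:
        out = rule(line)
        if out is not None:
            return out, "deterministic"
    # Tier 2: optional assisted candidate, kept only if it passes validation.
    if refine is not None:
        cand = refine(line)
        if cand is not None and validate(cand):
            return cand, "assisted"
    # Tier 3: semantic uncertainty, so the original code is preserved unchanged.
    return line, "preserved"

rules = [read_csv_path_rule("etl-bucket")]
print(tiered_rewrite('df = pd.read_csv("/data/in.csv")', rules))
# ('df = pd.read_csv("s3://etl-bucket/data/in.csv")', 'deterministic')
print(tiered_rewrite("rows = load(cfg)", rules))
# ('rows = load(cfg)', 'preserved')
print(derive_policy(IntentNode("read", "/data/in.csv", 1_000_000), 250_000, 8, "etl-bucket"))
# Policy(chunk_rows=250000, parallelism=4, storage_uri='s3://etl-bucket/data/in.csv')
```

The default validator rejects every assisted candidate, so in this sketch the second tier only takes effect when a caller supplies an explicit check; this mirrors the conservative stance described in the abstract, where code is left untouched rather than rewritten on uncertain grounds.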