I'm working on a tool that converts a suite of scripts from one language (GMC) to Python. At a high level, the per-file pipeline looks like:
    def convert_one(in_path, out_path, data_path):
        parse_tree = parse_gmc(in_path)
        transformed = transform(parse_tree)
        output_python(transformed, out_path)
        success = python_compiles(out_path)  # no point running output that doesn't compile
        if success:
            known_artifacts = run_gmc(in_path, data_path)
            our_artifacts = run_python(out_path, data_path)
            for old, new in zip(known_artifacts, our_artifacts):
                success = success and fuzzy_equivalent(old, new)
        return success
where each function call can fail.
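The per-stage error handling I have in mind is roughly this shape (StageResult and run_stage are hypothetical names for illustration, not part of my current code):

    import traceback
    from dataclasses import dataclass

    @dataclass
    class StageResult:
        file: str
        stage: str       # which pipeline stage produced this row
        ok: bool
        error: str = ""  # traceback text when ok is False

    def run_stage(file, stage, fn, *args):
        # Run one stage; convert any exception into a report row instead of crashing.
        try:
            fn(*args)
            return StageResult(file, stage, ok=True)
        except Exception:
            return StageResult(file, stage, ok=False, error=traceback.format_exc())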
The issue is scale: the target suite is about 10K files, and even at one second per file (an unreasonably low estimate) a serial run takes nearly three hours, which isn't a useful turnaround time for the whole batch. So I'm looking at distributed, horizontal scaling. I want to say "for every input file, run this pipeline, with no synchronization between pipelines, and give me back either an error report from whichever stage failed or a build artifact."
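On a single machine the embarrassingly-parallel version is easy enough; a sketch with concurrent.futures, where convert_one is the pipeline above and the .gmc extension is just an assumption:

    from concurrent.futures import ProcessPoolExecutor, as_completed
    from pathlib import Path

    def convert_all(in_dir, out_dir, data_path):
        files = sorted(Path(in_dir).glob("*.gmc"))
        results = {}
        with ProcessPoolExecutor() as pool:
            # One independent pipeline per file; no synchronization between them.
            futures = {
                pool.submit(convert_one, f, Path(out_dir) / (f.stem + ".py"), data_path): f
                for f in files
            }
            for fut in as_completed(futures):
                results[futures[fut]] = fut.result()
        return results

What I don't have is the multi-machine version of that fan-out, plus the reporting.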
I'd really like to cache artifacts between pipeline stages and between runs: if I run stages 1-4 on files A-E, and A fails at stage 3 while all the others succeed, the next run should do nothing except restart A at stage 3.
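The kind of caching I mean, as a sketch (the cache layout and stamp-file scheme are just an illustration):

    import hashlib
    from pathlib import Path

    CACHE = Path(".pipeline-cache")

    def input_key(in_path):
        # Key cache entries on the content hash of the input, so edits invalidate them.
        return hashlib.sha256(Path(in_path).read_bytes()).hexdigest()

    def run_stage_cached(in_path, stage, fn, *args):
        # Skip `fn` entirely if a stamp for (input hash, stage) exists from a prior run.
        stamp = CACHE / input_key(in_path) / f"{stage}.done"
        if stamp.exists():
            return
        fn(*args)  # the stage writes its artifact to its usual output path
        stamp.parent.mkdir(parents=True, exist_ok=True)
        stamp.touch()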
Does Jenkins make sense for this workload? Can it be made to fit? Can my workload be made to fit Jenkins more easily?