I’ve always been intrigued by online code execution platforms like LeetCode and HackerRank. I decided to build a service to understand what goes into the code execution functionality of these platforms.
I’ll mainly discuss the low-level design and implementation in Java and Spring Boot.
Object Model
- The Code class has a rawCode string field that holds the code submitted by the user.
- The CodeResult class has three fields: stdout for the standard output stream, stderr for the standard error stream, and exceptions for Docker and API exceptions. It also includes a builder that makes it convenient to construct the object from the streaming responses of the Docker daemon.
  Note: In languages like Python, both stdout and stderr can contain data when an error occurs after something has been printed, because the interpreter runs statements in order until one fails.
- The DockerContainer class builds on top of the Container class provided by the docker-java library. It adds a removalPriority enum field to handle long-running containers, which I’ll discuss later.
- The ImageInfo class stores the details of the Docker images used for each language, and the LanguageInfo class is a helper for a GET endpoint that returns the supported programming languages.
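As a rough illustration of the builder described above, here is a minimal sketch of what CodeResult could look like. The three field names and appendExceptions come from the article; appendStdout and appendStderr are assumed names, and the real class may differ.

```java
// Hypothetical sketch of CodeResult. Field names follow the article;
// appendStdout and appendStderr are assumed method names.
final class CodeResult {
    private final String stdout;
    private final String stderr;
    private final String exceptions;

    private CodeResult(Builder builder) {
        this.stdout = builder.stdout.toString();
        this.stderr = builder.stderr.toString();
        this.exceptions = builder.exceptions.toString();
    }

    String getStdout() { return stdout; }
    String getStderr() { return stderr; }
    String getExceptions() { return exceptions; }

    // The builder accumulates chunks as they arrive from a streaming response.
    static final class Builder {
        private final StringBuilder stdout = new StringBuilder();
        private final StringBuilder stderr = new StringBuilder();
        private final StringBuilder exceptions = new StringBuilder();

        Builder appendStdout(String chunk) {
            if (chunk != null) stdout.append(chunk);
            return this;
        }

        Builder appendStderr(String chunk) {
            if (chunk != null) stderr.append(chunk);
            return this;
        }

        Builder appendExceptions(String message) {
            if (message != null) exceptions.append(message);
            return this;
        }

        CodeResult build() {
            return new CodeResult(this);
        }
    }
}
```

Because each append call returns the builder, a streaming callback can keep appending frames as they arrive and call build() once the stream completes.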
Endpoints
There are three POST endpoints, one per supported language, and each takes the code as a string in the request body.
The service supports Java, Python and JavaScript and can easily be extended to other languages. A GET endpoint, /info, returns the supported languages and their versions.
@RestController
class ExecutionController {
    private final CodeExecutionServiceFactory codeExecutionFactory;

    // Constructor injection, added so the final field is initialized and the snippet compiles
    ExecutionController(CodeExecutionServiceFactory codeExecutionFactory) {
        this.codeExecutionFactory = codeExecutionFactory;
    }

    @PostMapping(value = "/python")
    public CodeResult executePythonCode(@RequestBody Code code) throws CodeSizeLimitException,
            TimeLimitException, ContainerNotCreatedException {
        return codeExecutionFactory.executeCode(Constants.EXECUTE_PYTHON, code.getRawCode());
    }

    @GetMapping(value = "/info")
    public List<LanguageInfo> getSupportedLanguages() {
        return Arrays.stream(ImageInfo.values()).map(ImageInfo::getLanguageInfo)
                .sorted(Comparator.comparing(LanguageInfo::getLanguageName)).toList();
    }
}
The CodeExecutionServiceFactory class follows the Factory Design Pattern and returns
the CodeExecutionService bean associated with the constant that’s passed as the
first argument of the executeCode method (Python above).
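The article does not show the factory itself, so the following is only a sketch of one way such a lookup could work, using a map from language constant to service. The interface here is a simplified stand-in for the abstract CodeExecutionService and returns a String instead of a CodeResult to stay self-contained.

```java
import java.util.Map;

// Simplified, hypothetical stand-in for the article's abstract CodeExecutionService.
interface CodeExecutionService {
    String executeCode(String rawCode);
}

// Minimal sketch of a factory that maps a language constant to its service.
final class CodeExecutionServiceFactory {
    private final Map<String, CodeExecutionService> services;

    CodeExecutionServiceFactory(Map<String, CodeExecutionService> services) {
        // Defensive, immutable copy so the registry cannot change after construction
        this.services = Map.copyOf(services);
    }

    String executeCode(String language, String rawCode) {
        CodeExecutionService service = services.get(language);
        if (service == null) {
            throw new IllegalArgumentException("Unsupported language: " + language);
        }
        return service.executeCode(rawCode);
    }
}
```

In a Spring application the map would not need to be built by hand: Spring can inject all beans of a given type as a Map<String, T> keyed by bean name, which fits this shape if the bean names match the language constants.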
Service Classes
The CodeExecutionService is an abstract class. Its abstract methods are overridden in language-specific
child classes and supply whatever differs between languages, such as the container type and the
command used to execute the code.
public CodeResult executeCode(String rawCode) throws CodeSizeLimitException,
        TimeLimitException, ContainerNotCreatedException {
    /* Get code size by converting the string to bytes
       and throw an exception if it exceeds 1MB */
    var codeSize = codeSizeMb(rawCode);
    if (codeSize > Constants.CODE_SIZE_LIMIT) {
        throw new CodeSizeLimitException(Constants.CODE_SIZE_LIMIT_EXCEPTION);
    }
    // Get the container type from the derived class
    var container = containerManagerService.createContainer(getContainerType());
    containerManagerService.startContainer(container);
    // Escape any tokens before sending the code to execute to the container
    return executeCodeInContainer(container, getEscapeFunc().apply(rawCode));
}
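The codeSizeMb helper is not shown in the article. A minimal sketch, assuming the size is measured on the UTF-8 encoding of the string and that the limit constant is 1.0 (both are assumptions), might look like this:

```java
import java.nio.charset.StandardCharsets;

final class CodeSize {
    // Assumed value: the article states a 1 MB limit but does not show the constant.
    static final double CODE_SIZE_LIMIT_MB = 1.0;

    // Size of the submitted code in megabytes, measured on its UTF-8 bytes (assumed encoding).
    static double codeSizeMb(String rawCode) {
        int bytes = rawCode.getBytes(StandardCharsets.UTF_8).length;
        return bytes / (1024.0 * 1024.0);
    }

    static boolean exceedsLimit(String rawCode) {
        return codeSizeMb(rawCode) > CODE_SIZE_LIMIT_MB;
    }
}
```

Measuring bytes rather than String.length() matters for non-ASCII source code, where a single character can occupy several bytes.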
Note: The inheritance here is only one level deep because the service is simple. In more complex scenarios, composition with language-specific strategy interfaces would be a better choice.
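To make the composition alternative concrete, here is a small sketch. All names (LanguageStrategy, PythonStrategy, CodeExecutor) and the python3 -c command are hypothetical and chosen for illustration only; the article does not show the actual execution commands.

```java
import java.util.List;

// Hypothetical strategy: everything language-specific lives behind one interface.
interface LanguageStrategy {
    String containerType();
    String escape(String rawCode);
    List<String> command(String escapedCode);
}

final class PythonStrategy implements LanguageStrategy {
    @Override
    public String containerType() {
        return "python";
    }

    @Override
    public String escape(String rawCode) {
        return rawCode; // placeholder: real escaping rules are not shown in the article
    }

    @Override
    public List<String> command(String escapedCode) {
        return List.of("python3", "-c", escapedCode); // assumed command
    }
}

// The executor holds a strategy instead of being subclassed once per language.
final class CodeExecutor {
    private final LanguageStrategy strategy;

    CodeExecutor(LanguageStrategy strategy) {
        this.strategy = strategy;
    }

    List<String> buildCommand(String rawCode) {
        return strategy.command(strategy.escape(rawCode));
    }
}
```

With this shape, adding a language means adding one strategy class, and the executor itself can be tested with a fake strategy.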
The ContainerManager class is responsible for the lifecycle of the containers. It contains methods for
creating, starting and removing containers and for increasing their removal priority. It also contains an
asynchronous method for removing ghost containers, which I’ll discuss later.
public DockerContainer createContainer(String language) throws ContainerNotCreatedException {
    // Create a language-specific container
    var container = containerCreatorServiceFactory.createContainer(language);
    /* Add it to the container priority queue (synchronize as
       multiple request threads access the queue) */
    synchronized (containerQueue) {
        containerQueue.offer(container);
    }
    return container;
}
The initial design for removing containers involved querying the Docker daemon for the running containers
and removing those that had been running longer than a set duration (30 seconds in the code below). It was
an async method that ran every 5 seconds using Spring’s @Scheduled annotation.
@Scheduled(fixedRate = 5, timeUnit = TimeUnit.SECONDS)
public void terminateContainers() {
    var currentTime = System.currentTimeMillis() / 1000;
    var containers = client.listContainersCmd().exec();
    // Bottleneck (Querying the daemon and sorting by created time, every 5 seconds)
    containers.sort(Comparator.comparingLong(Container::getCreated));
    for (var container : containers) {
        var timeDiffInSeconds = currentTime - container.getCreated();
        if (timeDiffInSeconds >= 30) {
            terminateContainer(container.getId());
        } else {
            break;
        }
    }
}
However, every query to the Docker daemon is an HTTP request over loopback, so polling it every few seconds
adds avoidable overhead. Since all the containers are created through the application, a better option is to
keep track of them in a pool within the application. This way you can instruct the daemon to terminate a
container by its containerId without querying for the running containers every time.
To do this, we use a priority queue called containerQueue. It orders containers by removal priority first and
by creation time second, so high-priority containers are removed before low-priority ones, and older containers before newer ones.
@Bean
public Queue<DockerContainer> createContainerQueue() {
    // enum RemovalPriority { HIGH, LOW }
    return new PriorityQueue<>(Comparator.comparing(DockerContainer::getRemovalPriority)
            .thenComparing(c -> c.getContainer().getCreated()));
}
The removal priority is an enum with two values, HIGH and LOW. The priority is raised from LOW
to HIGH when the code does not finish within the timeout set for its language, for example because
of an infinite loop.
var result = false;
// Run the code and wait up to timeout seconds (language dependent) for its execution
try {
    result = client.execStartCmd(execCreateCmdResponse.getId()).exec(resultCallbackTemplate)
            .awaitCompletion(timeout, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    // Restore the interrupt flag so callers can still observe the interruption
    Thread.currentThread().interrupt();
    codeResultBuilder.appendExceptions(e.getMessage());
}
// Increase removal priority and throw an exception
if (!result) {
    containerManagerService.increaseRemovalPriority(container);
    throw new TimeLimitException(String.format(Constants.TIME_LIMIT_EXCEPTION, timeout));
}
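The increaseRemovalPriority method is not shown in the article. One detail worth keeping in mind is that java.util.PriorityQueue does not reorder an element whose sort key changes while it sits in the queue, so the element has to be removed and re-inserted. The sketch below illustrates that under simplified, hypothetical types (TrackedContainer stands in for DockerContainer) and is not necessarily how the service implements it.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

enum RemovalPriority { HIGH, LOW } // HIGH is declared first, so it sorts first

// Simplified, hypothetical stand-in for the article's DockerContainer.
final class TrackedContainer {
    final String id;
    final long created; // creation time in epoch seconds
    RemovalPriority removalPriority = RemovalPriority.LOW;

    TrackedContainer(String id, long created) {
        this.id = id;
        this.created = created;
    }
}

final class ContainerQueueSketch {
    // Same ordering as the article: removal priority first, then creation time
    final Queue<TrackedContainer> containerQueue = new PriorityQueue<>(
            Comparator.comparing((TrackedContainer c) -> c.removalPriority)
                    .thenComparingLong(c -> c.created));

    void add(TrackedContainer container) {
        synchronized (containerQueue) {
            containerQueue.offer(container);
        }
    }

    // Remove, mutate, re-insert: changing the field in place would leave the heap order stale.
    void increaseRemovalPriority(TrackedContainer container) {
        synchronized (containerQueue) {
            boolean wasQueued = containerQueue.remove(container);
            container.removalPriority = RemovalPriority.HIGH;
            if (wasQueued) {
                containerQueue.offer(container);
            }
        }
    }
}
```

The remove call is linear in the queue size, which should be acceptable for a queue bounded by the number of live containers.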
One way of writing the removal method would be to take a lock on the containerQueue, identify all the
containers that have been running for at least 30 seconds, and schedule them for removal. However, this
approach holds the lock on the containerQueue the whole time, so other requests cannot add containers
until it finishes.
The stopAndRemoveContainer method is asynchronous and returns a CompletableFuture<Boolean>, true if the removal was
successful and false otherwise. Performing the termination synchronously would’ve led to a delayed API
response, as docker containers take a while to shut down.
/* Bottleneck (The container queue is locked throughout, it'll
   block the addition of new containers to the queue) */
synchronized (containerQueue) {
    while (!containerQueue.isEmpty()) {
        var container = containerQueue.peek();
        var timeDiffInSeconds = currentTime - container.getContainer().getCreated();
        if (timeDiffInSeconds >= 30) {
            containerQueue.poll();
            containerRemovalService.stopAndRemoveContainer(container.getContainer()).thenAccept(isRemoved -> {
                if (isRemoved) {
                    // Increment the available containers of that type
                    containerCreatorServiceFactory.incrementAvailableContainers(container);
                } else {
                    // If removal fails, add it back to the queue to try again
                    containerQueue.offer(container);
                }
            });
        } else {
            break;
        }
    }
}
A better approach is to take the lock on the containerQueue, extract a small batch of containers that
have exceeded the time limit (up to three in the code below), release the lock, and only then try to
remove them. This frees up the containerQueue for other operations, such as requests adding new
containers.
var containersToRemove = new ArrayList<DockerContainer>();
// Synchronize (take a lock) on containerQueue
synchronized (containerQueue) {
    int numContainersToRemove = Math.min(containerQueue.size(), 3);
    for (int i = 0; i < numContainersToRemove; i++) {
        var container = containerQueue.peek();
        var timeDiffInSeconds = currentTime - container.getContainer().getCreated();
        // Add to removal list if time difference exceeds 30 seconds
        if (timeDiffInSeconds >= 30) {
            containersToRemove.add(containerQueue.poll());
        } else {
            break;
        }
    }
}
// Lock released (containerQueue is free for access by other requests)
for (var container : containersToRemove) {
    // Try removing containers
    containerRemovalService.stopAndRemoveContainer(container.getContainer()).thenAccept(isRemoved -> {
        if (isRemoved) {
            containerCreatorServiceFactory.incrementAvailableContainers(container);
        } else {
            // If removal fails, synchronize, add to containerQueue again
            synchronized (containerQueue) {
                containerQueue.offer(container);
            }
        }
    });
}
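The stopAndRemoveContainer method used above is described as returning a CompletableFuture<Boolean>, but its body is not shown. A minimal sketch of that contract, assuming the Docker calls are hidden behind a hypothetical ContainerStopper interface and run on a caller-supplied executor, could be:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical abstraction over the Docker client calls that stop and remove a container.
interface ContainerStopper {
    void stopAndRemove(String containerId) throws Exception;
}

final class ContainerRemovalSketch {
    private final ContainerStopper stopper;
    private final Executor executor;

    ContainerRemovalSketch(ContainerStopper stopper, Executor executor) {
        this.stopper = stopper;
        this.executor = executor;
    }

    // Completes with true on success and false if the stop/remove call throws.
    CompletableFuture<Boolean> stopAndRemoveContainer(String containerId) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                stopper.stopAndRemove(containerId);
                return true;
            } catch (Exception e) {
                return false;
            }
        }, executor);
    }
}
```

Mapping failures to false rather than completing exceptionally keeps the thenAccept callback simple, at the cost of losing the failure cause; logging inside the catch block would be a natural addition.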
The containerQueue and the Docker daemon can get out of sync if the application crashes while some
container removals are still pending. When the Spring Boot application restarts, the containerQueue will
be empty, but those containers will keep running within the daemon.
To prevent this, we include an additional method called stopAndRemoveGhostContainers that
runs every 10 minutes. I use the term ghost containers because they keep running in the background and hog
resources without doing any useful work. This method queries the Docker daemon for running containers and
terminates those that have been running for longer than 10 minutes.
@Scheduled(fixedRate = 10, timeUnit = TimeUnit.MINUTES)
public void stopAndRemoveGhostContainers() {
    var currentTime = (double) System.currentTimeMillis() / 1000.0;
    var containers = client.listContainersCmd().exec();
    containers.sort(Comparator.comparing(Container::getCreated));
    for (var container : containers) {
        var timeDiffInMinutes = (currentTime - container.getCreated()) / 60.0;
        if (timeDiffInMinutes >= 10) {
            containerRemovalService.stopAndRemoveContainer(container);
        } else {
            break;
        }
    }
}
Exceptions
Exceptions are handled through a @RestControllerAdvice, which centralizes exception handling. One
shortcoming of this approach is that every method in the code flow needs a throws clause.
This is fine while there are only a few exception types but can quickly become unmanageable. An alternative
is to pass a result object through the service classes, or store it in the Spring context, and populate it
whenever an error occurs in the code flow.
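As a rough sketch of the result-object alternative, the service method below records failures on an outcome object instead of declaring checked exceptions. ExecutionOutcome, SizeCheckingService and the error message are hypothetical names used only for illustration, and the actual code execution is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical result object that collects errors instead of propagating checked exceptions.
final class ExecutionOutcome {
    private final List<String> errors = new ArrayList<>();
    private String stdout = "";

    void addError(String message) { errors.add(message); }
    void setStdout(String stdout) { this.stdout = stdout; }
    boolean hasErrors() { return !errors.isEmpty(); }
    List<String> getErrors() { return List.copyOf(errors); }
    String getStdout() { return stdout; }
}

final class SizeCheckingService {
    private static final int CODE_SIZE_LIMIT_BYTES = 1024 * 1024; // 1 MB, as in the article

    // No throws clause: failures are recorded on the outcome and the caller inspects it.
    ExecutionOutcome executeCode(String rawCode) {
        ExecutionOutcome outcome = new ExecutionOutcome();
        if (rawCode.getBytes(StandardCharsets.UTF_8).length > CODE_SIZE_LIMIT_BYTES) {
            outcome.addError("Code size exceeds 1 MB");
            return outcome;
        }
        outcome.setStdout("(execution omitted in this sketch)");
        return outcome;
    }
}
```

The controller would then choose the HTTP status by inspecting the outcome, which trades compiler-enforced handling for shorter method signatures.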
@RestControllerAdvice
public class ApplicationExceptionHandler {
    @ExceptionHandler({CodeSizeLimitException.class, TimeLimitException.class})
    public ResponseEntity<CodeResult> handleLimitExceptions(Exception exception) {
        CodeResult codeResult = new CodeResult.Builder().appendExceptions(exception.getMessage()).build();
        return new ResponseEntity<>(codeResult, HttpStatus.BAD_REQUEST);
    }

    @ExceptionHandler({ContainerNotCreatedException.class})
    public ResponseEntity<CodeResult> handleContainerNotCreatedException(Exception exception) {
        CodeResult codeResult = new CodeResult.Builder().appendExceptions(exception.getMessage()).build();
        return new ResponseEntity<>(codeResult, HttpStatus.SERVICE_UNAVAILABLE);
    }
}
Conclusion
While a full-fledged platform like LeetCode involves tracking users, code execution metrics, stringent security mechanisms, leaderboards and several other functionalities, this service captures the essence of the core functionality well.
You can check out the full code here: Remote Code Execution Service.