Balancing Productivity and Cost in Cloud-Based Remote Desktop (Part Two)

The `stop-if-inactive.sh` Script Revised to Withstand Temporary Network Glitches

Jul 23, 2024

The updated version of the stop-if-inactive.sh script saves me 1 to 2 hours of productivity daily. Here’s why.

Previously, the script used the Linux shutdown <timeout> command to stop inactive VM instances. This command prevented disrupted network connections from being restored. Network disruptions occurred 2-4 times daily, and each took about 5 minutes to recover from. Because of the break in mental concentration, each disruption cost about 30 minutes of productivity.

This publication follows up on my previous discussions about the AWS Multi-Account/Multi-Platform/Multi-User (MAPU) environment architecture and SSH protocol tunneling. It focuses on operating cloud-based remote desktops at scale as a subtle balance between security, cost, and productivity.

The third article in this series introduced an automatic solution for stopping a virtual machine instance when no user activity is detected. While this solution worked in most cases, it had the flaw mentioned above, which led to productivity loss whenever a client application, such as VSCode Remote, attempted to reconnect an SSH session.

In this publication, I will present a modification of the original solution that eliminates this problem without a significant cost increase. While the final solution is fairly simple, understanding the original problem and the rationale behind the selected solution requires a good understanding of how Linux and the SSH protocol work. Before diving into technical details, let me recall why remote desktops are essential.

Why Remote Desktops?

Performing all programming activities using a remote desktop is critical for anyone working with multiple software technologies. With the vast array of programming languages, runtime environments, and libraries, maintaining and updating them on a local computer is nearly impossible: one quickly loses mental control over multiple configurations.

This challenge leads engineers to stick with the single environment they initially started with, making them reluctant to experiment with alternatives.

While vendors might be content with this status quo, the industry suffers. This limitation also narrows the spectrum of hardware in use. For example, if a local laptop or desktop computer is based on Intel or AMD CPUs, supporting ARM-based CPUs might not be considered, despite their price, performance, and environmental advantages.

The same logic applies to various combinations of GPU, FPGA, secure enclaves, and other potentially advantageous hardware components. While additional hardware can usually be purchased, it will occupy extra space and become outdated too quickly.

With this understanding of overall motivation, let’s define the problem more precisely.

Problem Statement

Periodically, SSH sessions are disrupted by network glitches or other causes, and VSCode attempts to reconnect. With the previous version of the script, these attempts never succeeded, forcing the user to wait until the VM instance entered the stopping state before trying to reconnect. Each disruption took about 5-10 minutes, which could seriously disrupt the user's workflow.

The root cause of this issue was that the stop-if-inactive.sh script invoked the Linux shutdown $SHUTDOWN_DELAY command when it detected a closed SSH connection. During the $SHUTDOWN_DELAY, the SSH daemon refused to accept incoming connections, and the script could not detect a new connection attempt and cancel the shutdown command.

As a result, when VSCode tried to reconnect, the initial attempt always failed, requiring the user to wait for the VM to stop and reconnect manually, leading to significant productivity loss.

Solution Overview

The solution overview is presented in the diagram below.

The only difference from the previous version is that, to stop the instance, the stop-if-inactive.sh script communicates with the AWS EC2 Service instead of using the Linux shutdown $SHUTDOWN_DELAY command.

Script Logic

Here is a simplified description of the script logic.

1. Read Configuration: Load configuration values from an external file.
2. Monitor Activity: Regularly check for file changes and user command submissions in the terminal.
3. Check Sessions: Determine if there are any active SSH or Tmux sessions.
4. Disable SSH: If no user activity is detected, close all SSH sessions and temporarily block new SSH connections.
5. Stop Instance: If inactivity persists, unblock new SSH connection requests, and stop the EC2 instance.

The new version of the stop-if-inactive.sh script is presented below:

#!/bin/bash
set -euo pipefail

# Read configuration values
CONFIG=$(cat /root/autoshutdown-configuration)
# Assuming the configuration file is in the format:
# TIMEOUT_NO_ACTIVITY=<value>
# POLLING_SLEEP=<value>
# NO_CONNECTION_RETRIES=<value>
# INSTANCE_ID=<value>
eval "$CONFIG"

USER=ec2-user
WATCH_DIR=/home/$USER

# Check for any file changes
has_any_file_changed() {
  if [[ $(find "$WATCH_DIR" -type f -mmin -$TIMEOUT_NO_ACTIVITY ! -path '*/vscode.lock' | wc -l) -gt 0 ]]; then
    return 0  # True - at least one file changed
  else
    logger -t autoshutdown "No file was changed during the last ${TIMEOUT_NO_ACTIVITY} minutes."
    return 1  # False - no files changed
  fi
}

# Check for user activity in terminal
was_any_command_typed() {
  local current_time=$(date +%s)
  local last_activity_time=$(ls -l --time-style=+%s /dev/pts | grep "$USER" | awk '{print $7}' | sort -nr | head -n1)
  
  if [[ -z "$last_activity_time" ]]; then
    logger -t autoshutdown "No terminal activity for the ${USER} user detected."
    return 1  # False - no user activity detected
  fi

  local time_diff=$((current_time - last_activity_time))
  
  if [[ $time_diff -lt $((TIMEOUT_NO_ACTIVITY * 60)) ]]; then
    return 0  # True - user was active recently
  else
    logger -t autoshutdown "No user activity during the last ${TIMEOUT_NO_ACTIVITY} minutes."
    return 1  # False - no recent user activity
  fi
}

# Check whether the user is active
is_user_active() {
  has_any_file_changed || was_any_command_typed
}

# Check whether any SSH session is active
is_ssh_active() {
  if ss -t -a | grep -q 'ESTAB.*:ssh'; then
    return 0  # True - at least one SSH session active
  else
    return 1  # False - no SSH session active
  fi
}

# Check whether any Tmux session is active
is_tmux_active() {
  tmux list-sessions > /dev/null 2>&1
  return $?
}

# Stop instance function
stop_instance() {
  logger -t autoshutdown "Stopping instance..."
  aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
}

# Disable SSH to prevent VSCode reconnection attempts
disable_ssh() {
  logger -t autoshutdown "Killing the SSH process..."
  pkill -U $USER # close all SSH sessions
  iptables -A INPUT -p tcp --dport 22 -j REJECT # refuse input connections
  sleep "${POLLING_SLEEP}m"
  iptables -D INPUT -p tcp --dport 22 -j REJECT # remove the rule
}

main() {
  # Main monitoring loop
  local retries_counter=0
  while true; do
    sleep "${POLLING_SLEEP}m"
    if is_ssh_active; then
      if ! is_user_active; then
        logger -t autoshutdown "SSH Active. No user activity during the last $TIMEOUT_NO_ACTIVITY minutes detected."
        disable_ssh
        break
      fi
      retries_counter=0 # reset retries counter
    elif is_tmux_active; then
      if ! has_any_file_changed; then
        logger -t autoshutdown "Tmux Active. No file change during the last $TIMEOUT_NO_ACTIVITY minutes detected."
        break
      fi
    # Arithmetic comparison: '>' inside [[ ]] would compare the values as strings
    elif (( ++retries_counter > NO_CONNECTION_RETRIES )); then
      logger -t autoshutdown "No SSH or Tmux active detected after the $NO_CONNECTION_RETRIES retries."
      break
    fi
  done
  stop_instance
}

main
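
The script above loads its configuration with eval, which executes whatever the file contains. As a minimal sketch of a more defensive alternative, the hypothetical load_config helper below (not part of the original script) parses KEY=value lines and accepts only the four keys listed in the script's comment. It assumes there are no spaces around the equals sign.

```shell
#!/bin/bash
set -eu

# Hypothetical helper (not in the original script): parse KEY=value lines
# without eval, accepting only the four documented keys.
load_config() {
  local file=$1 key value
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      ''|\#*) continue ;;  # skip blank lines and comments
    esac
    case "$key" in
      TIMEOUT_NO_ACTIVITY|POLLING_SLEEP|NO_CONNECTION_RETRIES)
        # These three settings must be non-negative integers
        case "$value" in
          ''|*[!0-9]*)
            echo "Invalid number for $key: $value" >&2
            return 1
            ;;
        esac
        ;;
      INSTANCE_ID) ;;
      *)
        echo "Unknown configuration key: $key" >&2
        return 1
        ;;
    esac
    # Explicit assignments keep the parser free of eval
    case "$key" in
      TIMEOUT_NO_ACTIVITY)   TIMEOUT_NO_ACTIVITY=$value ;;
      POLLING_SLEEP)         POLLING_SLEEP=$value ;;
      NO_CONNECTION_RETRIES) NO_CONNECTION_RETRIES=$value ;;
      INSTANCE_ID)           INSTANCE_ID=$value ;;
    esac
  done < "$file"
}
```

With this helper, the two configuration lines at the top of the script could become a single call such as load_config /root/autoshutdown-configuration. Whether the extra validation is worth it depends on who can write to that file.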

The Solution Logic Under the Hood

While the stop-if-inactive.sh script does not look very complex, the reasoning behind why it is organized this way and not another is non-trivial.

To understand this logic, we need a deeper understanding of how the system works end-to-end.

In the case of VSCode Remote, five components potentially affect the SSH session stability:

Local ~/.ssh/config configuration file
Remote /etc/ssh/sshd_config configuration file on the VM instance
Local VSCode User Settings file
Local awsssh.sh script
Remote stop-if-inactive.sh script on the VM instance

Let’s briefly review each one.

Local `~/.ssh/config` Configuration File

This file may contain two configuration parameters that potentially affect the SSH session stability:

ServerAliveInterval: Specifies the interval, in seconds, for sending keepalive messages to the server to detect if the server has crashed or the network has gone down.
ServerAliveCountMax: Sets the number of keepalive messages that may be sent without receiving any messages back from the server. When this threshold is reached the client will terminate the session.

Remote `/etc/ssh/sshd_config` Configuration File

This file may contain two configuration parameters conceptually parallel to those from the local ~/.ssh/config file:

ClientAliveInterval: Sets a timeout interval in seconds after which, if no data has been received from the client, sshd will send a message to request a response from the client.
ClientAliveCountMax: Sets the number of messages that may be sent without receiving any messages from the client. If this threshold is reached while client-alive messages are being sent, sshd will disconnect the client, terminating the session.

Local VSCode User Settings File

The safest way to modify this file is via the VSCode Settings menu. While the Remote.SSH section contains multiple configuration parameters, we will focus only on two:

Connection Timeout: Specifies the timeout in seconds used for the SSH command that connects to the remote.
Max Reconnection Attempts: The maximum number of times to attempt reconnection. Use 0 to disallow reconnection after the first attempt, and null to use a maximum of 8.

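To make the system work properly, setting the following VSCode preferences is critical: Connection Timeout set to 120 seconds and Max Reconnection Attempts set to 0. Both values are explained below.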

Local `awsssh.sh` Script

This script was described in the previous publication. It is responsible for starting the VM instance if it is not running and for initiating the SSH session with this VM instance over the AWS EIC Endpoint. If the VM instance is stopping, the script waits for it to stop completely and then restarts it.
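
The start-or-wait behavior described here can be sketched roughly as follows. This is not the actual awsssh.sh, which was published in the previous article. The function names instance_state and ensure_running are hypothetical, and the sketch assumes the AWS CLI is installed and has permission to describe and start the instance.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: print the current state name of an EC2 instance
instance_state() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].State.Name' --output text
}

# Hypothetical helper: return only when the instance is running
ensure_running() {
  local id=$1 state
  state=$(instance_state "$id")
  case "$state" in
    running)
      ;;  # nothing to do
    pending)
      aws ec2 wait instance-running --instance-ids "$id"
      ;;
    stopping)
      # Let the stop complete before asking for a restart
      aws ec2 wait instance-stopped --instance-ids "$id"
      aws ec2 start-instances --instance-ids "$id" > /dev/null
      aws ec2 wait instance-running --instance-ids "$id"
      ;;
    stopped)
      aws ec2 start-instances --instance-ids "$id" > /dev/null
      aws ec2 wait instance-running --instance-ids "$id"
      ;;
    *)
      echo "Unexpected instance state: $state" >&2
      return 1
      ;;
  esac
}
```

The stopping branch is the one that takes the longest. It is the reason the connection timeout discussed later has to cover a full stop and restart cycle.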

Remote `stop-if-inactive.sh` Script

The script is responsible for stopping the VM instance if no SSH or Tmux session is detected or when there is no visible user activity for a prolonged time (e.g. 30 minutes).

The original version of this script was described in the previous article and a brief description of the new version was presented above. Here, we will dive one inch deeper.

Having so many sources of potential problems can be overwhelming. If set incorrectly, each element can impact the SSH session stability. Moreover, different parameters should be configured in concert and support each other. While this script culminates the end-to-end solution, it relies on proper definitions made elsewhere.

Contrary to popular belief, the keepalive message configurations within the ~/.ssh/config and /etc/ssh/sshd_config configuration files do not define an effective timeout for user inactivity. They terminate a session only when one of the sides or the network fails.

Sending the keepalive message, however, is an effective measure to prevent a premature timeout initiated by the AWS websockets client used for the SSH session tunneling over EIC.

While it could be configured on either side, I chose the server side to rely on the client-side configuration as little as possible. This configuration is provided via the /etc/ssh/sshd_config.d/keepalive.conf file, which is automatically included by /etc/ssh/sshd_config.
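
As an illustration only, the sketch below prints what such a drop-in file might contain and computes how long sshd would tolerate an unresponsive client. The interval and count values are assumptions, since the article does not state the ones actually used. The function names are hypothetical.

```shell
#!/bin/bash
set -eu

# Assumed values for illustration; the article does not specify them
CLIENT_ALIVE_INTERVAL=60   # seconds of client silence before sshd sends a probe
CLIENT_ALIVE_COUNT_MAX=3   # unanswered probes tolerated before sshd disconnects

# Print the contents of a drop-in such as /etc/ssh/sshd_config.d/keepalive.conf
render_keepalive_conf() {
  printf 'ClientAliveInterval %s\n' "$CLIENT_ALIVE_INTERVAL"
  printf 'ClientAliveCountMax %s\n' "$CLIENT_ALIVE_COUNT_MAX"
}

# Approximate time, in seconds, before sshd drops an unresponsive client
dead_peer_timeout() {
  echo $((CLIENT_ALIVE_INTERVAL * CLIENT_ALIVE_COUNT_MAX))
}

render_keepalive_conf
echo "Unresponsive client dropped after about $(dead_peer_timeout)s"
```

With these assumed values, a dead client is dropped after roughly 180 seconds, while a healthy but idle session keeps exchanging probes and stays open. After changing the real file, running sshd -t checks the configuration syntax before the daemon is reloaded.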

The awsssh script needs enough time to wait for a VM instance to stop, restart it, and initiate SSH tunneling over AWS EIC. In the worst case, this might take more than one minute. Therefore, setting the SSH connection timeout to 120 seconds (2 minutes) provides a sufficient safety margin. While this parameter can also be specified in the local ~/.ssh/config configuration file, VSCode overrides it via a command-line argument, and therefore the VSCode User Settings file is the right place to specify it.

The second VSCode User Settings parameter, “Max Reconnection Attempts”, set to zero reflects more subtle logic. It tells VSCode: “If you detect a network problem, try to reconnect automatically, but only once”. In the case of an accidental network glitch, that would be enough. If the VM instance is going to be stopped due to user inactivity, the stop-if-inactive.sh script will block this attempt and thus prevent the VM instance from being automatically restarted by the awsssh script.
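
For readers who prefer editing the settings file directly, the snippet below prints a settings.json fragment with the two values discussed above. The setting IDs are an assumption based on how the Remote - SSH extension names these options, so they should be checked against the Settings UI before use. The function name is hypothetical.

```shell
#!/bin/bash
set -eu

# Print a settings.json fragment with the two values discussed above.
# The setting IDs are assumed, not taken from the article.
render_vscode_settings() {
  cat <<'EOF'
{
  "remote.SSH.connectTimeout": 120,
  "remote.SSH.maxReconnectionAttempts": 0
}
EOF
}

render_vscode_settings
```

The two entries would be merged into the existing user settings rather than replacing the file.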

To prevent stopping VM instances prematurely in the case of a temporary network glitch, the stop-if-inactive.sh script needs to check the SSH session availability more than once (normally, two attempts will be sufficient).
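
The effect of the retry limit can be seen in a small simulation. This is a sketch with hypothetical function names, not part of the script itself. Each poll either finds a session ("up") or does not ("down"), and the instance is stopped only when the number of consecutive misses exceeds the limit. The comparison uses -gt because the > operator inside [[ ]] compares strings, so 10 would sort before 9.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: succeeds once consecutive misses exceed the limit
retries_exhausted() {
  [ "$1" -gt "$2" ]
}

# Simulate a sequence of polls: "up" = session found, "down" = none found.
# Prints the 1-based poll number at which the instance would be stopped,
# or "never" if the limit is not exceeded.
simulate_polls() {
  local max_retries=$1 misses=0 poll=0 state
  shift
  for state in "$@"; do
    poll=$((poll + 1))
    if [ "$state" = "up" ]; then
      misses=0  # a live session resets the counter
    else
      misses=$((misses + 1))
      if retries_exhausted "$misses" "$max_retries"; then
        echo "$poll"
        return 0
      fi
    fi
  done
  echo "never"
}

simulate_polls 2 up down up down down down   # prints 6
```

With a limit of 2, the single miss at the second poll is treated as a glitch. Only the third consecutive miss, at the sixth poll, leads to a stop.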

Overall, the stop-if-inactive.sh script logic ensures a smooth user experience even during temporary network glitches, without extra charges for VM instances that keep running because the user is no longer working but forgot to close the client.

Acknowledgments

While preparing this publication, I used several key tools. The article draft was prepared using the free Notion subscription.

I used the free version of Grammarly for grammar review, eliminating most basic spelling and grammar mistakes.

The stylistic finesse and coherence of the writing benefited from suggestions from the paid version of ChatGPT (GPT-4o), which was also instrumental in developing the new version of the stop-if-inactive.sh script. Though the process was not always smooth, including occasional interruptions, the final result was much better than anything I could develop alone.

With all these advanced techniques employed, it’s important to emphasize that concepts, solutions, and final decisions presented in this article are entirely my own, and I bear full responsibility for them.

Written by Asher Sterkin
