Pipeline fails with "docker: not found"

Pipeline file

include:
  - project: 'root/pipeline'
    file: 'docker.yml'
  - project: 'root/pipeline'
    file: 'ssh.yml'

test-docker:
  extends: .docker-build

test-ssh:
  extends: .ssh-command
  rules:
    - if: $CI_COMMIT_TAG
  variables:
    SSH_HOST: "***"
  before_script:
    - apk add openssh-client
    - chmod 400 $SSH_KEY
  script:
    - ssh -o StrictHostKeyChecking=no -i $SSH_KEY $SSH_USER@$SSH_HOST "
        if [[ $(docker ps -a |  grep "$CI_REGISTRY_IMAGE"  | awk '{print $1}') != "" ]]; then docker rm -f $(docker ps -a |  grep "$CI_REGISTRY_IMAGE"  | awk '{print $1}');else echo "container not found"; fi &&
        docker image prune -f  -a --filter "until=24h" &&
        docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY &&
        docker pull $CI_REGISTRY_IMAGE:${CI_COMMIT_TAG} &&
        docker run -d --rm -p 80:8080 $CI_REGISTRY_IMAGE:${CI_COMMIT_TAG}
      "

Contents of ssh.yml:

.ssh-command:
  stage: deploy
  image: alpine:3
  variables:
    SSH_USER: "root"
    SSH_HOST: "localhost"
    SSH_COMMAND: "echo 'Hello, world!'"
  before_script:
    - chmod 400 $SSH_KEY
  script:
    - ssh -o StrictHostKeyChecking=no -i $SSH_KEY $SSH_USER@$SSH_HOST $SSH_COMMAND

Error output

The build stage succeeds; the deploy stage fails.

$ chmod 400 $SSH_KEY
$ ssh -o StrictHostKeyChecking=no -i $SSH_KEY $SSH_USER@$SSH_HOST " if [[ $(docker ps -a |  grep "$CI_REGISTRY_IMAGE"  | awk '{print $1}') != "" ]]; then docker rm -f $(docker ps -a |  grep "$CI_REGISTRY_IMAGE"  | awk '{print $1}');else echo "container not found"; fi && docker image prune -f  -a --filter "until=24h" && docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY && docker pull $CI_REGISTRY_IMAGE:${CI_COMMIT_TAG} && docker run -d --rm -p 80:8080 $CI_REGISTRY_IMAGE:${CI_COMMIT_TAG} "
/bin/sh: eval: line 141: docker: not found
/bin/sh: eval: line 141: docker: not found
Warning: Permanently added '***' (ED***) to the list of known hosts.
Authorized users only. All activities may be monitored and reported.
"docker rm" requires at least 1 argument.
See 'docker rm --help'.
Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]
Remove one or more containers
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1

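What I have tried

I added which docker to the pipeline commands, and it prints the correct path. The line containing the if statement also produces the expected output when I run it on its own on the server. Judging from the error output, the docker rm step does find docker, yet docker ps suddenly cannot, which is strange.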
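
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the whole remote command is wrapped in double quotes, so the shell inside the job container (alpine, which has no docker) expands $(docker ps ...) before ssh even starts. That would match the log, where the two "docker: not found" lines appear before the SSH banner and the server then receives docker rm -f with no argument. The snippet below imitates the effect locally; remote_sh and the WHERE marker are hypothetical names used only for illustration, with sh -c standing in for the ssh call.

```shell
#!/bin/sh
# remote_sh is a hypothetical stand-in for "ssh user@host": it runs its single
# argument in a fresh shell, much as the command string passed to ssh is run
# by the login shell on the server.
remote_sh() {
    sh -c "$1"
}

# WHERE marks which side we are on. It is a plain (unexported) shell variable,
# so the child shell started by remote_sh does not see it.
unset WHERE
WHERE=local

# Double quotes: $(...) is expanded by the calling shell before remote_sh runs.
expanded_locally=$(remote_sh "echo $(echo "ran on ${WHERE:-remote}")")

# Escaped \$: the text reaches the child shell intact and is expanded there.
expanded_remotely=$(remote_sh "echo \$(echo \"ran on \${WHERE:-remote}\")")

echo "double quotes: $expanded_locally"
echo "escaped:       $expanded_remotely"
```

The first line printed ends in "ran on local" and the second in "ran on remote". In the failing job, the first case is what happens to both docker ps substitutions.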

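The following is an automated AI reply, for reference only. If it does not help, please continue the discussion with other forum members.

Judging from the error message, the pipeline cannot find the docker command while running the deploy stage, so the subsequent docker commands fail. This may be because docker is not installed in the Alpine image used by the deploy stage.

Solutions:

  1. Add a command that installs docker to the before_script of the deploy stage, for example: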
before_script:
  - apk add docker
  - chmod 400 $SSH_KEY
  2. If the deploy stage does not need the docker command, switch the deploy stage to an image that does not include docker, for example:
.ssh-command:
  stage: deploy
  image: alpine:3.13
  variables:
    SSH_USER: "root"
    SSH_HOST: "localhost"
    SSH_COMMAND: "echo 'Hello, world!'"
  before_script:
    - chmod 400 $SSH_KEY
  script:
    - ssh -o StrictHostKeyChecking=no -i $SSH_KEY $SSH_USER@$SSH_HOST $SSH_COMMAND

This avoids the "docker: not found" problem in the deploy stage.

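The following is an automated AI reply, for reference only. If it does not help, please continue the discussion with other forum members.

Judging from the error message, the docker command cannot be found in the command that runs after the SSH connection to the server is established, so docker needs to be installed before the SSH connection is made.

You can add a command that installs docker to before_script, for example: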

before_script:
  - apk add openssh-client
  - apk add docker
  - chmod 400 $SSH_KEY

This installs docker before the SSH connection is made and avoids the "docker: not found" problem.

It feels like GitLab's variable references are conflicting with the shell script. Adding quotes, double quotes and so on did not help either.
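
If the conflict is between the runner's shell and the server's shell rather than GitLab itself, one way around it is to build the remote script with an unquoted here-document and escape only the parts that should run on the server. This is a rough sketch and has not been tested against the original server. The default values are placeholders so it can run outside CI, and the ids variable is a hypothetical helper that is not in the original job.

```shell
#!/bin/sh
# Placeholder defaults so the sketch runs outside GitLab CI. In a real job
# GitLab provides these variables and the defaults are never used.
: "${CI_REGISTRY:=registry.example.com}"
: "${CI_REGISTRY_IMAGE:=registry.example.com/group/app}"
: "${CI_REGISTRY_USER:=ci-user}"
: "${CI_REGISTRY_PASSWORD:=ci-password}"
: "${CI_COMMIT_TAG:=v1.0.0}"

# Unquoted here-document: plain $VARS are filled in here on the runner, while
# \$(...), \$ids and \$1 stay literal and are evaluated later by the server.
REMOTE_SCRIPT=$(cat <<EOF
set -e
ids=\$(docker ps -a | grep "$CI_REGISTRY_IMAGE" | awk '{print \$1}')
if [ -n "\$ids" ]; then docker rm -f \$ids; else echo "container not found"; fi
docker image prune -f -a --filter "until=24h"
docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
docker pull "$CI_REGISTRY_IMAGE:$CI_COMMIT_TAG"
docker run -d --rm -p 80:8080 "$CI_REGISTRY_IMAGE:$CI_COMMIT_TAG"
EOF
)

# Print what the server would receive, to check the quoting before deploying.
printf '%s\n' "$REMOTE_SCRIPT"

# In the job, the script would then be sent as a single argument, e.g.:
#   ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" "$SSH_USER@$SSH_HOST" "$REMOTE_SCRIPT"
```

In .gitlab-ci.yml this could sit in a single script entry written as a multi-line block scalar (- |). Printing REMOTE_SCRIPT first makes it easy to confirm that $(docker ps ...) arrives on the server unexpanded. Values containing double quotes or dollar signs, such as some passwords, would still need extra care.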