Hi everyone,

I'm working on a Nextflow implementation of MassiveFold (a derivative of AlphaFold), which requires aligning sequences against a large database (~1.2 TB) stored in a directory. I'd like to use Kubernetes (K8s) as the executor, but I'm facing an issue: since I obviously can't copy ~1.2 TB of data into a pod every time the pipeline runs, how can I give the tasks efficient access to this data?

I wasn't able to use a Persistent Volume Claim (PVC) through Wave and Fusion, but maybe there's a workaround or a better approach? Through an S3 bucket?
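For context, the kind of setup I've been trying looks roughly like this (the namespace, claim names, and mount paths below are placeholders, not my actual values):

```groovy
// Rough sketch only; all names and paths are placeholders.
profiles {
    k8s_pvc {
        process.executor = 'k8s'

        k8s {
            namespace        = 'massivefold'   // placeholder namespace
            storageClaimName = 'nf-work-pvc'   // PVC used as the shared work dir
            storageMountPath = '/workspace'
        }

        process {
            // Mount the PVC holding the ~1.2 TB ColabFold database
            // into every pod that runs the alignment step.
            withLabel: 'colabfold' {
                pod = [ [volumeClaim: 'colabfold-db-pvc', mountPath: '/db'] ]
            }
        }
    }
}
```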
My process:
```groovy
process Alignement_with_colabfold {
    tag "$seqFile.baseName"
    publishDir "result/${seqFile.baseName}/alignment"
    label 'colabfold'

    input:
    path(seqFile)
    path(data_dir)
    val(pair_strategy)

    output:
    tuple val(seqFile.baseName), path("${seqFile.baseName}_msa*")

    script:
    """
    echo "=== Starting ColabFold alignment ==="
    echo "Sequence file: $seqFile"
    echo "Data directory: $data_dir"
    echo "Pair strategy: ${pair_strategy}"

    ls -a > ls.txt

    # Check that the database directory exists
    if [ ! -d "$data_dir" ]; then
        echo "ERROR: Database directory does not exist: $data_dir"
        exit 1
    fi

    # Set pairing strategy
    if [[ ${pair_strategy} == "greedy" ]]; then
        pairing_strategy=0
    elif [[ ${pair_strategy} == "complete" ]]; then
        pairing_strategy=1
    else
        echo "ValueError: --pair_strategy '${pair_strategy}' is not valid. Use 'greedy' or 'complete'"
        exit 1
    fi
    echo "Using pairing strategy: \$pairing_strategy"

    # Run ColabFold search
    colabfold_search $seqFile $data_dir ${seqFile.baseName}_msa --pairing_strategy \${pairing_strategy}

    echo "=== ColabFold alignment completed ==="
    """
}
```
and my config for this profile
Thanks in advance for your help!