No module named sklearn
No module named gensim
I tried to solve this problem in 4 steps:
- Step 1 (preparing the requirements.txt file):
pandas==1.5.3
scikit-learn==1.2.2
pyarrow==11.0.0
gensim==4.3.1
- Step 2 (installing and packaging), run in the SageMaker terminal inside the SageMaker notebook:
pip install -r requirements.txt -t ./packages
zip -r dependencies.zip ./packages
aws s3 cp dependencies.zip s3://data-science/code/dependencies/
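One thing that may be worth checking about the archive from Step 2: when a zip file is placed on the Python path, `import pandas` looks for `pandas/` at the root of the archive, whereas `zip -r dependencies.zip ./packages` stores everything under a `packages/` prefix. The following is only a minimal diagnostic sketch, not a fix; `top_level_entries` is a hypothetical helper name, and the local path `dependencies.zip` is assumed to be the file built above.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the set of first path components found in the archive.

    Hypothetical helper for illustration: if the result is {"packages"}
    rather than {"pandas", "sklearn", ...}, the packages are nested one
    level too deep to be importable directly from the zip.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return {
            name.split("/", 1)[0]
            for name in zf.namelist()
            if name.strip("/")  # skip empty names, just in case
        }


if __name__ == "__main__":
    import os

    # Assumed local path of the archive produced in Step 2.
    if os.path.exists("dependencies.zip"):
        print(sorted(top_level_entries("dependencies.zip")))
```

Independently of the layout, Python's zip import mechanism cannot load compiled extension modules (`.so` files), which pandas, scikit-learn and gensim all ship, so a flat archive alone may still not be enough for these particular packages.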
- Step 3 (updating the processor code):
from sagemaker.spark.processing import PySparkProcessor
spark_processor = PySparkProcessor(
    base_job_name="sm-spark",
    framework_version="3.1",
    role=role,
    instance_count=default_instance_count,  # Adjust the instance count as needed
    instance_type=default_instance,  # Adjust the instance type as needed
    max_runtime_in_seconds=1200,
)
# Setting input bucket:
input_bucket = "data-science"
# Define the number of records wanted:
number = "100" # Change this as needed
# Run the Spark job:
spark_processor.run(
    submit_app="process.py",
    arguments=[input_bucket, number],
    submit_py_files=["s3://data-science/code/dependencies/dependencies.zip"],
    spark_event_logs_s3_uri="s3://data-science/spark_event_logs",
    logs=False,
)
- Step 4 (our process.py file):
from pyspark import SparkContext
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("Spark Processing Job") \
    .getOrCreate()
print("Spark session initialized with optimized configuration.")
# Set the Spark context to pick up the py file with the dependencies
# (local_dependencies_path is not defined in this excerpt)
sc = SparkContext.getOrCreate()
sc.addPyFile(local_dependencies_path)
print("Py file added")
# Import all python dependencies:
try:
    import pandas as pd
    print(f"Pandas version: {pd.__version__}")
    import numpy as np
    print(f"Numpy version: {np.__version__}")
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.metrics.pairwise import euclidean_distances
    print("Sklearn metrics loaded")
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    print("Gensim models loaded")
except ImportError as e:
    # Log the error and terminate the job
    print(f"Dependency loading error: {e}")
    raise SystemExit(f"Job terminated due to missing dependencies: {e}")
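For diagnosis, it might help to list every unresolvable module at once instead of stopping at the first failed import. This is a minimal sketch under stated assumptions: `missing_modules` is a hypothetical helper name, it accepts top-level module names only, and it reports only what the process running process.py (the driver) can find, not what the executors see.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located.

    Hypothetical helper for illustration; it only looks the modules up
    via importlib and does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names corresponding to the packages pinned in Step 1
    # (scikit-learn is imported as "sklearn").
    print("Missing:", missing_modules(["pandas", "numpy", "sklearn", "gensim"]))
```

Calling it right after `sc.addPyFile(...)` would show whether adding the archive changed what is importable.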
More details here: https://stackoverflow.com/questions/793 ... -in-aws-sa