Edge Computing AI Server
AI Development Frameworks and Tools
NVIDIA CUDA: NVIDIA's parallel computing platform and programming model, and the foundation for GPU-accelerated AI. It lets developers offload computation to thousands of GPU cores that run in parallel, and it provides a rich set of APIs and development tools widely used in deep learning and scientific computing.
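To make the parallel model concrete, here is a minimal pure-Python sketch of how a CUDA kernel maps work onto threads. It is not CUDA code and it runs sequentially; it only imitates the indexing scheme (block index times block size plus thread index) that a one-dimensional kernel typically uses. The names `vector_add_kernel` and `launch` are hypothetical and chosen for illustration.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body of one simulated thread: add a single pair of elements."""
    # Global index, computed the way a 1-D CUDA kernel would:
    # blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    # Guard against threads that fall past the end of the data
    if i < len(out):
        out[i] = a[i] + b[i]


def launch(kernel, n, block_dim, *args):
    """Run the kernel once per simulated thread (sequentially here)."""
    # Ceiling division so the grid covers all n elements
    grid_dim = (n + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)
    return grid_dim


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)
blocks = launch(vector_add_kernel, len(a), 2, a, b, out)
```

On a GPU the two loops in `launch` disappear: every (block, thread) pair runs the kernel body at the same time, which is where the speedup comes from.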
TensorFlow: A deep learning framework built around computation graphs. TensorFlow 2.x runs operations eagerly by default and can compile Python functions into static graphs with tf.function, which helps speed and computational efficiency in production environments and large-scale computing. It has a wealth of third-party tools and learning resources, and it can be used from multiple languages, including Python, C++, and JavaScript.
PyTorch: A deep learning framework based on dynamic computation graphs, which are built as the code runs. This makes it flexible: a model's structure can change from one run to the next, which suits academia and research. Specialized libraries such as torchvision (computer vision) and torchtext (natural language processing) make it particularly well suited to image and text processing.
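The static versus dynamic graph distinction drawn for TensorFlow and PyTorch can be illustrated without either library. The sketch below is plain Python, not framework code: the `Node` class, `run`, and `dynamic_model` are hypothetical names, and the example only shows the difference between describing a computation first and running it later (static) and letting ordinary code define the computation as it runs (dynamic).

```python
# Static style: describe the computation first, run it later.
class Node:
    def __init__(self, op, *inputs):
        self.op = op          # "input", "add", or "mul"
        self.inputs = inputs  # upstream Node objects, or a name for "input"


def run(node, feed):
    """Evaluate a prebuilt graph against a dict of input values."""
    if node.op == "input":
        return feed[node.inputs[0]]
    left, right = (run(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right


x = Node("input", "x")
y = Node("input", "y")
graph = Node("add", Node("mul", x, x), y)  # x*x + y, nothing computed yet
static_result = run(graph, {"x": 3, "y": 4})


# Dynamic style: the graph is implied by ordinary code as it runs,
# so Python control flow can change the structure on every call.
def dynamic_model(x, y, depth):
    out = x
    for _ in range(depth):  # depth is chosen at call time
        out = out * x
    return out + y


dynamic_result = dynamic_model(3, 4, depth=1)
```

The static version can be inspected and optimized as a whole before it runs, while the dynamic version can change shape per input, which is the trade-off the two descriptions above refer to.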
NVIDIA TensorRT: An SDK for inference optimization that converts trained deep learning models into optimized engines for efficient inference on NVIDIA GPUs. It accepts models trained in multiple frameworks, including TensorFlow and PyTorch.
JupyterHub: A multi-user server for Jupyter Notebook that lets multiple users collaborate and share computing resources.
NVIDIA NGC: A catalog of pre-trained AI models and containers optimized to make full use of the hardware capabilities of NVIDIA GPUs.
Cloud Management and Resource Allocation Tools
Prometheus + Grafana: Prometheus collects real-time metrics on server resources and GPU usage, and Grafana visualizes them in dashboards and reports.
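As a rough illustration of how GPU usage could be read back out of Prometheus, the sketch below builds an instant-query URL for the Prometheus HTTP API and parses a response in its JSON format. Several things here are assumptions: the server address `http://localhost:9090`, the metric name `DCGM_FI_DEV_GPU_UTIL` (it depends on which GPU exporter is installed), and the `gpu` label. The sample response is hand-written for the example and is not a real measurement.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Return the instant-query URL for the Prometheus HTTP API."""
    query = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_instant_vector(payload):
    """Map each series' 'gpu' label to its sampled value."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed")
    readings = {}
    for series in body["data"]["result"]:
        # Each sample is [unix_timestamp, "value as a string"]
        _, value = series["value"]
        gpu = series["metric"].get("gpu", "unknown")
        readings[gpu] = float(value)
    return readings


# Assumed server address and metric name (see the note above).
url = build_query_url("http://localhost:9090", "DCGM_FI_DEV_GPU_UTIL")

# Hand-written sample response, not real data.
sample = (
    '{"status": "success", "data": {"resultType": "vector", "result": ['
    '{"metric": {"gpu": "0"}, "value": [1700000000, "87"]}]}}'
)
readings = parse_instant_vector(sample)
```

In practice Grafana issues queries like this itself; the sketch only shows what travels between the two tools.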
Slurm: A job scheduling system that manages resources such as CPU, RAM, and disk space and allocates them to jobs according to what each user requests.
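A resource request to Slurm is usually written as `#SBATCH` directives at the top of a batch script. The sketch below generates such a script from Python. The helper `make_sbatch_script`, the job name, and `train.py` are hypothetical, and the `--gres=gpu:N` form assumes the cluster has GPUs configured as a generic resource named `gpu`.

```python
def make_sbatch_script(job_name, cpus, mem_gb, gpus, time_limit, command):
    """Build the text of a Slurm batch script with resource requests."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",   # CPU cores for the task
        f"#SBATCH --mem={mem_gb}G",          # RAM per node
        f"#SBATCH --gres=gpu:{gpus}",        # GPUs (assumes a 'gpu' GRES)
        f"#SBATCH --time={time_limit}",      # wall-clock limit, HH:MM:SS
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 8 cores, 64 GB RAM, 1 GPU, 2 hours.
script = make_sbatch_script(
    "train-demo", 8, 64, 1, "02:00:00", "python train.py"
)
```

The resulting text would be saved to a file and submitted with `sbatch`; Slurm then holds the job until the requested resources are free.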
Device Specifications
CPU: 64 cores
RAM: 512 GB DDR4
Storage: 32 TB
GPU:
• Model: NVIDIA H100 80GB
• CUDA Cores: 14,576
• Tensor Cores: 432
Graphics Card: RTX 4060
Operating System: Red Hat OpenShift architecture
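For a sense of scale, the 80 GB of GPU memory listed for the H100 puts a ceiling on model size. The arithmetic below is a rough sketch under stated assumptions: 80 GB is taken as 80 × 10^9 bytes, only the weights are counted (no activations, optimizer state, or framework overhead), and the function name `max_params_in_memory` is hypothetical.

```python
def max_params_in_memory(memory_gb, bytes_per_param):
    """Upper bound on parameters whose weights alone fit in memory."""
    # Assumes 1 GB = 10**9 bytes and ignores everything except weights
    return (memory_gb * 10**9) // bytes_per_param


fp16_params = max_params_in_memory(80, 2)  # 2 bytes per FP16 weight
fp32_params = max_params_in_memory(80, 4)  # 4 bytes per FP32 weight
```

Real workloads fit noticeably less than these bounds, since training and inference both need memory beyond the weights themselves.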