Surgical Video Understanding and Multimodal Foundation Models Research Assistant/Intern
We are looking for a highly motivated Research Assistant to join our team and contribute to cutting-edge research on foundation models for surgical video understanding. This project focuses on developing advanced multimodal models to support tasks such as surgical action understanding, workflow recognition, quality assessment, and decision-making in surgical scenarios.
Responsibilities:
1. Assist in the research and development of foundation models for surgical video understanding, focusing on tasks such as surgical behavior analysis, phase recognition, quality evaluation, and decision-making.
2. Contribute to the construction of large-scale vision-language pretraining datasets for surgical contexts.
3. Explore multimodal pretraining techniques to enable large models to learn contextual knowledge in surgical environments.
4. Leverage surgical context knowledge to improve the generalization capabilities of multimodal large models across different procedures and scenarios.
5. Assist in writing technical documentation, drafting patents, and contributing to high-quality research papers.
Requirements:
1. Familiarity with foundation model pretraining algorithms, such as masked autoencoders (MAE), self-supervised learning, and vision-language alignment.
2. Experience with multimodal large models (e.g., LLaVA, BLIP, Qwen-VL) and their training or applications is preferred.
3. Knowledge of large-model-related applications, such as RAG, LangChain, or LlamaIndex, is a plus.
4. Prior experience in publishing research papers or contributing to academic projects is advantageous.
Preferred Skills:
1. Background in computer vision, natural language processing, or related fields.
2. Proficiency in programming languages such as Python, with experience in deep learning frameworks (e.g., PyTorch, TensorFlow).
3. Strong analytical and problem-solving skills, with a willingness to learn and explore new technologies.
4. Ability to work collaboratively in a research environment and contribute effectively to team goals.