Brain Tumor Radiogenomic Classification with PaddleClas

1. Project introduction

AI Expert Training Camp, Season 2

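Malignant brain tumors are life-threatening. Glioblastoma is both the most common form of brain cancer in adults and the one with the worst prognosis, with a median survival of less than one year. The presence of a specific genetic sequence in the tumor, known as MGMT promoter methylation, has been shown to be a favorable prognostic factor and a strong predictor of response to chemotherapy.

At present, genetic analysis of cancer requires surgery to extract a tissue sample, and it can then take several weeks to determine the tumor's genetic characteristics. Depending on the results and the type of initial therapy chosen, a follow-up surgery may be necessary. An accurate method for predicting the genetics of the cancer from imaging alone (that is, radiogenomics) could minimize the number of surgeries and refine the type of therapy required. The Radiological Society of North America (RSNA) has teamed up with the Medical Image Computing and Computer Assisted Intervention Society (the MICCAI Society) to improve diagnosis and treatment planning for patients with glioblastoma.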

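2. Dataset

2.1 Dataset overview

The dataset contains 7022 images in total, split into a training set and a test set. The images fall into four classes: glioma, meningioma, no tumor (notumor), and pituitary.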
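
As a quick sanity check, the per-class image counts can be tallied from the directory layout once the archive is unpacked. The sketch below assumes one sub-folder per class (glioma, meningioma, notumor, pituitary, the names the list-generation code matches on). The helper name count_images_per_class and the extension list are illustrative assumptions, not part of the original project.

```python
import os
from collections import Counter

CLASS_NAMES = ("glioma", "meningioma", "notumor", "pituitary")
IMAGE_EXTS = (".jpg", ".jpeg", ".png")  # assumed image extensions


def count_images_per_class(root_dir):
    """Count image files under root_dir/<class_name>/ for each known class."""
    counts = Counter()
    for name in CLASS_NAMES:
        class_dir = os.path.join(root_dir, name)
        if not os.path.isdir(class_dir):
            continue  # class folder missing: leave it out of the result
        for _, _, files in os.walk(class_dir):
            counts[name] += sum(f.lower().endswith(IMAGE_EXTS) for f in files)
    return dict(counts)
```

For example, count_images_per_class("Training") would return a dict mapping each class folder to its image count, which makes class imbalance visible before training.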

2.2 Unzip the dataset

!unzip /home/aistudio/data/data180229/archive.zip  # dataset
!unzip /home/aistudio/data/data90342/PaddleClas-release-2.1.zip  # PaddleClas

3. Data processing

  • Following the official PaddleClas guidance, the training images need to be described by two txt list files
  • The classic 0.8:0.2 split is used
  • train_list.txt
  • val_list.txt
# Import the required packages
from sklearn.utils import shuffle
import os
import pandas as pd
import numpy as np
from PIL import Image
import paddle
import paddle.nn as nn
from paddle.io import Dataset
import paddle.vision.transforms as T
import paddle.nn.functional as F
from paddle.metric import Accuracy
import random
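# Root directory of the original training set
dirpath = "Training"
# Build one combined list first and split it afterwards. The list comes out
# ordered by class, so it must be shuffled before the validation set is cut off.
def get_all_txt():
    all_list = []
    i = 0
    for root, dirs, files in os.walk(dirpath):  # current directory, its sub-directories, its files
        for file in files:
            i = i + 1
            # The class label is derived from the folder name in the path
            if "glioma" in root:
                all_list.append(os.path.join(root, file) + " 0\n")
            if "meningioma" in root:
                all_list.append(os.path.join(root, file) + " 1\n")
            if "notumor" in root:
                all_list.append(os.path.join(root, file) + " 2\n")
            if "pituitary" in root:
                all_list.append(os.path.join(root, file) + " 3\n")
    allstr = ''.join(all_list)
    # A context manager makes sure the file is flushed and closed
    with open('all_list.txt', 'w', encoding='utf-8') as f:
        f.write(allstr)
    return all_list, i

all_list, all_lenth = get_all_txt()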
# Note: if this cell is run before the import cell above, it fails with
# "NameError: name 'os' is not defined". Run the imports first.
# Shuffle the combined list (one shuffle is enough)
random.shuffle(all_list)
# Split into training and validation sets
train_size = int(all_lenth * 0.8)
train_list = all_list[:train_size]
val_list = all_list[train_size:]

print(len(train_list))
print(len(val_list))
4569
1143
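
The shuffle above is unseeded, so the split differs on every run. A reproducible variant can be sketched as follows. This is a minimal sketch, not the original author's code: the function name split_train_val and the default seed are assumptions chosen for illustration.

```python
import random


def split_train_val(items, train_ratio=0.8, seed=42):
    """Shuffle a copy of items with a fixed seed, then cut it at train_ratio."""
    shuffled = list(items)
    # A private Random instance keeps the global random state untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Calling split_train_val(all_list) on the 5712 entries produced above would again give 4569 training and 1143 validation entries, with the same membership on every run.
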
# Run this cell to generate train_list.txt
train_txt = ''.join(train_list)
with open('train_list.txt', 'w', encoding='utf-8') as f_train:
    f_train.write(train_txt)
print("train_list.txt generated successfully!")
train_list.txt generated successfully!
# Run this cell to generate val_list.txt
val_txt = ''.join(val_list)
with open('val_list.txt', 'w', encoding='utf-8') as f_val:
    f_val.write(val_txt)
print("val_list.txt generated successfully!")
val_list.txt generated successfully!
# Generate the file list for the test set
test_dirpath = "Testing"
def get_test_txt():
    test_list = []
    i = 0
    for root, dirs, files in os.walk(test_dirpath):
        for file in files:
            i = i + 1
            # Same folder-name-to-label mapping as for the training set
            if "glioma" in root:
                test_list.append(os.path.join(root, file) + " 0\n")
            if "meningioma" in root:
                test_list.append(os.path.join(root, file) + " 1\n")
            if "notumor" in root:
                test_list.append(os.path.join(root, file) + " 2\n")
            if "pituitary" in root:
                test_list.append(os.path.join(root, file) + " 3\n")
    test_str = ''.join(test_list)
    # A context manager makes sure the file is flushed and closed
    with open('test_list.txt', 'w', encoding='utf-8') as f:
        f.write(test_str)
    return test_list, i
test_list, test_lenth = get_test_txt()
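
Before training, it can help to confirm that each list file has the expected "relative/path label" shape and a sensible class balance. The following is a minimal sketch under that assumption. The helper name label_histogram is hypothetical.

```python
from collections import Counter


def label_histogram(lines):
    """Count labels in 'relative/path label' lines, ignoring blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # rsplit on the last space keeps paths that contain spaces intact
        _, label = line.rsplit(" ", 1)
        counts[int(label)] += 1
    return counts
```

Usage could look like opening train_list.txt with encoding="utf-8" and passing the file object to label_histogram, since iterating a file yields its lines.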

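4. Training

4.1 Move the files

Move the images into the dataset directory under PaddleClas.
The move is deliberately done at this point, after the list files have been generated. Had the images been moved first, the generated txt files would contain full paths, and part of each path would then have to be stripped.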
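
To illustrate why relative paths are convenient here, the sketch below joins a list entry with the data_dir value used in the config later on. It assumes the data reader resolves each entry as data_dir plus the relative path from the list file; the example file name is hypothetical.

```python
import os

data_dir = "/home/aistudio/PaddleClas-release-2.1/dataset/"
entry = "Training/glioma/example.jpg 0"  # hypothetical line from train_list.txt

# Split off the label at the last space, then resolve the image path
rel_path, label = entry.rsplit(" ", 1)
full_path = os.path.join(data_dir, rel_path)
print(full_path, int(label))
```

Because the entries start with Training/ or Testing/, they stay valid once both folders sit directly under the dataset directory.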

!mv Training/ PaddleClas-release-2.1/dataset/
!mv all_list.txt PaddleClas-release-2.1/dataset/
!mv train_list.txt PaddleClas-release-2.1/dataset/
!mv val_list.txt PaddleClas-release-2.1/dataset/
!mv test_list.txt PaddleClas-release-2.1/dataset/
!mv Testing/ PaddleClas-release-2.1/dataset/
%cd PaddleClas-release-2.1
!ls
/home/aistudio/PaddleClas-release-2.1
configs  docs	      MANIFEST.in    README_cn.md      setup.py
dataset  __init__.py  paddleclas.py  README.md	       tools
deploy	 LICENSE      ppcls	     requirements.txt

4.2 Configure parameters

Config file: /home/aistudio/PaddleClas-release-2.1/configs/ResNet/ResNet50.yaml
mode: 'train'
ARCHITECTURE:
    name: 'ResNet50'

pretrained_model: ""
model_save_dir: "./output/"
classes_num: 4
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 4
image_shape: [3, 512, 512]

use_mix: False
ls_epsilon: -1

LEARNING_RATE:
    function: 'Piecewise'          
    params:                   
        lr: 0.1               
        decay_epochs: [30, 60, 90] 
        gamma: 0.1 

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
    regularizer:
        function: 'L2'
        factor: 0.000100

TRAIN:
    batch_size: 64
    num_workers: 0
    file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/train_list.txt"
    data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

VALID:
    batch_size: 64
    num_workers: 4
    file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/val_list.txt"
    data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
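
The LEARNING_RATE block above describes a piecewise schedule: start at lr 0.1 and multiply by gamma 0.1 at each entry of decay_epochs. A minimal sketch of that rule, assuming the boundaries apply at epoch granularity and that the decayed value takes effect from the boundary epoch itself (the function name piecewise_lr is illustrative):

```python
def piecewise_lr(epoch, base_lr=0.1, decay_epochs=(30, 60, 90), gamma=0.1):
    """Return base_lr * gamma**k, where k is the number of boundaries passed."""
    passed = sum(epoch >= boundary for boundary in decay_epochs)
    return base_lr * gamma ** passed


# Print the learning rate at a few epochs of the 120-epoch run
for epoch in (0, 29, 30, 60, 90, 119):
    print(epoch, piecewise_lr(epoch))
```

Under these assumptions the rate is 0.1 for epochs 0 to 29, then roughly 0.01, 0.001, and 0.0001 for the following three stages.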

4.3 Start training

# Start training
!python tools/train.py \
    -c /home/aistudio/PaddleClas-release-2.1/configs/ResNet/ResNet50.yaml
2022-11-29 14:46:35 INFO: 
===========================================================
==        PaddleClas is powered by PaddlePaddle !        ==
===========================================================
==                                                       ==
==   For more info please go to the following website.   ==
==                                                       ==
==       https://github.com/PaddlePaddle/PaddleClas      ==
===========================================================

2022-11-29 14:46:35 INFO: ARCHITECTURE : 
2022-11-29 14:46:35 INFO:     name : ResNet50
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: LEARNING_RATE : 
2022-11-29 14:46:35 INFO:     function : Piecewise
2022-11-29 14:46:35 INFO:     params : 
2022-11-29 14:46:35 INFO:         decay_epochs : [30, 60, 90]
2022-11-29 14:46:35 INFO:         gamma : 0.1
2022-11-29 14:46:35 INFO:         lr : 0.1
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: OPTIMIZER : 
2022-11-29 14:46:35 INFO:     function : Momentum
2022-11-29 14:46:35 INFO:     params : 
2022-11-29 14:46:35 INFO:         momentum : 0.9
2022-11-29 14:46:35 INFO:     regularizer : 
2022-11-29 14:46:35 INFO:         factor : 0.0001
2022-11-29 14:46:35 INFO:         function : L2
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: TRAIN : 
2022-11-29 14:46:35 INFO:     batch_size : 64
2022-11-29 14:46:35 INFO:     data_dir : /home/aistudio/PaddleClas-release-2.1/dataset/
2022-11-29 14:46:35 INFO:     file_list : /home/aistudio/PaddleClas-release-2.1/dataset/train_list.txt
2022-11-29 14:46:35 INFO:     num_workers : 0
2022-11-29 14:46:35 INFO:     shuffle_seed : 0
2022-11-29 14:46:35 INFO:     transforms : 
2022-11-29 14:46:35 INFO:         DecodeImage : 
2022-11-29 14:46:35 INFO:             channel_first : False
2022-11-29 14:46:35 INFO:             to_rgb : True
2022-11-29 14:46:35 INFO:         RandCropImage : 
2022-11-29 14:46:35 INFO:             size : 224
2022-11-29 14:46:35 INFO:         RandFlipImage : 
2022-11-29 14:46:35 INFO:             flip_code : 1
2022-11-29 14:46:35 INFO:         NormalizeImage : 
2022-11-29 14:46:35 INFO:             mean : [0.485, 0.456, 0.406]
2022-11-29 14:46:35 INFO:             order : 
2022-11-29 14:46:35 INFO:             scale : 1./255.
2022-11-29 14:46:35 INFO:             std : [0.229, 0.224, 0.225]
2022-11-29 14:46:35 INFO:         ToCHWImage : None
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: VALID : 
2022-11-29 14:46:35 INFO:     batch_size : 64
2022-11-29 14:46:35 INFO:     data_dir : /home/aistudio/PaddleClas-release-2.1/dataset/
2022-11-29 14:46:35 INFO:     file_list : /home/aistudio/PaddleClas-release-2.1/dataset/val_list.txt
2022-11-29 14:46:35 INFO:     num_workers : 4
2022-11-29 14:46:35 INFO:     shuffle_seed : 0
2022-11-29 14:46:35 INFO:     transforms : 
2022-11-29 14:46:35 INFO:         DecodeImage : 
2022-11-29 14:46:35 INFO:             channel_first : False
2022-11-29 14:46:35 INFO:             to_rgb : True
2022-11-29 14:46:35 INFO:         ResizeImage : 
2022-11-29 14:46:35 INFO:             resize_short : 256
2022-11-29 14:46:35 INFO:         CropImage : 
2022-11-29 14:46:35 INFO:             size : 224
2022-11-29 14:46:35 INFO:         NormalizeImage : 
2022-11-29 14:46:35 INFO:             mean : [0.485, 0.456, 0.406]
2022-11-29 14:46:35 INFO:             order : 
2022-11-29 14:46:35 INFO:             scale : 1.0/255.0
2022-11-29 14:46:35 INFO:             std : [0.229, 0.224, 0.225]
2022-11-29 14:46:35 INFO:         ToCHWImage : None
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: classes_num : 4
2022-11-29 14:46:35 INFO: epochs : 120
2022-11-29 14:46:35 INFO: image_shape : [3, 512, 512]
2022-11-29 14:46:35 INFO: ls_epsilon : -1
2022-11-29 14:46:35 INFO: mode : train
2022-11-29 14:46:35 INFO: model_save_dir : ./output/
2022-11-29 14:46:35 INFO: pretrained_model : 
2022-11-29 14:46:35 INFO: save_interval : 1
2022-11-29 14:46:35 INFO: topk : 4
2022-11-29 14:46:35 INFO: total_images : 1281167
2022-11-29 14:46:35 INFO: use_mix : False
2022-11-29 14:46:35 INFO: valid_interval : 1
2022-11-29 14:46:35 INFO: validate : True
W1129 14:46:35.077515  1228 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1129 14:46:35.082067  1228 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2022-11-29 14:46:39 INFO: epoch:0  , train step:0   , top1: 0.28125, top4: 1.00000, loss: 1.58380, lr: 0.100000, batch_cost: 2.82949 s, reader_cost: 0.54911 s, ips: 22.61888 images/sec, eta: 6:41:47
2022-11-29 14:46:42 INFO: epoch:0  , train step:10  , top1: 0.46875, top4: 1.00000, loss: 1.15485, lr: 0.100000, batch_cost: 0.41718 s, reader_cost: 0.23281 s, ips: 153.41036 images/sec, eta: 0:59:10
^C
# # Set up the evaluation config file and adjust the relevant parameters
# mode: 'valid'
# ARCHITECTURE:
#     name: "ResNet50"

# pretrained_model: "/home/aistudio/PaddleClas-release-2.1/output/ResNet50/best_model/ppcls"
# classes_num: 4
# total_images: 1311
# topk: 4
# image_shape: [3, 512, 512]

# VALID:
#     batch_size: 16
#     num_workers: 0
#     file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/test_list.txt"
#     data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
#     shuffle_seed: 0
#     transforms:
#         - DecodeImage:
#             to_rgb: True
#             channel_first: False
#         - ResizeImage:
#             resize_short: 256
#         - CropImage:
#             size: 224
#         - NormalizeImage:
#             scale: 1.0/255.0
#             mean: [0.485, 0.456, 0.406]
#             std: [0.229, 0.224, 0.225]
#             order: ''
#         - ToCHWImage:


# Start evaluation
!python tools/eval.py \
    -c /home/aistudio/PaddleClas-release-2.1/configs/eval.yaml
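
The VALID transforms used for evaluation (resize the short side to 256, center-crop 224, scale by 1/255, then normalize with the listed mean and std) can be traced by hand. The following is a plain-Python sketch of the geometry and per-pixel arithmetic, written under the assumption that the operators behave as their names suggest; the function names and the 512x384 input size are hypothetical.

```python
def resize_short_size(width, height, short=256):
    """Scale (width, height) so the shorter side equals `short`."""
    scale = short / min(width, height)
    return int(round(width * scale)), int(round(height * scale))


def center_crop_box(width, height, size=224):
    """Return (left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def normalize_pixel(rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale 0-255 values to [0, 1], then standardize each channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, mean, std))


w, h = resize_short_size(512, 384)  # hypothetical input size
print((w, h), center_crop_box(w, h))
print(normalize_pixel((255, 255, 255)))
```

The network therefore sees 224x224 inputs at evaluation time, the same spatial size that RandCropImage produces during training.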

5. Prediction

# Predict on the test scans (glioma folder)
!python tools/infer/infer.py \
    -i /home/aistudio/PaddleClas-release-2.1/dataset/Testing/glioma \
    --model ResNet50 \
    --pretrained_model "output/ResNet50/best_model/ppcls" \
    --load_static_weights False \
    --class_num=4
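
To read predictions as class names rather than numeric ids, the ids can be mapped back using the labels assigned when the list files were generated (0 glioma, 1 meningioma, 2 notumor, 3 pituitary). The sketch below is a minimal illustration, assuming the model yields one raw score per class; the logits are made up, and top_prediction is a hypothetical helper rather than part of the inference script.

```python
import math

# Label ids as assigned when the list files were generated
CLASS_NAMES = {0: "glioma", 1: "meningioma", 2: "notumor", 3: "pituitary"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (class_name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


print(top_prediction([2.0, 0.5, 0.1, -1.0]))  # hypothetical logits
```

Since the command above points at the Testing/glioma folder, a well-trained model would be expected to report class 0 for most of those images.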

Related information

Mentor: Lin Xu (林旭)
Student: Liu Yang (刘扬)

This article is reposted from the original project.
