
lightning-sam's Introduction

Hi there 👋


lightning-sam's People

Contributors

garfield-hr, luca-medeiros, wiekern


lightning-sam's Issues

torch.cuda.OutOfMemoryError - Imbalanced allocation of GPU memory

Hello everyone,

I ran into an issue with imbalanced allocation of CUDA memory.
As you can see below, there are around 40 GB of total capacity, yet it cannot allocate 12 GiB for training.
I set the environment variable by adding the line below to train.py, but it does not help at all.

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:15360"

I would like to ask whether you have run into this issue before, and if so, how you solved it.

I am running the experiments in Google Colab on an A100 GPU.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 GiB (GPU 0; 39.56 GiB total capacity; 30.24 GiB already allocated; 8.30 GiB free; 30.76 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
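
For what it's worth, one likely reason the variable has no effect is that PYTORCH_CUDA_ALLOC_CONF is only read when the CUDA caching allocator is initialized, so setting it after torch has already touched the GPU does nothing. A minimal sketch of the placement (the value is an assumption, not a tested fix; a very large max_split_size_mb such as 15360 effectively disables splitting limits):

    import os

    # Must be set before the first CUDA allocation -- safest is before importing
    # torch, or exported in the shell/Colab cell that launches train.py.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

    import torch  # imported only after the variable is in place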

How should I define a collate_fn function in this Dataset code?

import os

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset

# random_click is assumed to be a project-specific helper defined elsewhere.


class SamDataset(Dataset):
    def __init__(self, data_path, img_size=256, transform=None, transform_msk=None, mode='train', prompt='click', plane=False):
        
        df = pd.read_csv(os.path.join(data_path, mode + '_GroundTruth.csv'), encoding='gbk')
        self.name_list = df.iloc[:, 1].tolist()
        self.label_list = df.iloc[:, 2].tolist()
        self.data_path = data_path
        self.mode = mode
        self.prompt = prompt
        self.img_size = img_size  # use the constructor argument instead of hard-coding 256
        
        self.transform = transform
        self.transform_msk = transform_msk
        
    def __len__(self):
        return len(self.name_list)

    def __getitem__(self, idx):
        inout = 1
        point_label = 1
        
        """Get the images"""
        name = self.name_list[idx]
        img_path = os.path.join(self.data_path, name)

        mask_name = self.label_list[idx]
        msk_path = os.path.join(self.data_path, mask_name)

        img = Image.open(img_path).convert('RGB')
        mask = Image.open(msk_path).convert('L')

        newsize = (self.img_size, self.img_size)
        mask = mask.resize(newsize)

        if self.prompt == 'click':
            pt = random_click(np.array(mask) / 255, point_label, inout)
            point_label = np.ones(len(pt), dtype=np.int64)

        if self.transform:
            state = torch.get_rng_state()
            img = self.transform(img)
            torch.set_rng_state(state)
            
            if self.transform_msk:
                mask = self.transform_msk(mask)

        name = name.split('/')[-1].split('.')[0]  # strip directory and file extension
        image_meta_dict = {'filename_or_obj': name}
        return {
            'image': img,
            'label': mask,
            'p_label': point_label,
            'pt': pt,
            'image_meta_dict': image_meta_dict,
        }
def collate_fn(batch):
    return 

How should I define the collate_fn function?
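
For reference, one common pattern is to stack the fixed-size tensors and keep the variable-length fields as lists; a minimal sketch, assuming 'image' and 'label' have identical shapes after the transforms while 'pt' and 'p_label' may vary per sample:

    import torch

    def collate_fn(batch):
        # Stack same-shape tensors into a batch dimension; keep ragged fields as lists.
        return {
            'image': torch.stack([b['image'] for b in batch]),
            'label': torch.stack([b['label'] for b in batch]),
            'p_label': [torch.as_tensor(b['p_label']) for b in batch],
            'pt': [torch.as_tensor(b['pt']) for b in batch],
            'image_meta_dict': [b['image_meta_dict'] for b in batch],
        }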

Focal loss calculation is wrong

Hi, I tried out your project and it solved my problem perfectly.
I noticed that in my dataset, the focal loss is too small to be displayed properly in the terminal.
[screenshot: training log with the displayed focal loss value]
After some investigation, it appears that there might be an issue with the focal loss calculation.
Here's a snippet of the current implementation:

    def forward(self, inputs, targets, alpha=ALPHA, gamma=GAMMA, smooth=1):
        inputs = F.sigmoid(inputs)
        inputs = torch.clamp(inputs, min=0, max=1)
        #flatten label and prediction tensors
        inputs = inputs.view(-1)
        targets = targets.view(-1)

        BCE = F.binary_cross_entropy(inputs, targets, reduction='mean')
        BCE_EXP = torch.exp(-BCE)
        focal_loss = alpha * (1 - BCE_EXP)**gamma * BCE

        return focal_loss

In the current implementation, a mean reduction is applied in the binary cross-entropy (BCE) calculation, resulting in BCE for the entire mask. This leads to focus coefficients being calculated at the instance level. To better focus on difficult pixels and calculate the focal loss correctly, the focal loss calculation should be updated as follows:

    def forward(self, inputs, targets, alpha=ALPHA, gamma=GAMMA, smooth=1):
        inputs = F.sigmoid(inputs)
        inputs = torch.clamp(inputs, min=0, max=1)
        #flatten label and prediction tensors
        inputs = inputs.view(-1)
        targets = targets.view(-1)

        BCE = F.binary_cross_entropy(inputs, targets, reduction='none')
        BCE_EXP = torch.exp(-BCE)
        focal_loss = alpha * (1 - BCE_EXP)**gamma * BCE
        focal_loss = focal_loss.mean()

        return focal_loss

With this revision, the focal loss now focuses on difficult pixels as intended. This approach aligns with the focal loss implementation in the MONAI library.

After applying this correction, the image below demonstrates the losses during the training process on my dataset.
[screenshot: training losses after the fix]

Feel free to incorporate these changes #53 !

Loss cannot converge

[screenshots: loss curves during fine-tuning]

As shown above, the fine-tuning loss becomes larger, even though the config and training details all follow yours. Could you please tell me some possible reasons?

Semantic Segmentation

According to your note, does it mean this program can only be used for object detection? Or can I use it for semantic segmentation on medical images?

RuntimeError: GET was unable to find an engine to execute this computation

When I started training on a single GPU with batch size 2, I received this error:
Global seed set to 1337
loading annotations into memory...
Done (t=0.65s)
creating index...
index created!
loading annotations into memory...
Done (t=0.24s)
creating index...
index created!
Traceback (most recent call last):
  File "train.py", line 167, in <module>
    main(cfg)
  File "train.py", line 162, in main
    train_sam(cfg, fabric, model, optimizer, scheduler, train_data, val_data)
  File "train.py", line 89, in train_sam
    pred_masks, iou_predictions = model(images, bboxes)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/lightning/fabric/wrappers.py", line 115, in forward
    output = self._forward_module(*args, **kwargs)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/mmdetection/mmdet_sam/playground/mmdetection/demo/sam_tuning/lightning-sam/lightning_sam/model.py", line 38, in forward
    low_res_masks, iou_predictions = self.model.mask_decoder(
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/segment_anything/modeling/mask_decoder.py", line 94, in forward
    masks, iou_pred = self.predict_masks(
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/segment_anything/modeling/mask_decoder.py", line 138, in predict_masks
    upscaled_embedding = self.output_upscaling(src)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lightning-sam/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 956, in forward
    return F.conv_transpose2d(
RuntimeError: GET was unable to find an engine to execute this computation

CUDA out of memory

Thanks for your excellent work! But I encountered some problems during training. I trained on a V100 16 GB GPU, and after some iterations CUDA ran out of memory. I posted the place where the problem occurred below. What I want to know is: if there is insufficient GPU memory, shouldn't training fail to start at all, instead of stopping after some iterations?
[screenshot: out-of-memory traceback]

train the original SAM

Hello, may I ask whether we can train the original SAM with your code, or does the code only support lightning-sam? I haven't found a guide for training the original SAM, so could you please give me some guidance? Thank you very much.

install error

Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
Traceback (most recent call last):
  File "/usr/local/python/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/local/python/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/usr/local/python/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
    return hook(metadata_directory, config_settings)
  File "/tmp/pip-build-env-t_cob8lk/overlay/lib/python3.8/site-packages/poetry/core/masonry/api.py", line 41, in prepare_metadata_for_build_wheel
    builder = WheelBuilder(poetry)
  File "/tmp/pip-build-env-t_cob8lk/overlay/lib/python3.8/site-packages/poetry/core/masonry/builders/wheel.py", line 59, in __init__
    super().__init__(poetry, executable=executable)
  File "/tmp/pip-build-env-t_cob8lk/overlay/lib/python3.8/site-packages/poetry/core/masonry/builders/builder.py", line 83, in __init__
    self._module = Module(
  File "/tmp/pip-build-env-t_cob8lk/overlay/lib/python3.8/site-packages/poetry/core/masonry/utils/module.py", line 69, in __init__
    raise ModuleOrPackageNotFound(
poetry.core.masonry.utils.module.ModuleOrPackageNotFound: No file/folder found for package lightning-sam
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Thanks for the repo. How can I fix this problem?

Inference

First of all, thanks for providing this code.
After fine-tuning the model and having the checkpoints saved, how can I use a checkpoint to test the model on some new images, to see whether it performs well on them?
Thanks again.
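
One way people typically do this with the segment_anything API is to load the fine-tuned weights into a SAM model and wrap it in SamPredictor. A rough sketch, assuming the saved .pth is a plain SAM state_dict and using hypothetical paths and a hypothetical box prompt:

    import cv2
    import numpy as np
    import torch
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_b"]()                                   # same model type as used for training
    sam.load_state_dict(torch.load("out/training/epoch-XXXX-ckpt.pth"))   # hypothetical checkpoint path
    sam.to("cuda").eval()

    predictor = SamPredictor(sam)
    image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)       # hypothetical test image
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        box=np.array([50, 50, 400, 300]),   # xyxy box prompt, since training used boxes
        multimask_output=False,
    )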

Connection closed by peer

Hi,

Thanks for your nice work so far.

I tried to run your code on a SageMaker Studio VM and ran into the following issue:

(lightningsam) root@pytorch-2-0-0-gpu-py31-ml-m5-large-688742f28ae6243c9f9ea2d80bc2:~/lightning-sam# python lightning_sam/train.py
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/4
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/4
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/4
[rank: 3] Global seed set to 1340

distributed_backend=gloo
All distributed processes registered. Starting with 4 processes

[rank: 1] Global seed set to 1338
[rank: 0] Global seed set to 1337
[rank: 2] Global seed set to 1339
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/root/lightning-sam/lightning_sam/train.py", line 167, in <module>
    main(cfg)
  File "/root/lightning-sam/lightning_sam/train.py", line 160, in main
    model, optimizer = fabric.setup(model, optimizer)
  File "/opt/conda/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 204, in setup
    module, optimizers = self._strategy.setup_module_and_optimizers(  # type: ignore[assignment]
  File "/opt/conda/lib/python3.10/site-packages/lightning/fabric/strategies/strategy.py", line 123, in setup_module_and_optimizers
    module = self.setup_module(module)
  File "/opt/conda/lib/python3.10/site-packages/lightning/fabric/strategies/ddp.py", line 118, in setup_module
    return DistributedDataParallel(module=module, device_ids=self._determine_ddp_device_ids(), **self._ddp_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 674, in __init__
    _verify_param_shape_across_processes(self.process_group, parameters)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/utils.py", line 118, in _verify_param_shape_across_processes
    return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: [/opt/conda/conda-bld/pytorch_1679586020379/work/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [169.255.255.2]:946
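
Note that the run uses distributed_backend=gloo with four processes on an ml.m5.large (the DataLoader warning says the machine only has two workers' worth of CPUs and no GPU), so the rendezvous dropping a connection is not surprising. As a sanity check you could try a single-process run; a sketch, assuming you adapt however train.py builds its Fabric object (exact config keys may differ):

    import lightning as L

    # Single-process run: no DDP wrapping, no gloo/TCP rendezvous involved.
    fabric = L.Fabric(accelerator="auto", devices=1)
    fabric.launch()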

How to train SAM in batches?

I have more than one target object to segment in each image. Is it possible to feed batches into the model? Feeding batches into the image encoder is possible, but what about the prompt encoder? As I understand it, the shape of a prompt batch should be BxNx4 for bounding boxes, where B is the batch size and N is the number of bounding boxes per image. I checked the source code and the prompt encoder expects a shape of Bx4. Am I missing something?
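
As far as I can tell from the forward_pass code quoted in a later issue, the repo handles this by looping over images and giving the prompt encoder an (N, 4) box tensor per image, so the prompt encoder's "batch" dimension is really the number of prompts N for that single image. A sketch under that assumption (names are illustrative):

    def predict_masks_for_image(model, embedding, boxes):
        """boxes: (N, 4) xyxy prompts for ONE image; embedding: (C, H, W) from the image encoder."""
        sparse, dense = model.prompt_encoder(points=None, boxes=boxes, masks=None)
        low_res_masks, iou_pred = model.mask_decoder(
            image_embeddings=embedding.unsqueeze(0),
            image_pe=model.prompt_encoder.get_dense_pe(),
            sparse_prompt_embeddings=sparse,
            dense_prompt_embeddings=dense,
            multimask_output=False,
        )
        return low_res_masks, iou_pred  # N masks for this image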

suggestion for adjusting hyper-params in config.py

Hi, could you give some suggestions for adjusting the hyper-params below in config.py?

"learning_rate": 2e-4,
"weight_decay": 3e-2,
"decay_factor": 10,
"steps": [2000, 4000],
"warmup_steps": 500,

Since our dataset sizes vary, the smaller the dataset or the bigger the batch size, the fewer steps there are, so the default steps and warmup_steps need to be modified. It would be nice if you could give some suggestions on adjusting these params, thanks a lot!
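
Not an official recommendation, but one rough heuristic is to keep the default schedule's proportions and rescale them to your own total number of optimizer steps; a sketch with hypothetical names:

    steps_per_epoch = len(train_dataset) // batch_size   # hypothetical dataset and batch size
    total_steps = steps_per_epoch * num_epochs

    warmup_steps = max(1, int(0.05 * total_steps))                # ~5% warm-up
    steps = [int(0.65 * total_steps), int(0.85 * total_steps)]    # LR decay milestones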

Original SAM

Thanks for the exciting work. I have a question about your evaluation of the original SAM: which part of your code did you use to calculate the F1 and IoU, or did you try several prompts and then calculate the average?

Using SamAutomaticMaskGenerator to predict masks with fine-tuned SAM models returns an empty list

Hello everyone,

I ran into a problem when using SamAutomaticMaskGenerator to predict masks with fine-tuned SAM models:
it always returns an empty list.
However, it works well when following the workflow in utils.py, where SamPredictor(self.model) is used.

I used a customized dataset that consists of 512x512x3 image patches and a JSON file of annotations.
The annotations only include two classes: background and target objects.
I tested the fine-tuned model with the following code:

from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import torch

[screenshots: test code and its output]

The models are fine-tuned and tested on Google Colab.

If you have solved this issue, I look forward to your answer.
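
One thing worth trying (an assumption, not a confirmed fix) is loosening SamAutomaticMaskGenerator's quality filters, since a fine-tuned decoder's IoU predictions and stability scores can fall below the default thresholds, in which case every mask gets filtered out:

    import torch
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

    sam = sam_model_registry["vit_h"]()
    sam.load_state_dict(torch.load("fine_tuned_sam.pth"))   # hypothetical checkpoint path
    sam.to("cuda").eval()

    # Defaults are pred_iou_thresh=0.88 and stability_score_thresh=0.95;
    # lowering them shows whether masks are produced but then filtered away.
    mask_generator = SamAutomaticMaskGenerator(
        sam,
        pred_iou_thresh=0.5,
        stability_score_thresh=0.7,
        min_mask_region_area=0,
    )
    masks = mask_generator.generate(image)   # image: HxWx3 uint8 RGB array loaded beforehand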

CUDA out of memory after some iters when training

It's the best SAM fine-tuning code I've found, thank you so much!

I have a single GPU with 12 GB of memory. The memory kept growing during training and ran out after some time. Is it related to Lightning?
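
It might not be Lightning itself: a common cause of steadily growing memory is accumulating loss tensors (for running averages or logging) without detaching them, which keeps every step's computation graph alive. A sketch of the usual pattern, with hypothetical helper names:

    running_loss = 0.0
    for images, bboxes, gt_masks in train_loader:              # hypothetical loader
        loss = compute_loss(model, images, bboxes, gt_masks)   # hypothetical helper
        optimizer.zero_grad()
        loss.backward()                                        # or fabric.backward(loss)
        optimizer.step()
        running_loss += loss.item()   # .item() detaches, so each step's graph can be freed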

Multi object question

Thanks for the great work. As I understand from other repos, and also from your implementation of the loss function in train.py, it is a focal and dice loss between the predicted and actual masks, so basically binary segmentation. Therefore, the "different" masks for the multiple objects that we see for an image with SAM are generated sequentially, i.e., by looping over the different objects' prompts (point prompts, bbox prompts, etc.), treating each as a single entity, and generating a binary (foreground vs. background) mask for each object separately, then combining them for visualization. Is that right? Please correct me if I am wrong.

fine tuning sam

Hello team,

great work.

I have one doubt: during fine-tuning, what is used as the ground truth to calculate the loss against the masks generated by SAM for a given prompt?

From the code, I see that you use the masks from the dataset. Is that right?

error about requested gpu: [0, 1, 2, 3]

I am trying to fine-tune SAM, but I get this error:
lightning.fabric.utilities.exceptions.MisconfigurationException: You requested gpu: [0, 1, 2, 3]
But your machine only has: [0]
Does anybody know how I can solve it?
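
The default config asks for four GPUs; on a single-GPU machine you need to reduce the device count. A sketch, assuming the config exposes a device count (the exact key name in your copy of config.py may differ):

    # config.py (excerpt)
    config = {
        "num_devices": 1,   # was 4; request only the GPUs the machine actually has
        # ... rest of the config unchanged ...
    }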

Learning Rate is 0 returned by scheduler.get_last_lr()

We are using a stepwise LR scheduler; however, the step param of the lr_lambda function is 0 the first time it is called, so it returns 0, which is then multiplied by init_lr. If I am correct, whatever init_lr we set, the result seems to be 0 during the whole training process. At least in my own training, the LR is always 0 when queried via scheduler.get_last_lr().

  def lr_lambda(step):
      if step < cfg.opt.warmup_steps:
          return step / cfg.opt.warmup_steps
      elif step < cfg.opt.steps[0]:
          return 1.0
      elif step < cfg.opt.steps[1]:
          return 1 / cfg.opt.decay_factor
      else:
          return 1 / (cfg.opt.decay_factor**2)

The learning rate I extract from optimizer.param_groups varies stepwise and is greater than 0. Is there any difference between getting the current learning rate via scheduler.get_last_lr() and via optimizer.param_groups[0]['lr']? I read an issue about this question, which says they are the same as of torch 1.3.
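
If the goal is just to avoid a zero LR at the very first step, a small tweak to the warm-up branch works; a sketch (shifting the step by one would be equally fine):

    def lr_lambda(step):
        if step < cfg.opt.warmup_steps:
            return max(step, 1) / cfg.opt.warmup_steps   # never return 0 at step 0
        elif step < cfg.opt.steps[0]:
            return 1.0
        elif step < cfg.opt.steps[1]:
            return 1 / cfg.opt.decay_factor
        else:
            return 1 / (cfg.opt.decay_factor**2)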

loss parameter 'num_masks'

In the file lightning_sam/train.py, why do we need to pass the parameter 'num_masks'? Does it mean that 'alpha' of the FocalLoss and 'smooth' of the DiceLoss are set to 'num_masks'?

                loss_focal += focal_loss(pred_mask, gt_mask, num_masks)
                loss_dice += dice_loss(pred_mask, gt_mask, num_masks)

CUDA out of memory error

What would be the recommended settings for fine-tuning on the trash dataset? I use 4x 3090 Ti GPUs but get an out-of-memory error during initialization.

what are you finetuning?

It isn't clear from the readme what fine-tuning you do. Are you fine-tuning SAM as an interactive image segmentation network, meaning you want good segmentations given some prompts? Or are you using the SAM encoder as the backbone of a traditionally supervised segmentation network? You say you train only using bounding boxes; how do you generate the bounding boxes? I still don't quite get the idea, so it would be great if you could clarify. Thanks for the good work!

coco format

Which COCO segmentation format is used to train the model:
polygon or column-major RLE?
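
If the annotations are loaded through pycocotools (as COCO-style datasets usually are), both encodings decode to the same binary masks, so either should work. A quick way to check your own file, as a sketch with a hypothetical path:

    from pycocotools.coco import COCO

    coco = COCO("annotations/instances_train.json")   # hypothetical annotation file
    ann = coco.loadAnns(coco.getAnnIds())[0]
    mask = coco.annToMask(ann)   # handles both polygon and RLE segmentations
    print(type(ann["segmentation"]), mask.shape, mask.sum())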

High number of instances per image

First of all, thanks for providing such elegant code.
I'm facing an out-of-memory problem when fine-tuning the model on a specific dataset containing many instances per image. The number of instances per image can also vary a lot from image to image.
I followed the discussion in #4 but did not get good results. To be honest, I did not understand where I should use loss.item().cpu(); is it on each loss?
Do you have any tips for dealing with this problem?
About the setup: I'm using an A100-80GB with batch_size = 1.
I've faced this problem with this particular dataset several times, and one solution I've found is to use smaller images, but I don't know how that would work here since it's fine-tuning and not training from scratch.
Other than that, how could one use the standard COCO evaluation within this program, to get all the other metrics like APm, etc.?

Thank you
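
For the COCO-metrics part of the question: if you export the predicted masks to the standard COCO results format, pycocotools' COCOeval gives AP, APs/APm/APl, etc. out of the box; a minimal sketch with hypothetical file names:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("val_annotations.json")          # hypothetical ground-truth file
    coco_dt = coco_gt.loadRes("predictions.json")   # predictions in COCO result format (RLE masks + scores)

    evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()   # prints AP, AP50, AP75, APs/APm/APl, AR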

Are you planning to release the test code as well?

I used train.py for instance segmentation by switching the annotation type to instance. It trained with batch size 1 without errors. But when I loaded the trained checkpoint (after the 16th epoch) using the following code, it did not work: it resulted in empty masks, i.e. [].
Am I doing anything wrong here? Please correct me.

image = cv2.imread('/workspace/lightning-sam/dataset/instance_version/val/vid_000002_frame0000018.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

import sys
sys.path.append("..")
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
sam_checkpoint = "/workspace/lightning-sam/src/out/training/epoch-000016-f10.81-ckpt.pth"
model_type = "vit_h"
device = "cuda"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

mask_generator = SamAutomaticMaskGenerator(sam)

masks = mask_generator.generate(image)

12Gb GPU

Hi all,
A question: I have an RTX 2080 Ti with 12 GB, where only one image (embedding) fits.
So as a start I could use a batch size of one, but this does not train well.

Would it be an option to run several images sequentially before doing the backward step?

something like:

for epoch in epochs:
    loss = 0
    for image in ["N images"]:
        loss += predict(image)   # run N images forward, accumulating the loss
    backward(loss)               # one backward step for the whole group

Regards,
Nacho

Training Time

Thanks for creating this repo. Great work!
May I ask how many GPUs you used for training on the COCO dataset, and what was the training time? How long does it take for the model to converge?
Look forward to your reply!

Saving of checkpoints

Is there any way to control how often checkpoints are saved? Can we modify the code to save only the best and the last checkpoint?

labelme

How can I use labelme-annotated datasets for fine-tuning? Or how can I convert the labelme format to COCO format? Can you provide the source code? Thanks.

Put `forward_pass` function in a Module

First of all, thanks for creating this repo. Great work!

Since you adopted Lightning Fabric in your train.py script, it is very easy to launch it on multi-GPU with DDP or other strategies, love it! There is one caveat with DDP we should be aware of: it requires that all inputs are passed through the model's forward method. Currently, this gets bypassed because the script calls model.image_encoder and model.mask_decoder separately:

image_embeddings = model.image_encoder(images)

The code will work but make DDP ineffective. Luckily, this can be easily fixed by moving the code in forward_pass here

lightning-sam/src/train.py

Lines 125 to 155 in f2bb731

def forward_pass(model, images, bboxes):
    _, _, H, W = images.shape
    image_embeddings = model.image_encoder(images)
    pred_masks = []
    ious = []
    for embedding, bbox in zip(image_embeddings, bboxes):
        sparse_embeddings, dense_embeddings = model.prompt_encoder(
            points=None,
            boxes=bbox,
            masks=None,
        )
        low_res_masks, iou_predictions = model.mask_decoder(
            image_embeddings=embedding.unsqueeze(0),
            image_pe=model.prompt_encoder.get_dense_pe(),
            sparse_prompt_embeddings=sparse_embeddings,
            dense_prompt_embeddings=dense_embeddings,
            multimask_output=False,
        )
        masks = F.interpolate(
            low_res_masks,
            (H, W),
            mode="bilinear",
            align_corners=False,
        )
        pred_masks.append(masks.squeeze(1))
        ious.append(iou_predictions)
    # binary_mask = normalize(threshold(upscaled_masks, 0.0, 0))
    return pred_masks, ious

into a nn.Module.

class Model(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, images, bboxes):
        _, _, H, W = images.shape
        image_embeddings = self.model.image_encoder(images)
        pred_masks = []
        ious = []
        for embedding, bbox in zip(image_embeddings, bboxes):
            sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
                points=None,
                boxes=bbox,
                masks=None,
            )
            low_res_masks, iou_predictions = self.model.mask_decoder(
                image_embeddings=embedding.unsqueeze(0),
                image_pe=self.model.prompt_encoder.get_dense_pe(),
                sparse_prompt_embeddings=sparse_embeddings,
                dense_prompt_embeddings=dense_embeddings,
                multimask_output=False,
            )
            masks = F.interpolate(
                low_res_masks,
                (H, W),
                mode="bilinear",
                align_corners=False,
            )
            pred_masks.append(masks.squeeze(1))
            ious.append(iou_predictions)
        # binary_mask = normalize(threshold(upscaled_masks, 0.0, 0))
        return pred_masks, ious

Then, we would set up this model in Fabric:

model = Model(...)
model, optimizer = fabric.setup(model, optimizer)

And in the loop we would call model(images, bboxes) instead of forward_pass().
With these changes, the train script can make use of DDP training effectively. I hope this will be useful.

Tensor size error

I got this error when running train.py:

RuntimeError: The size of tensor a (1024) must match the size of tensor b (1280) at non-singleton dimension 2.

And the shape of pred_mask is torch.Size([3, 1024, 1024]) and the shape of gt_mask is torch.Size([3, 768, 1280]).
How do I adjust the code so that the tensor dimensions match?
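
One possibility (an assumption about where the mismatch comes from) is that the predicted masks live in the 1024x1024 padded model-input space while the ground-truth masks are still at the original 768x1280 resolution; resizing one to the other before the loss, or applying the repo's resize-and-pad transform to the ground-truth masks as well, should make the shapes agree. A sketch of the interpolation option:

    import torch.nn.functional as F

    # pred_mask: (N, 1024, 1024) logits; gt_mask: (N, 768, 1280)
    pred_resized = F.interpolate(
        pred_mask.unsqueeze(1).float(),   # add channel dim -> (N, 1, 1024, 1024)
        size=gt_mask.shape[-2:],          # (768, 1280)
        mode="bilinear",
        align_corners=False,
    ).squeeze(1)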
