csailvision / gazecapture
Eye Tracking for Everyone
Home Page: http://gazecapture.csail.mit.edu
License: Other
Can anyone access the website?
http://gazecapture.csail.mit.edu/ seems to be down once again, similar to #7
Hi,
I have some questions. Is checkpoint.pth.tar the pre-trained model used in the published work?
A pre-trained caffe model is provided; could you also provide the pytorch version? Is there a way to convert the caffe model to a pytorch model? (We can't find a reliable tool.)
Thanks >v<
Good work. Unfortunately, I couldn't download the GazeCapture dataset from https://gazecapture.csail.mit.edu/download.php. My account is always unable to log in to get the correct download link due to an unknown error. I have tried many times to register and log in, but failed. Could you provide another download option? Thank you. @quantombone @adikhosla @andrewowens @visionATcsail
Hey,
May I please know what the FrameW and FrameH arguments are? Are they the original frame width and height (480*640) or the resized values (224*224)?
How come the faceGrid width and height values are the same in the JSON, e.g. 13*13?
Thanks,
Madan
Hi,
As far as I know the iOS face/eye detection service does not provide bounding boxes for eyes, only eye and eyebrow control points (landmarks). What was the procedure/algorithm that you used to generate bounding boxes for eyes from the provided information?
Thanks, Botond
The tarfile seems to be corrupted: pytorch/checkpoint.pth.tar
Sorry to bother you, could you please describe in more detail how to get the labels? Thank you so much!
Hi, I would like to know whether you apply any preprocessing to the eye images. Thank you for your answer!
Hello, thank you for making your source code and dataset public. I want to use the model on an Android phone to control an application by eye movement. For that, I need to know how the face and eye regions were cropped in the dataset so that I can feed new samples cropped on Android to the model.
In prepareDataset.py, I encounter the following code section -
GazeCapture/pytorch/prepareDataset.py
Line 103 in e09c285
I understand that after reading the values from appleFace.json as int, the X & Y pixel coordinates are treated as 1-indexed values (which is Matlab compatible). So, to convert them to 0-indexed values in Python, we should add [-1,-1,0,0] to [X,Y,W,H]. But in the code, [-1,-1,1,1] is added (which increases the width & height of face crops by 1 pixel).
Can you please clarify why 1 is added to W & H? I know that an increment of 1 pixel wouldn't matter much, but I'd like to have this clarified.
Also, for leftEyeBbox & rightEyeBbox, I think [-1,-1,0,0] should be added instead of [0,-1,0,0].
Thanks.
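The difference between the two offsets discussed in this question can be checked with a small sketch. The box values are made up, the helper names are hypothetical, and this is not the code from prepareDataset.py; it only shows which pixel ranges each offset would select.

```python
def apply_offset(bbox, offset):
    """Add a per-component offset to an [X, Y, W, H] box."""
    return [v + o for v, o in zip(bbox, offset)]


def crop_extent(bbox):
    """Half-open pixel ranges [x0, x1) and [y0, y1) covered by a 0-indexed box."""
    x, y, w, h = bbox
    return (x, x + w), (y, y + h)


# Hypothetical 1-indexed integer face box, not taken from the dataset.
face_one_indexed = [39, 231, 344, 344]

# Shift the origin only: the crop keeps its original size.
size_preserving = apply_offset(face_one_indexed, (-1, -1, 0, 0))
# Offset described in the question: the crop grows by one pixel per axis.
as_in_question = apply_offset(face_one_indexed, (-1, -1, 1, 1))

print(crop_extent(size_preserving))  # x covers 38..381, y covers 230..573
print(crop_extent(as_in_question))   # one extra column and row at the far edge
```

With these numbers the second crop has the same top-left corner but one extra column and row, which is the 1-pixel growth the question asks about.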
I am not able to open the URL http://gazecapture.csail.mit.edu/download.php
to download the dataset.
I tried to load the GazeCapture dataset and test with the checkpoint file, which is said to reach an L2 error of 2.46 cm. But the checkpoint does not work at all for me and gives an L2 error of about 25 cm. I don't know where my problem is. Can anybody help? Thanks!!!
This work is great and interesting!
I have run the pytorch code on Linux but I didn't find any visualization. Is there any code or app for Windows/Linux/iOS/Android so that I can try it out more intuitively?
Thanks!
I failed to visit http://gazecapture.csail.mit.edu (connection timed out) when trying to download the dataset. Is it down or under maintenance?
hello, dear friend!
I want to draw a 3D gaze direction line on the 2D image from the predicted coordinates.
Could you please give me any ideas?
Thanks a lot!
Best Regards
Navigate to https://gazecapture.csail.mit.edu/explore.php. On that page, there is only a black image. There are some errors in the console, and debugging the JavaScript at a high level (since the code is minified) shows that it is calling code in https://gazecapture.csail.mit.edu/js/load-image.all.min.js.
HTML1521: Unexpected "" or end of file. All open elements should be closed before the end of the document.
I am currently going through the pytorch eye model code and stumbled across an inconsistency that I suspect was left out of the article on purpose, which just needs confirming. The described model does not mention max pooling, but it is used between each layer in the code.
The described model is as follows:
The output is the distance, in centimeters, from the camera. CONV represents convolutional layers (with filter size/number of kernels: CONV-E1, CONV-F1: 11 × 11/96; CONV-E2, CONV-F2: 5 × 5/256; CONV-E3, CONV-F3: 3 × 3/384; CONV-E4, CONV-F4: 1 × 1/64)
but a max pool layer can be seen between each layer in their code:
import torch.nn as nn

class ItrackerImageModel(nn.Module):
    # Used for both eyes (with shared weights) and the face (with unique weights)
    def __init__(self):
        super(ItrackerImageModel, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.CrossMapLRN2d(size=5, alpha=0.0001, beta=0.75, k=1.0),
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2, groups=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.CrossMapLRN2d(size=5, alpha=0.0001, beta=0.75, k=1.0),
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 64, kernel_size=1, stride=1, padding=0),
            nn.ReLU(inplace=True),
        )
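To see what the pooling layers in the listing above do to the feature map, the spatial size can be traced by hand. The sketch below is plain Python arithmetic, not the repository's code; the 224-pixel input side is an assumption (another question on this page mentions a resized value of 224*224), and the layer table is copied from the listing. In that listing the pooling follows only the first two convolutions.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding), in the order used in the listing.
# ReLU and LRN are omitted because they do not change the spatial size.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 1, 1, 0),
]


def trace_sizes(input_size=224):
    """Return the side length after each layer for a square input."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


sizes = trace_sizes(224)
flattened = 64 * sizes["conv4"] ** 2  # 64 channels after the last conv
print(sizes, flattened)
```

Under the 224-pixel assumption the side length goes 54, 26, 26, 12, 12, 12, so each branch ends in a 64 × 12 × 12 feature map (9216 values).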
I am unable to login and download the dataset. After creating an account, I tried logging in and access was denied.
Hi,
You mentioned that the data is split by patient. Where can I get the patient IDs used in the train/validation/test sets?
Hi,
First of all thank you for making your dataset and code available to the public!
We would like to replicate your model for real-time inference (Section 4.2 in the paper). Is the precise network layout / pre-trained model available somewhere?
Thanks in advance,
Tobias
Can I use the models to predict the gaze point on a laptop screen?
I am working on a project that tracks gaze to move the mouse pointer around the screen, and I want to know if I can use your models to predict gaze from a laptop's built-in webcam.
Thank you.
Hi, my friend! Sorry to disturb you again.
I want to use itracker_iter_92000.caffemodel directly for inference, but I ran into some problems and would like some clarification.
Is what I describe correct, or am I missing something?
Please check this for me! Thank you very much!
Best Regards
I have registered an account on the website, but I cannot sign in, so I cannot get the GazeCapture dataset.
When I test the model with a new picture, SubtractMean is applied after the picture is converted to a Tensor.
Does MeanImg refer to the mean of the training set or the mean of this picture?
Has anyone achieved accurate predictions using their own data with the pytorch implementation? I am getting inaccurate predictions after just changing the data loading paths in ITrackerData.py and using the checkpoint.
Solution: see my last comment below. It was caused by a fault in my hardware.
I am facing an I/O error, "OSError: [Errno 5] Input/output error", when running prepareDataset.py.
I found online that I can redirect the output by appending >/dev/null 2>&1 to the command, but then it doesn't create all the subdirectories.
python prepareDataset.py --dataset_path [A = where extracted] --output_path [B = where to save new data]
Does this command run fine on your system? I am using Ubuntu 16.04 LTS.
The error traceback is below:
Traceback (most recent call last):
  File "prepareDataset.py", line 273, in <module>
    main()
  File "prepareDataset.py", line 125, in main
    img = np.array(img.convert('RGB'))
  File "/home/user/anaconda3/envs/myenv/lib/python3.6/site-packages/PIL/Image.py", line 934, in convert
    self.load()
  File "/home/user/anaconda3/envs/myenv/lib/python3.6/site-packages/PIL/ImageFile.py", line 234, in load
    s = read(self.decodermaxblock)
  File "/home/user/anaconda3/envs/myenv/lib/python3.6/site-packages/PIL/JpegImagePlugin.py", line 398, in load_read
    s = self.fp.read(read_bytes)
Hi,
It seems that the weights of the pytorch model are not correct.
Could you please re-upload them?
Thanks
Hello, dear friend!
Is the prediction result (yaw, pitch, roll)?
If so, how can I convert it to two-dimensional coordinates?
Thanks a lot!
Hello! Thanks for your hard work!
Recently I've been trying to use the caffe model, but since I want to run this fully on CPU and would rather not install caffe as a first option, I opted to load the caffe model through OpenCV's DNN module instead.
I can't see how the caffe model is used in the repo, so I tried to implement the same pipeline as the pytorch one. Unfortunately, I got strange results.
I stared straight at the camera, but the prediction is way off (6 cm horizontally and 2 cm vertically). I tried looking left and right, but that seemingly has no effect. The values do not vary with the general direction of my eyes (that is, in relative terms), so I want to ask whether my pipeline is correct.
Also, I made some assumptions that I observed and confirmed from the pytorch code.
As background, I'm using it on my laptop, and I noticed this repo has been used successfully outside of mobile devices, so my question is: what is the expected distance of the face from the camera?
Thanks!
Hi, thanks for sharing code and dataset.
I have a problem with training.
I just ran prepareDataset.py and then main.py.
When running main.py, it gets stuck at line 161: enumerate(train_loader)
At that point the program uses up my RAM by spawning dozens of python processes and then hangs without showing any warning message, so I believe it is not a memory error.
My OS is Windows and I use Anaconda PowerShell; the CUDA version is 10.1.
How can I fix this problem?
Hi, is there a known reason why PyTorch version 0.4.1 was chosen?
It seems that later versions of PyTorch take ~20x longer to compute gradients (back-propagation). I wonder if this is a known issue and the main reason why this version of torch was chosen. I ran into this behavior because I need a later version of PyTorch for some extra features.
Steps to reproduce the issue:
git clone https://github.com/CSAILVision/GazeCapture/
cd GazeCapture
docker pull nvcr.io/nvidia/pytorch:20.02-py3
mkdir database && cd database
wget -O gazecapture.tar "https://gazecapture.csail.mit.edu/dataset.php?
tar -xvf gazecapture.tar
cat *.tar.gz | tar zxvf - -i
docker run --gpus all -it --rm -v /home/user/GazeCapture:/mount nvcr.io/nvidia/pytorch:20.02-py3
cd /mount
python prepareDataset.py --dataset_path base/ --output_path output
python main.py --data_path output --reset
Hi,
First, thanks for your great work!
I have some questions about using your pytorch model:
(1) Which pre-trained model is usable?
(2) If I want to run inference on my own image or video, how can that be done?
Thanks very much!
I registered to the website and verified my institutional email and yet cannot login to download the data. Please advise if I am missing something.
Thank you for this great work!
I have 2 questions.
Hello. Thank you for sharing your code.
I'm currently trying to run your pytorch code on a webcam. As I understand it, I first need to detect the face and both eyes in the frame and then run the model on that data, and I can pass anything as the y-data since I only want to evaluate, not train. But one question remains: how do I get faceGrid? What does this array contain, and is it possible to compute it somehow?
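As a rough illustration of what such an array could look like, the sketch below builds a binary mask over a 25*25 grid (the grid size mentioned elsewhere on this page) with ones in the cells covered by the face. The function name, the 0-indexed cell coordinates, and the row-major flattening are assumptions made for illustration; this is not taken from the repository's code.

```python
def make_face_grid(x, y, w, h, grid_size=25):
    """Return a flattened grid_size*grid_size binary mask.

    (x, y) is the top-left grid cell of the face and (w, h) its extent in
    cells, all 0-indexed. Cells outside the grid are clipped away.
    """
    grid = [0] * (grid_size * grid_size)
    for row in range(max(y, 0), min(y + h, grid_size)):
        for col in range(max(x, 0), min(x + w, grid_size)):
            grid[row * grid_size + col] = 1  # row-major layout (assumed)
    return grid


# Hypothetical face region: 13*13 cells starting at column 2, row 9.
mask = make_face_grid(2, 9, 13, 13)
print(len(mask), sum(mask))
```

The cell coordinates themselves would come from scaling the detected face rectangle down from frame pixels to grid cells.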
The input image is a rectangle and the input face is a square, and the face grid is calculated from these two. However, when I look at the faceGrid data, I find that the height and width are the same, for example 14 x 14. How is a face grid with square-shaped content generated?
After getting the metadata I run main.py, but at first I get the warning "Found GPU0 Quadro K1100M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5." and then the runtime error "CUDA Cannot read kernel image". I cannot work out the cause of this problem. I am running through Anaconda on Windows 10 with CUDA 9.0 and pytorch 0.4.1. Thank you.
I have registered using my institute e-mail and verified my account. But I'm unable to login and download the dataset from https://gazecapture.csail.mit.edu/download.php. So could you please provide another available download solution. Thank you. @quantombone @adikhosla @andrewowens @visionATcsail
There were already 3 issues raised regarding faceGrid, but none of them resolved my issue, which is the following -
How is [xLo, yLo, w, h] calculated for faceGrid.json from given face bounding box [X,Y,W,H] in appleFace.json?
E.g. - For recording 00002 & frame 00000.jpg -
[frameW, frameH] = [480, 640]
scaleX = 25/480 = 0.052
scaleY = 25/640 = 0.039
face bounding box [X,Y,W,H] = [38.15, 230.04, 343.68, 343.67]
Now, according to the following code snippet -
GazeCapture/code/faceGridFromFaceRect.m
Lines 29 to 37 in e09c285
Why is there this significant difference in faceGrid.json values? Are these values calculated by using above formulae, or some other formulae? Also, I'm beginning to suspect that the faceGrid.json might have been obtained independently & not by using some formula on appleFace.json. Please clarify..
Thanks
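The arithmetic in this question can be reproduced with a short sketch. It assumes the grid region is obtained by scaling the face box by scaleX and scaleY and rounding to whole cells; the function name is hypothetical, the result is 0-indexed, and this is not a copy of faceGridFromFaceRect.m.

```python
def face_rect_to_grid(frame_w, frame_h, face, grid_w=25, grid_h=25):
    """Scale an [X, Y, W, H] face box from frame pixels to grid cells."""
    x, y, w, h = face
    scale_x = grid_w / frame_w
    scale_y = grid_h / frame_h
    return (
        round(x * scale_x),
        round(y * scale_y),
        round(w * scale_x),
        round(h * scale_y),
    )


# Values quoted in the question for recording 00002, frame 00000.jpg.
cells = face_rect_to_grid(480, 640, (38.15, 230.04, 343.68, 343.67))
print(cells)
```

Under that assumption a nearly square face box in a 480*640 frame maps to a region 18 cells wide and 13 cells high, because the two axes are scaled by different factors. That is the kind of mismatch with equal width and height values in faceGrid.json that the question describes.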
I am using the given pretrained caffe model but getting a Euclidean loss much higher than the one mentioned in the paper. Please look at my code and tell me where I am making a mistake.
The code loads the caffe model and does a forward pass to get the output.
CaffeModel.zip
Hii...
Sorry to bother you with a small issue. The ITrackerData.py file was revised on 26 Jan 2019, when division by 255 was introduced in the SubtractMean class. Can you please explain why this change was made? The code worked fine earlier without the division by 255, so what changed in the meantime?
Thanks
Hello, could you please share the parameters of the train/test augmentation mentioned in the iTracker paper?
The only description in the paper was 'shifting the eyes and the face, changing face grid appropriately.' Could you please tell us the values/ranges? We just can't reproduce your accuracy with test augmentations...
Thanks!
I expect both train_y and val_y to be the 2D coordinates of the eye gaze.
But the result is quite strange. Most of the coordinates are very small or negative, which would mean that the gaze almost always points at the top-left corner. That is not the case in reality, so most of them look wrong.
Please look at the circle and its center. The center is the predicted gaze point.
Somebody told me that the data was normalized, but how? How do I find the true gaze point?
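Small or negative values would be expected if the labels are distances in centimeters relative to the camera, as the model description quoted elsewhere on this page suggests, rather than screen pixels. A minimal sketch of mapping such a point to screen pixels follows. The function name, the axis directions, and all device numbers are assumptions for illustration only; the real offsets depend on the device and its orientation.

```python
def camera_cm_to_screen_px(x_cam_cm, y_cam_cm, origin_x_cm, origin_y_cm, px_per_cm):
    """Map a camera-relative point (cm) to screen pixels.

    (origin_x_cm, origin_y_cm) is the screen's top-left corner expressed in
    the camera frame. Assumes camera x grows to the right and camera y grows
    upward, while screen y grows downward.
    """
    dx_cm = x_cam_cm - origin_x_cm
    dy_cm = origin_y_cm - y_cam_cm  # flip the vertical axis
    return dx_cm * px_per_cm, dy_cm * px_per_cm


# Hypothetical device: the screen's top-left corner is 2 cm to the left of
# and 1 cm below the camera, with 50 pixels per centimeter.
x_px, y_px = camera_cm_to_screen_px(
    0.0, -3.0, origin_x_cm=-2.0, origin_y_cm=-1.0, px_per_cm=50.0
)
print(x_px, y_px)
```

A prediction of (0, -3) cm, directly below the camera, lands 100 pixels right and 100 pixels down from the screen corner on this made-up device.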
I am getting NaN while evaluating on the MPIIGaze dataset. I am using PyTorch for the implementation.
The project site is not working. Please fix this issue.
The link to the paper on https://gazecapture.csail.mit.edu/index.php is not working.
Current Link: https://gazecapture.csail.mit.edu/cvpr2016_gazecapture.pdf
Result: 404
Working Link: https://people.csail.mit.edu/khosla/papers/cvpr2016_Khosla.pdf
Hey,
I wanted to know what you are passing as arguments to get the face grid values. Is Frame W/H the original image size, or what values are you passing? Is Grid W/H the fixed grid size 25*25? Are labelFace X, Y, W, H the face detection values? Please let me know. I am stuck and perplexed by this. Thank you.
Hi and thank you for the awesome work you have done here.
One question regarding inference on newer-generation iPhones. I want to run the model on an iPhone XR. The main differences between this model and the earlier one used for training are the position of the camera (the camera is almost part of the screen now) and the size of the screen (6.06 inches diagonally for the XR vs 4.6 inches diagonally for the 6). As a result, the output of the model is always underestimated.
Is calibration the only answer to this issue? Or can we apply a kind of transformation to the output based on the iPhone size ? How would you tackle this problem?
Many thanks