
tensorspark's People

Contributors

ctn, luongthevinh, masoodk, phoaivu, phvu


tensorspark's Issues

TensorSpark productionized in yarn-cluster mode

Hey Arimo contributors,
Thanks for open-sourcing your TensorSpark!
I have made a few modifications in a fork of your repo that are mostly relevant to anyone interested in taking TensorSpark to production in yarn-cluster mode on CPU machines. You can go through my commit comments to see if there is anything you'd like to bring into your repo; feel free to let me know and I'll send a PR for the branch, and you can then "git cherry-pick" the commits you're interested in.
I've been working with the MNIST dataset; if I find the time, I'll try to set up the ImageNet/AlexNet scenario, which I see is on your roadmap as well.
Thanks.

Some modifications to accelerate the train function

In the file "paramservermodel.py", in the function def train(self, labels, features), the gradients are accumulated one tensor at a time:

    for i in range(len(self.compute_gradients)):
        self.gradients[i] += self.compute_gradients[i][0].eval(feed_dict=feed)

The for loop is not necessary here; it can be replaced by a single session.run call:

    grads, test_error_rate = self.session.run(
        [self.compute_gradients, self.error_rate], feed_dict=feed)
    self.gradients[:] = [g[0] for g in grads]

This saves several GPU round-trips per training step; on my machine, one MNIST iteration went from 7 ms down to 1 ms.
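The speedup comes from batching: each session.run (or Tensor.eval) is a separate round-trip to the TensorFlow runtime, so fetching all the gradients plus the error rate in one call amortizes that fixed per-call cost. A toy sketch, with a hypothetical FakeSession standing in for tf.Session (no TensorFlow needed) that just counts round-trips:

```python
# Hypothetical stand-in for tf.Session: each run() call represents one
# round-trip to the runtime/GPU, which dominates per-step cost for small models.
class FakeSession:
    def __init__(self):
        self.calls = 0

    def run(self, fetches, feed_dict=None):
        self.calls += 1
        # Pretend every fetch evaluates to 0.0 (values don't matter here).
        return [0.0 for _ in fetches]

# Eight (gradient, variable) pairs, mirroring self.compute_gradients.
compute_gradients = [("g%d" % i, "v%d" % i) for i in range(8)]

# Old pattern: one eval per gradient, plus one more for the error rate.
slow = FakeSession()
for g, _ in compute_gradients:
    slow.run([g])
slow.run(["error_rate"])

# New pattern: fetch everything in a single run().
fast = FakeSession()
fast.run([g for g, _ in compute_gradients] + ["error_rate"])

print(slow.calls, fast.calls)  # 9 round-trips vs 1
```

With a real tf.Session the per-call overhead includes feed-dict marshalling and a device launch, which is why collapsing nine calls into one shows up so clearly on MNIST-sized models.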

TensorSpark on HPC cluster testing

Hi,

I have been able to run this code on an HPC cluster. Now I'm trying to figure out how to test it with this setup.

Any suggestions would be appreciated.
Thank you!

How to save the trained model to HDFS?

After training the model, I cannot find an interface to save it to HDFS. Is there a way to do this? Many thanks.
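TensorSpark doesn't appear to expose a save API, but a common workaround is to checkpoint with tf.train.Saver to local disk on the driver and then push the files to HDFS with hdfs dfs -put. A hedged sketch: export_model is a hypothetical helper, and the HDFS path is only an example.

```python
import os
import subprocess
import tempfile

def export_model(save_fn, dest_dir, put_cmd=("hdfs", "dfs", "-put", "-f")):
    """Save a model to a local temp directory, then copy it to dest_dir.

    save_fn:  callable that writes checkpoint files to the path it is given,
              e.g. lambda p: saver.save(session, p) with a tf.train.Saver.
    dest_dir: destination, e.g. "hdfs:///models/mnist" (example path).
    put_cmd:  command used for the copy; defaults to `hdfs dfs -put -f`.
    """
    local_dir = tempfile.mkdtemp(prefix="tensorspark_ckpt_")
    save_fn(os.path.join(local_dir, "model.ckpt"))
    subprocess.check_call(list(put_cmd) + [local_dir, dest_dir])
    return local_dir

# Usage with TensorFlow (not executed here):
#   saver = tf.train.Saver()
#   export_model(lambda p: saver.save(session, p), "hdfs:///models/mnist")
```

Saving locally first matters because tf.train.Saver writes to a filesystem path, not to HDFS directly; the copy step is the only part that touches the cluster.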

A GPU OOM bug when running on more than one Spark worker node

For example: I changed the files to load my own images of shape [None, 32, 32, 3]. Everything is OK until I set the partition count to 2, 4, 8, ... My machine is a GTX 1070, Ubuntu 14.04, 8 GB. I also changed the model-initialization code to allow several processes to share one GPU:

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.allocator_type = 'BFC'
    # config.gpu_options.per_process_gpu_memory_fraction = 0.2
    session = tf.Session(config=config)

The bug: after the program runs for some epochs, nvidia-smi shows GPU memory growing without stopping, from 800 MB to 2 GB, 4 GB, 8 GB... until it finally fails with a CUDA OOM error.

My fix: after investigating, I found one function that leads to the GPU memory leak:

    def reset_gradients(self):
        # with self.session.as_default():
        #     self.gradients = [tf.zeros(g[1].get_shape()).eval() for g in self.compute_gradients]
        self.gradients = [0.0] * len(self.compute_gradients)  # my modification
        self.num_gradients = 0

Though I don't know the details of why this change works, it does.
Email: [email protected]
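The change works because of how TensorFlow 1.x graphs behave: tf.zeros(...) builds new graph nodes on every call, so invoking it inside reset_gradients each mini-batch grows the graph (and its GPU allocations) without bound, while plain Python/NumPy zeros never touch the graph. A minimal sketch of the fixed buffer, with made-up gradient shapes and NumPy standing in for the evaluated gradients:

```python
import numpy as np

class GradientBuffer:
    """Sketch of the fixed reset_gradients: buffers live outside the TF graph."""

    def __init__(self, shapes):
        self.shapes = shapes          # one shape per trainable variable
        self.reset_gradients()

    def reset_gradients(self):
        # Plain NumPy zeros: no new graph nodes, so repeated calls are cheap
        # and the graph (and GPU memory) stays a constant size.
        self.gradients = [np.zeros(s) for s in self.shapes]
        self.num_gradients = 0

    def accumulate(self, grads):
        for buf, g in zip(self.gradients, grads):
            buf += g                  # in-place accumulation
        self.num_gradients += 1

buf = GradientBuffer([(2,), (3, 3)])
buf.accumulate([np.ones(2), np.ones((3, 3))])
buf.reset_gradients()                 # safe to call every mini-batch
```

The original `[0.0] * len(...)` fix works the same way: scalar zeros broadcast when the first gradient is added, and nothing is ever added to the TensorFlow graph after construction.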
