Code Monkey home page Code Monkey logo

Comments (19)

christiaanwesterbeek avatar christiaanwesterbeek commented on August 22, 2024

But this module wraps tedious (ao). And you can stream with tedious because there you can define row-receiving event handlers.

request.on('row', function(columns) { console.log('received a row, now stream...'); })

With so many records you may want to use tedious directly instead of node-mssql.

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

As Christiaan said, this module is built on Tedious, but there is no support for streaming atm. Tedious unfortunately has some memory issues when handling large amount of data (reported here) and need major changes in core.

from node-mssql.

joshua-mcginnis avatar joshua-mcginnis commented on August 22, 2024

Appreciate the response. This is unfortunate. It means that there is currently no reliable mssql driver for node as I'd consider reliable streaming to be a key feature.

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

Hello guys, sorry to interrupt I didnt want to create a new issue instead we could continue discussion here, I am working on a system which requires me to pull more than 1 Million rows from SQL Server. I am using the default configuration of the module i.e. using Tedious and it is working fine for me. I saw the memory load issue when handling huge data and was able to figure out the reason for it and its fix.

In the lib/tedious.j, on line 445, instead of pushing the result row to the recordset, why not give an option where if the developer has configured the query to operate in a 'streaming' mode just increment an integer (to count rows) and in the final callback instead of sending the entire recordset we can just send the row count. The user can then verify if the result is same by firing a separate SELECT COUNT(*) query.

With the above solution, I was able to pull in around 3 Million rows of data and storing batches of 10000 in using simple file io asynchronously took max of 100-120 mb with no spikes (and finished in about 3.5 minutes). And as far as I understand the above solution lifts all the weight from the memory, so even if you have 100 million rows it shouldn't make a difference.

I am willing to implement this minor tweak within this week by forking the repo, @patriksimek do you have suggestions? I can send a pull request once i am done.

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

I understand that the issue highlighted in the comments (stackoverflow link) might be with Tedious itself, I will try the above on a large data source and get back to you with results.

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

@shoaibmerchant, thanks, PR for that is very welcome!

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

Sure no problem, i will start working on it. Great work with the overall module, haven't faced any issues since I started using it.

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

Hey @patriksimek I have added a commit to my forked repo. I have implemented two things -

  1. Streaming query
    I have implemented the solution I was suggesting earlier, and it has worked fine for me at 10 million rows (average RAM usage of 60-70mb). For this I have added at third parameter to the query function called options which is optional, and for now it takes the following input (you can use this to extend features later) -
var options = {mode:"normal"}      //Default Mode (your existing execution)

var options = {mode:"stream"}     
//Streaming mode, the final recordset sent back after completion is a count and not the rows itself.

request.query('select .. from ...', function(err, recordset) {
        console.log(recordset);
    }, options);

If the parameter is not defined it resorts to the normal mode, therefore this addition will not break existing implementations if the

  1. Allowing users to subscribe to the meta for a query.
    Actually I needed this for my project, and I was just checking out your source code, you have done so much work with the custom data types and sql data types mapping and stuff, why not emit the columns out?

So all i have done is emitted the meta in your columnMetaData event (tedious), and the user will be able to access it via

 request.on('meta', function(meta){
        console.log(meta);
    });

Please note - I am neither used to CoffeeScript nor a big fan of it, and due to the shortage of time I couldnt test my additions on the coffeescript source. But somehow with the help of basic syntax understanding and free JS to coffeescript converters I have added my code in the main.coffee and tedious.coffee, but I have been unable to test their conversion to JS. I originally implemented my solution in the js files directly which you can find in /edits/main.js and /edits/tedious.js. Also everywhere I have made edits you will find comments /* Edit by shoaibmerchant */ with the original code to help you out.\

Let me know if you are comfortable with the changes, so that I can send a PR. Thanks!

P.S - Sorry for the excessive delay, deadlines and stuff, all hell breaks loose.

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

Thank you for your work on streaming, I took a quick look and found one major thing that needs to be changed - third argument to query. Callback must alway be last argument here and adding argument before callback would break current API, so I would advice you to use somthing like request.stream = true.

I will take a deeper look at you code later this week... as you said, damn deadlines...

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

Yes I had also wondered that, callback in the last helps all the promises libraries, and the request.stream suggestion is brilliant, why didnt I think of that.

I will push a commit tomorrow with that fix. Thanks

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

I have just released new version with support for streaming. According to my test it should deal with large datasets very well now.

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

Hey, sorry I couldnt release the fix soon enough. Anyways I saw the implementation in the new version, looks great. I hope I was of help. Also could you add the emit('meta') too in the new version, like I suggested in my earlier comments?

 request.on('meta', function(meta){
        console.log(meta);
    });

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

Don't use on('meta'), just use on('recordset'). Functionally it's the same.

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

The 'recordset' event will also give me the columns data? I wanted the columns data type and stuff.

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

Yes, it's described in streaming example here.

from node-mssql.

shoaibmerchant avatar shoaibmerchant commented on August 22, 2024

Okay thats awesome, I think you can ahead and close the issue now. Thanks!

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

Thank you Shoaib for you interest in making this module better.

from node-mssql.

joshua-mcginnis avatar joshua-mcginnis commented on August 22, 2024

@patriksimek et. all.

If tedious is emitting rows using an event emitter, as opposed to stream events, than isn't any stream on top of that still restrained by the event emitter?

What are your thoughts on this?

from node-mssql.

patriksimek avatar patriksimek commented on August 22, 2024

As far I know streams are just an event emitters so I don't thik streams on top of event emitters should be restrained at all.

from node-mssql.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.