
Comments (7)

joeg commented on May 23, 2024

We are looking into this.

After initial investigation, it appears there may be an issue in BufferedInputStream, which attempts to create a negative-sized array while reading. The reason the app appears to "hang" is that the exception is thrown on the main thread of your app, which subsequently dies and is no longer writing data to the blob output stream. I have a consistent repro using the code you supplied. If you remove the BufferedInputStream and read directly from the FileInputStream, this works correctly. Also note, this stream is only read on the main thread.

Exception in thread "main" java.lang.NegativeArraySizeException
at java.io.BufferedInputStream.fill(BufferedInputStream.java:205)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at com.microsoft.windowsazure.services.core.storage.utils.Utility.writeToOutputStream(Utility.java:971)
at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.write(BlobOutputStream.java:546)
at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:447)
at Blobs.main(Blobs.java:78)

I will continue investigating this issue, but for now I would recommend not using the BufferedInputStream.
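
For reference, a minimal sketch of that workaround, passing the FileInputStream straight to upload. The container and blob names are hypothetical; the package and class names follow the 0.x client shown in the stack trace above.

```java
import java.io.File;
import java.io.FileInputStream;

import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.blob.client.CloudBlobContainer;
import com.microsoft.windowsazure.services.blob.client.CloudBlockBlob;
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;

public class UploadWithoutBuffering {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and names; substitute your own.
        CloudStorageAccount account =
                CloudStorageAccount.parse(System.getenv("STORAGE_CONNECTION_STRING"));
        CloudBlobClient client = account.createCloudBlobClient();
        CloudBlobContainer container = client.getContainerReference("mycontainer");
        CloudBlockBlob blob = container.getBlockBlobReference("bigfile.bin");

        File source = new File("bigfile.bin");
        // Pass the FileInputStream directly; do NOT wrap it in a
        // BufferedInputStream, which is what triggers the
        // NegativeArraySizeException described above.
        FileInputStream in = new FileInputStream(source);
        try {
            blob.upload(in, source.length());
        } finally {
            in.close();
        }
    }
}
```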


mikebell90 commented on May 23, 2024

Actually, the main issue for me was not the buffering. It was the undocumented 90-second timeout (OK, it's documented, but only in the C# SDK). Combined with an auto-retry of 3, this was problematic with a large file.


joeg commented on May 23, 2024

We were able to repro this without using the library at all, simply by opening a file and copying it to another local file. It seems that in some versions of the JRE, when mark is used on a BufferedInputStream, it can calculate the internal array size incorrectly.

The timeout is applied in two places: as the HttpURLConnection's readTimeout, and as a URL parameter to the service. The service uses the timeout value (converted to seconds) on the server side. To be clear, this does not mean that the entire blob has to be uploaded/downloaded within the given timeout, only that either the server couldn't process the request in the given amount of time OR the client could not read data from the server in a given amount of time.

A retry executes the same operation again, depending on the result of the previous attempt. This means that any subsequent retry gets its own timeout and is not impacted by prior attempts.
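
A sketch of how those knobs look in code, assuming the 0.x RequestOptions API (setTimeoutIntervalInMs, setRetryPolicyFactory, RetryLinearRetry); treat the exact names as assumptions rather than confirmed by this thread:

```java
import com.microsoft.windowsazure.services.blob.client.BlobRequestOptions;
import com.microsoft.windowsazure.services.core.storage.RetryLinearRetry;

public class TimeoutOptions {
    // The interval is applied per attempt: as the HttpURLConnection read
    // timeout on the client, and (converted to seconds) as the "timeout"
    // URL parameter the service enforces. Each retry gets a fresh window.
    static BlobRequestOptions longTimeout() {
        BlobRequestOptions options = new BlobRequestOptions();
        options.setTimeoutIntervalInMs(5 * 60 * 1000);                 // 5 min per attempt
        options.setRetryPolicyFactory(new RetryLinearRetry(10000, 3)); // 3 tries, 10 s apart
        return options;
    }
}
```

The options would then be passed per operation, e.g. blob.upload(stream, length, null, longTimeout(), null).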


mikebell90 commented on May 23, 2024

Let me try to be clear. The BufferedInputStream, while interesting and hopefully helpful, was NOT what I was always using. More often I was calling with either a ByteArrayInputStream (for items < 50 MB) or a MarkableFileInputStream (my own implementation of FileInputStream using RandomAccessFile to allow marking). So the example I showed was just ONE INSTANCE of the initial failure.

I was running my tests on a wireless connection and a slow ADSL one (256k upstream). I started with a big file (150 MB) and found it never seemed to finish. I then narrowed it down: any file exceeding about 3500 KB caused the issue. Finally I resolved the issue, kludgily, by realizing it was the 90-second timeout and setting it higher.

This issue was even worse with the 150 MB file, because I had set concurrent requests to 10. So 10 different threads were doing 4 MB chunks, and each was failing and retrying, which is why it seemed to go on forever (e.g. 1.5 hours, when 1 hour should have been sufficient for a "normal" upload).

So for me, it was purely timeout. Once I raised it to 30 minutes (yes, I know, but I had to force the threshold all the way up to 64 MB before the 4 MB partitioning occurred), it worked with files of 4, 10, 60, 80, 100, and 150 MB. A sketch of that configuration follows below.

So, the BufferedInputStream issue: I'm not denying it may exist, and I certainly appreciate you looking into it, but I had done tests with and without it and was having the same wacky issues. In fact my initial issue was with my MarkableFileInputStream. I then switched to a ByteArrayInputStream, and simply for CLARITY of showing the repro, provided you with code that showed the FileInputStream and then the BufferedInputStream.
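
A sketch of that mitigation as client configuration, assuming the 0.x CloudBlobClient setters (setSingleBlobPutThresholdInBytes, setConcurrentRequestCount, setTimeoutInMs); the exact method names are assumptions, not confirmed by this thread:

```java
import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;

public class TunedClient {
    static CloudBlobClient tuned(String connectionString) throws Exception {
        CloudBlobClient client =
                CloudStorageAccount.parse(connectionString).createCloudBlobClient();
        // Below this threshold a blob goes up as a single PUT; above it,
        // the SDK splits the upload into 4 MB blocks.
        client.setSingleBlobPutThresholdInBytes(64 * 1024 * 1024);
        // Up to 10 block uploads in flight at once (as in the tests above).
        client.setConcurrentRequestCount(10);
        // Per-attempt timeout of 30 minutes instead of the 90-second default.
        client.setTimeoutInMs(30 * 60 * 1000);
        return client;
    }
}
```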




joeg commented on May 23, 2024

Yes, on a slow connection you are correct that you may need a higher timeout value.

However, as part of this investigation we have identified and reported a bug in BufferedInputStream: using mark values over 1 GB (1073741824 bytes) will cause a NegativeArraySizeException to be thrown.

We will be updating the library to use a smaller mark size to avoid this bug altogether.
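
For completeness, a minimal sketch of the underlying JDK failure mode, with no SDK involved. It uses a synthetic endless stream rather than a real file; note that it needs roughly 1.5 GB of heap, since BufferedInputStream keeps doubling its internal buffer toward the mark limit:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkOverflowRepro {
    public static void main(String[] args) throws IOException {
        InputStream endless = new InputStream() {
            @Override
            public int read() { return 0; } // never-ending stream of zeros
        };
        BufferedInputStream bis = new BufferedInputStream(endless);
        bis.mark(Integer.MAX_VALUE); // a mark limit over 1 GB triggers the bug
        byte[] chunk = new byte[64 * 1024];
        long total = 0;
        // Read past 1 GB while the mark is held. On affected JREs the
        // buffer-doubling arithmetic in BufferedInputStream.fill()
        // (pos * 2) overflows int, and fill() then throws
        // java.lang.NegativeArraySizeException, as in the trace above.
        while (total < 2L * 1024 * 1024 * 1024) {
            total += bis.read(chunk, 0, chunk.length);
        }
    }
}
```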


mikebell90 commented on May 23, 2024

FYI, another group of folks ran into that. You might get a useful workaround from them:

https://bitbucket.org/jmurty/jets3t/issue/99/repeatablerequestentity-blindly-marks




joeg commented on May 23, 2024

This issue has been resolved in the most recent pull request.

