
Comments (7)

joeg commented on May 23, 2024

We are looking into this.

After initial investigation, it appears there may be an issue in BufferedInputStream, which attempts to create a negative-sized array while reading. The reason the app appears to "hang" is that the exception is thrown on the main thread of your app, which subsequently dies and is no longer writing data to the blob output stream. I have a consistent repro using the code you supplied. If you remove the BufferedInputStream and read directly from the FileInputStream, this works correctly. Also note, this stream is only read on the main thread.

Exception in thread "main" java.lang.NegativeArraySizeException
at java.io.BufferedInputStream.fill(BufferedInputStream.java:205)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at com.microsoft.windowsazure.services.core.storage.utils.Utility.writeToOutputStream(Utility.java:971)
at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.write(BlobOutputStream.java:546)
at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:447)
at Blobs.main(Blobs.java:78)

I will continue investigating this issue, but for now I would recommend not using the BufferedInputStream.
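
For reference, a minimal sketch of that workaround, passing the FileInputStream straight to upload. The container and blob names are hypothetical; the package and class names follow the 0.x client shown in the stack trace above.

```java
import java.io.File;
import java.io.FileInputStream;

import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.blob.client.CloudBlobContainer;
import com.microsoft.windowsazure.services.blob.client.CloudBlockBlob;
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;

public class UploadWithoutBuffering {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and names; substitute your own.
        CloudStorageAccount account =
                CloudStorageAccount.parse(System.getenv("STORAGE_CONNECTION_STRING"));
        CloudBlobClient client = account.createCloudBlobClient();
        CloudBlobContainer container = client.getContainerReference("mycontainer");
        CloudBlockBlob blob = container.getBlockBlobReference("bigfile.bin");

        File source = new File("bigfile.bin");
        // Pass the FileInputStream directly; do NOT wrap it in a
        // BufferedInputStream, which is what triggers the
        // NegativeArraySizeException described above.
        FileInputStream in = new FileInputStream(source);
        try {
            blob.upload(in, source.length());
        } finally {
            in.close();
        }
    }
}
```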


mikebell90 commented on May 23, 2024

Actually, the main issue for me was not the buffering. It was the undocumented 90-second timeout (OK, it's documented, but only in the C# SDK). Combined with an auto-retry of 3, this was problematic with a large file.


joeg commented on May 23, 2024

We were able to repro this without using the library at all, simply by opening a file and copying it to another local file. It seems that in some versions of the JRE, when mark is used on a BufferedInputStream, it can calculate the internal array size incorrectly.

The timeout is applied in two places: as the HttpURLConnection's readTimeout, and as a URL parameter to the service. The service uses the timeout value (converted to seconds) on the server side. To be clear, this does not mean that the entire blob has to be uploaded/downloaded within the given timeout, only that either the server couldn't process the request in the given amount of time OR the client could not read data from the server in a given amount of time.

A retry executes the same operation again, depending on the result of the previous attempt. This means that any subsequent retry gets its own timeout and is not impacted by prior attempts.
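
A sketch of how those knobs look in code, assuming the 0.x RequestOptions API (setTimeoutIntervalInMs, setRetryPolicyFactory, RetryLinearRetry); treat the exact names as assumptions rather than confirmed by this thread:

```java
import com.microsoft.windowsazure.services.blob.client.BlobRequestOptions;
import com.microsoft.windowsazure.services.core.storage.RetryLinearRetry;

public class TimeoutOptions {
    // The interval is applied per attempt: as the HttpURLConnection read
    // timeout on the client, and (converted to seconds) as the "timeout"
    // URL parameter the service enforces. Each retry gets a fresh window.
    static BlobRequestOptions longTimeout() {
        BlobRequestOptions options = new BlobRequestOptions();
        options.setTimeoutIntervalInMs(5 * 60 * 1000);                 // 5 min per attempt
        options.setRetryPolicyFactory(new RetryLinearRetry(10000, 3)); // 3 tries, 10 s apart
        return options;
    }
}
```

The options would then be passed per operation, e.g. blob.upload(stream, length, null, longTimeout(), null).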


mikebell90 commented on May 23, 2024

Let me try to be clear. The BufferedInputStream, while interesting and hopefully helpful, was NOT what I was always using. More often I was calling with either a ByteArrayInputStream (for items < 50 MB) or a MarkableFileInputStream (my own implementation of FileInputStream using RandomAccessFile to allow marking). So the example I showed was just ONE INSTANCE of the initial failure.

I was running my tests on a wireless connection and a slow ADSL one (256k upstream). I started with a big file (150 MB) and found it never seemed to finish. I then narrowed it down: any file exceeding about 3500 KB caused the issue. Finally I resolved the issue, kludgily, by realizing it was the 90-second timeout and setting it higher.

This issue was even worse with the 150 MB file, because I had set concurrent requests to 10. So 10 different threads were doing 4 MB chunks, and each was failing and retrying, which is why it seemed to go on forever (e.g. 1.5 hours, when 1 hour should have been sufficient for a "normal" upload).

So for me, it was purely timeout. Once I raised it to 30 minutes (yes, I know, but I had to force the threshold all the way up to 64 MB before the 4 MB partitioning occurred), it worked with files of 4, 10, 60, 80, 100, and 150 MB. A sketch of that configuration follows below.

So, the BufferedInputStream issue: I'm not denying it may exist, and I certainly appreciate you looking into it, but I had done tests with and without it and was having the same wacky issues. In fact my initial issue was with my MarkableFileInputStream. I then switched to a ByteArrayInputStream, and simply for CLARITY of showing the repro, provided you with code that showed the FileInputStream and then the BufferedInputStream.
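
A sketch of that mitigation as client configuration, assuming the 0.x CloudBlobClient setters (setSingleBlobPutThresholdInBytes, setConcurrentRequestCount, setTimeoutInMs); the exact method names are assumptions, not confirmed by this thread:

```java
import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;

public class TunedClient {
    static CloudBlobClient tuned(String connectionString) throws Exception {
        CloudBlobClient client =
                CloudStorageAccount.parse(connectionString).createCloudBlobClient();
        // Below this threshold a blob goes up as a single PUT; above it,
        // the SDK splits the upload into 4 MB blocks.
        client.setSingleBlobPutThresholdInBytes(64 * 1024 * 1024);
        // Up to 10 block uploads in flight at once (as in the tests above).
        client.setConcurrentRequestCount(10);
        // Per-attempt timeout of 30 minutes instead of the 90-second default.
        client.setTimeoutInMs(30 * 60 * 1000);
        return client;
    }
}
```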




joeg commented on May 23, 2024

Yes, on a slow connection you are correct that you may need a higher timeout value.

However, as part of this investigation we have identified and reported a bug in BufferedInputStream: using mark values over 1 GB (1073741824 bytes) will cause a NegativeArraySizeException to be thrown.

We will be updating the library to use a smaller mark size to avoid this bug altogether.
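
For completeness, a minimal sketch of the underlying JDK failure mode, with no SDK involved. It uses a synthetic endless stream rather than a real file; note that it needs roughly 1.5 GB of heap, since BufferedInputStream keeps doubling its internal buffer toward the mark limit:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkOverflowRepro {
    public static void main(String[] args) throws IOException {
        InputStream endless = new InputStream() {
            @Override
            public int read() { return 0; } // never-ending stream of zeros
        };
        BufferedInputStream bis = new BufferedInputStream(endless);
        bis.mark(Integer.MAX_VALUE); // a mark limit over 1 GB triggers the bug
        byte[] chunk = new byte[64 * 1024];
        long total = 0;
        // Read past 1 GB while the mark is held. On affected JREs the
        // buffer-doubling arithmetic in BufferedInputStream.fill()
        // (pos * 2) overflows int, and fill() then throws
        // java.lang.NegativeArraySizeException, as in the trace above.
        while (total < 2L * 1024 * 1024 * 1024) {
            total += bis.read(chunk, 0, chunk.length);
        }
    }
}
```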


mikebell90 commented on May 23, 2024

FYI, another group of folks ran into that. You might get a useful workaround from them:

https://bitbucket.org/jmurty/jets3t/issue/99/repeatablerequestentity-blindly-marks




joeg commented on May 23, 2024

This issue has been resolved in the most recent pull request.

