
Conversation

@maheshsooryambylu (Collaborator) commented Mar 8, 2025

Fixes the issues related to large file upload using the SDM Java plugin; performance is on par with that of the application option when compared at similar network speeds.
The crux of the problem: the existing upload function in SDMServiceImpl works even for large files when run against our locally hosted container, but it fails miserably in the real setup (even a half-GB file takes around 15 minutes to upload). It works locally because the InputStream fills up quickly enough to keep the InputStreamReader served, so the upload logic can read the stream in chunks and send them to DI from the locally hosted middleware. Now contrast this behaviour with the application container hosted in BTP: by the time the upload logic in SDMServiceImpl executes, the InputStream does not yet hold the complete stream data; only a few chunks are available to read. That throttling was enough to manifest in the large file upload scenario.

The gist of the fix is as follows:
DocumentUploadService is the new class created to upload documents. Large files are uploaded by sending a multipart upload to DI; smaller files still use the single-chunk upload. For now the file size that decides whether a file counts as large is hardcoded to 100 MB. I used Java's reactive framework to make the HTTP calls; it would be the better choice for continuous streaming, and while it does not make much difference here since we send chunk by chunk, I am keeping it that way to introduce ourselves to the more adept reactive framework! Also note that I have used the MemoryManagement bean to emit the current heap usage at least once every 5 chunks of processing (to understand whether GC is keeping up with our usage of the heap). Also note that I am using System.out.println for now until I understand the logging framework being used (these can be changed to log messages once we know where and how the log object is wired). I avoided SDMServiceImpl because this fix uses httpclient5 libraries throughout, whereas other functions in SDMServiceImpl use httpclient4.
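A minimal sketch of the size-based dispatch described above, for illustration only: the 100 MB threshold comes from this comment, while the method signatures and the `diSingleUpload`/`diAppendChunk` helpers are hypothetical stand-ins for the actual httpclient5 calls to DI.

```java
import java.io.IOException;
import java.io.InputStream;

public class DocumentUploadService {

    // Hardcoded for now, as noted above; made configurable later in this thread.
    private static final int CHUNK_SIZE = 100 * 1024 * 1024; // 100 MB

    public void upload(InputStream content, long fileSize) throws IOException {
        if (fileSize <= CHUNK_SIZE) {
            // Small file: keep the existing single-chunk upload path.
            diSingleUpload(content, fileSize);
            return;
        }
        byte[] buffer = new byte[CHUNK_SIZE];
        long sent = 0;
        int chunkIndex = 0;
        int read;
        while ((read = readFully(content, buffer)) > 0) {
            sent += read;
            // Multipart upload: append one chunk at a time to DI.
            diAppendChunk(buffer, read, chunkIndex, /* isLast */ sent >= fileSize);
            if (chunkIndex % 5 == 0) {
                // Emit heap usage at least once every 5 chunks, to watch GC behaviour.
                Runtime rt = Runtime.getRuntime();
                System.out.println("heap used: " + (rt.totalMemory() - rt.freeMemory()) + " bytes");
            }
            chunkIndex++;
        }
    }

    // Fill the buffer completely unless the stream ends first; returns bytes read.
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }

    // Placeholders for the actual httpclient5 calls to DI; not the real API.
    private void diSingleUpload(InputStream content, long size) throws IOException { /* ... */ }
    private void diAppendChunk(byte[] chunk, int length, int index, boolean isLast) throws IOException { /* ... */ }
}
```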
ReadAheadInputStream overrides InputStream for a simple reason: processing the InputStream chunk by chunk and calling the DI API serially won't scale, because DI spends around 20-30 seconds processing a 100 MB chunk. That is precious time lost if we sit idle for that duration on every chunk. Instead, the next chunk is kept ready while the current chunk is being sent to DI and we wait for DI's synchronous response. This is achieved with the Executor framework: a spawned thread reads the next chunk and populates a FIFO queue, which lets us take the next chunk from the head of the queue as soon as the DI response for the current chunk (appendContentStream) is received. This repeats for every chunk until the last one.
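A minimal sketch of that read-ahead mechanism, assuming a single producer thread filling a bounded FIFO queue; the class and method names here are illustrative, and the PR's actual ReadAheadInputStream (which subclasses InputStream) may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReadAheadChunks implements AutoCloseable {

    private static final byte[] EOF = new byte[0]; // sentinel marking end of stream

    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(2); // bounded read-ahead
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public ReadAheadChunks(InputStream source, int chunkSize) {
        // Producer: keeps the next chunk ready while the consumer waits on DI.
        executor.submit(() -> {
            try (InputStream in = source) {
                while (true) {
                    byte[] buf = new byte[chunkSize];
                    int total = 0;
                    while (total < chunkSize) {
                        int n = in.read(buf, total, chunkSize - total);
                        if (n < 0) break;
                        total += n;
                    }
                    if (total == 0) break; // nothing left to read
                    queue.put(total == chunkSize ? buf : Arrays.copyOf(buf, total)); // blocks if full
                    if (total < chunkSize) break; // short read => source exhausted
                }
                queue.put(EOF);
            } catch (IOException e) {
                queue.offer(EOF); // best effort: unblock the consumer on read failure
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Blocks until the next chunk is available; returns null once the source is exhausted. */
    public byte[] nextChunk() throws InterruptedException {
        byte[] chunk = queue.take();
        return chunk == EOF ? null : chunk;
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

The uploader then loops on nextChunk(): while it waits for DI's synchronous appendContentStream response for the current chunk, the producer thread is already reading the next one into the queue.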

Here is the comparative performance of this fix vs. the application option; please refer to the attached files with the corresponding names.

  - 2.13 GB file: plugin 10.9 min vs. application option 11.2 min, at 69 Mbps upload bandwidth
  - 167 MB file: plugin 1.1 min vs. application option 1.1 min, at 73 Mbps upload bandwidth

Attachments: PluginUploading2200MBFileAt69mbpsSpeed, PluginUploadin167MBFile, ApplicationOptionUploading2200MBFileAt69mbpsSpeed, ApplicationOptionUploading167MBFile

@ashishjain14 (Collaborator) left a comment

Thanks for the excellent root cause analysis @maheshsooryambylu.
Can you please review the following:

  1. Replace the System.out.println statements with log messages.
  2. Can we keep the chunk size configurable instead of hardcoding it to 100 MB, for example on the basis of the file size?
  3. Do we consider retries in case of failures?

@maheshsooryambylu (Collaborator, Author)

Thanks for the excellent root cause analysis @maheshsooryambylu. Can you please review the following:

  1. Replace the System.out.println statements with log messages.
  2. Can we keep the chunk size configurable instead of hardcoding it to 100 MB, for example on the basis of the file size?
  3. Do we consider retries in case of failures?

Thanks for the review, Ashish. Regarding the comments:

  1. Yes, those messages are just placeholders, as at present I don't see any logging framework being used in our plugin code. Will align with @rashmiangadi11 and use SLF4J, maybe.
  2. The chunk size will be configurable via an environment variable, defaulting to 100 MB when the variable is not present (see the sketch after this list). I would refrain from making the chunk size a dynamically deduced value, for the simple reason that there is not much we can deduce dynamically: the ideal chunk size would depend on what, on average, counts as a large file for the given application, and network speed would be the only other factor that could help deduce an ideal chunk size, which is in any case difficult to figure out from the code.
  3. Yes, that is very much needed. Will add it in the code.
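A small sketch of point 2, assuming a plain environment variable; the name SDM_UPLOAD_CHUNK_SIZE_MB is hypothetical, only the 100 MB default comes from this thread.

```java
// Resolve the chunk size from an environment variable, defaulting to 100 MB.
// SDM_UPLOAD_CHUNK_SIZE_MB is a hypothetical variable name for illustration.
static int chunkSizeBytes() {
    String raw = System.getenv("SDM_UPLOAD_CHUNK_SIZE_MB");
    int mb = 100; // default when the variable is absent or malformed
    if (raw != null) {
        try {
            mb = Integer.parseInt(raw.trim());
        } catch (NumberFormatException ignored) {
            // keep the 100 MB default
        }
    }
    return mb * 1024 * 1024;
}
```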

…dcontent failing the whole operation wont fail
@ashishjain14 (Collaborator)

Hi @maheshsooryambylu: Can we close this PR?
