Large file upload fix #121
Open
Fixes the issues with large file uploads through the SDM Java plugin. Performance is on par with the Application option when compared at similar network speeds.
The crux of the problem: in our locally hosted container, even the existing upload function in SDMServiceImpl works for large files, although it performs very poorly as files get larger (even a half-GB upload takes around 15 minutes). It works there because the InputStream in the locally hosted container fills up quickly enough to keep the InputStreamReader supplied, so the upload logic, which reads the InputStream in chunks and sends them to DI from the locally hosted middleware, is never starved. Contrast this with the leading application container hosted in BTP: by the time the upload logic in SDMServiceImpl runs, the InputStream does not yet hold the complete stream data; only a few chunks are available to read. That throttling was enough for the problem to surface in the large file upload scenario.
Gist of the fix is as follows:
DocumentUploadService is the new class created to upload documents. Large files are uploaded to DI as a multi-part upload; smaller files still use a single-chunk upload. For now the file size that separates large from small is hardcoded to 100 MB. The HTTP calls are made with Java's reactive framework. That would be the better choice for continuous streaming, but here it makes little difference because we send chunk by chunk. I am keeping it anyway so that we get familiar with the reactive framework. Also note that I use the MemoryManagement bean to emit the current heap usage at least once in every 5 chunks processed (to see whether GC keeps up with our heap usage). I am also using System.out.println for now, until I understand the logging framework in use (this can be changed to log messages once we know where and how the log object is wired). I avoided SDMServiceImpl because the fix uses only httpClient5 libraries, whereas other functions in SDMServiceImpl use httpClient4.
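The chunking flow described above can be sketched roughly as follows. This is not the code in DocumentUploadService: the names `ChunkedUploadSketch`, `ChunkSink`, `isLargeFile` and `uploadInChunks` are hypothetical, the sink stands in for the DI calls, the strict `>` comparison against 100 MB is an assumption, and heap usage is read through the standard `MemoryMXBean`, which may differ from the bean used in the fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.management.ManagementFactory;

final class ChunkedUploadSketch {
    // Assumption: mirrors the hardcoded 100 MB limit mentioned above.
    static final long LARGE_FILE_THRESHOLD = 100L * 1024 * 1024;

    /** Hypothetical stand-in for the HTTP call that sends one chunk to DI. */
    interface ChunkSink {
        void send(byte[] chunk, int index, boolean isLast) throws IOException;
    }

    /** Files above the threshold take the multi-part path (assumed strict comparison). */
    static boolean isLargeFile(long sizeInBytes) {
        return sizeInBytes > LARGE_FILE_THRESHOLD;
    }

    /** Reads the stream in fixed-size chunks and hands each to the sink; returns the chunk count. */
    static int uploadInChunks(InputStream in, int chunkSize, ChunkSink sink) throws IOException {
        int index = 0;
        byte[] current = in.readNBytes(chunkSize);
        while (current.length > 0) {
            // Read one chunk ahead so the sink can be told which chunk is the last one.
            byte[] next = in.readNBytes(chunkSize);
            sink.send(current, index, next.length == 0);
            if (index % 5 == 0) {
                // Emit heap usage every 5 chunks to see whether GC keeps up.
                long usedMb = ManagementFactory.getMemoryMXBean()
                        .getHeapMemoryUsage().getUsed() / (1024 * 1024);
                System.out.println("heap used after chunk " + index + ": " + usedMb + " MB");
            }
            index++;
            current = next;
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10]; // tiny payload, for illustration only
        int sent = uploadInChunks(new ByteArrayInputStream(data), 4,
                (chunk, index, isLast) -> System.out.println(
                        "chunk " + index + ": " + chunk.length + " bytes, last=" + isLast));
        System.out.println(sent + " chunks sent");
    }
}
```

In this sketch the caller would check `isLargeFile` first and fall back to a single-chunk upload for anything at or below the threshold.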
ReadAheadInputStream overrides InputStream for a simple reason: processing the InputStream chunk by chunk and calling the DI API for each one will not scale, because DI spends around 20-30 seconds processing a 100 MB chunk. That is precious time lost if we sit idle for that long on every chunk. Instead, we keep the next chunk ready while the current chunk is being sent to DI and we wait for DI's synchronous response. This is done with the Executor framework, which spawns a thread that reads the next chunk and puts it on a FIFO queue. Once the DI response for the current chunk (appendContentStream) arrives, we take the next chunk from the head of the queue. This repeats for every chunk up to the last one.
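A minimal sketch of the read-ahead idea, assuming a single background reader thread and a bounded FIFO queue. It is not the ReadAheadInputStream from this PR: the class name `ReadAheadSketch`, the `nextChunk` method and the `readAhead` parameter are hypothetical, and unlike the real class it does not extend InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ReadAheadSketch implements AutoCloseable {
    private static final byte[] EOF = new byte[0]; // end marker, compared by identity

    private final InputStream source;
    private final int chunkSize;
    private final BlockingQueue<byte[]> queue;
    private final ExecutorService reader = Executors.newSingleThreadExecutor();
    private volatile IOException failure;
    private boolean finished;

    ReadAheadSketch(InputStream source, int chunkSize, int readAhead) {
        this.source = source;
        this.chunkSize = chunkSize;
        // Bounded queue: the reader blocks once `readAhead` chunks are waiting,
        // so memory stays limited to a few chunks.
        this.queue = new ArrayBlockingQueue<>(readAhead);
        reader.submit(() -> { fill(); });
    }

    /** Runs on the background thread: reads chunks and queues them in FIFO order. */
    private void fill() {
        try {
            try {
                byte[] chunk;
                while ((chunk = source.readNBytes(chunkSize)).length > 0) {
                    queue.put(chunk);
                }
            } catch (IOException e) {
                failure = e; // reported to the consumer when it reaches the end marker
            }
            queue.put(EOF);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // close() was called; stop reading
        }
    }

    /** Returns the next chunk from the head of the queue, or null after the last chunk. */
    byte[] nextChunk() throws IOException, InterruptedException {
        if (finished) {
            return null;
        }
        byte[] chunk = queue.take();
        if (chunk == EOF) {
            finished = true;
            if (failure != null) {
                throw new IOException("read-ahead failed", failure);
            }
            return null;
        }
        return chunk;
    }

    @Override
    public void close() throws IOException {
        reader.shutdownNow();
        source.close();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[10]; // tiny payload, for illustration only
        try (ReadAheadSketch ra = new ReadAheadSketch(new ByteArrayInputStream(data), 4, 1)) {
            byte[] chunk;
            while ((chunk = ra.nextChunk()) != null) {
                // A real caller would send the chunk to DI here and wait for the
                // response while the reader thread fetches the following chunk.
                System.out.println("sending chunk of " + chunk.length + " bytes");
            }
        }
    }
}
```

With a queue capacity of 1, the reader thread can hold one further chunk in memory while blocked on the full queue, so up to three chunks may be resident at once (one being sent, one queued, one read); that matters when sizing the heap for 100 MB chunks.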
Here is the comparative performance of this fix versus the Application option. Please refer to the attached files with the corresponding file names.
2.13 GB file: Plugin and Application option took 10.9 min and 11.2 min respectively at 69 Mbps upload bandwidth
167 MB file: Plugin and Application option took 1.1 min and 1.1 min respectively at 73 Mbps upload bandwidth