
Comments (10)

hwuiwon commented on August 12, 2024

@dsinghvi Here's the body content. Thanks for taking a look at it.

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>


<title>api.merge.dev | 502: Bad gateway</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/main.css" />


</head>
<body>
<div id="cf-wrapper">
    <div id="cf-error-details" class="p-0">
        <header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-8">
            <h1 class="inline-block sm:block sm:mb-2 font-light text-60 lg:text-4xl text-black-dark leading-tight mr-2">
              <span class="inline-block">Bad gateway</span>
              <span class="code-label">Error code 502</span>
            </h1>
            <div>
               Visit <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">cloudflare.com</a> for more information.
            </div>
            <div class="mt-3">2023-08-20 13:32:07 UTC</div>
        </header>
        <div class="my-8 bg-gradient-gray">
            <div class="w-240 lg:w-full mx-auto">
                <div class="clearfix md:px-8">
                  
<div id="cf-browser-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">
    
    <span class="cf-icon-browser block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
    
  </div>
  <span class="md:block w-full truncate">You</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
    
    Browser
    
  </h3>
  <span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>

<div id="cf-cloudflare-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">
    <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
    <span class="cf-icon-cloud block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
    </a>
  </div>
  <span class="md:block w-full truncate">Atlanta</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
    <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
    Cloudflare
    </a>
  </h3>
  <span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>

<div id="cf-host-status" class="cf-error-source relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">
    
    <span class="cf-icon-server block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-error w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
    
  </div>
  <span class="md:block w-full truncate">api.merge.dev</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
    
    Host
    
  </h3>
  <span class="leading-1.3 text-2xl text-red-error">Error</span>
</div>

                </div>
            </div>
        </div>

        <div class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
            <div class="clearfix">
                <div class="w-1/2 md:w-full float-left pr-6 md:pb-10 md:pr-0 leading-relaxed">
                    <h2 class="text-3xl font-normal leading-1.3 mb-4">What happened?</h2>
                    <p>The web server reported a bad gateway error.</p>
                </div>
                <div class="w-1/2 md:w-full float-left leading-relaxed">
                    <h2 class="text-3xl font-normal leading-1.3 mb-4">What can I do?</h2>
                    <p class="mb-6">Please try again in a few minutes.</p>
                </div>
            </div>
        </div>

        <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
  <p class="text-13">
    <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7f9afff08bfe07d2</strong></span>
    <span class="cf-footer-separator sm:hidden">&bull;</span>
    <span id="cf-footer-item-ip" class="cf-footer-item hidden sm:block sm:mb-1">
      Your IP:
      <button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
      <span class="hidden" id="cf-footer-ip">50.168.177.77</span>
      <span class="cf-footer-separator sm:hidden">&bull;</span>
    </span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" id="brand_link" target="_blank">Cloudflare</a></span>
    
  </p>
  <script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->


    </div>
</div>
<script>(function(){var js = "window['__CF$cv$params']={r:'7f9afff08bfe07d2',t:'MTY5MjUzODMyNy41MzUwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script><script>(function(){var js = "window['__CF$cv$params']={r:'7f9afff08bfe07d2',t:'MTY5MjUzODMyNy41MzUwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = 
document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script></body>
</html>

from merge-python-client.

dsinghvi commented on August 12, 2024

@hwuiwon thanks for reporting this issue. Are there any additional details in the body of the 502 error?


dsinghvi commented on August 12, 2024

@hwuiwon, is this an accurate description of the error?

You call chunks = self.client.filestorage.files.download_retrieve(id=file.id) 6 times with different file ids. Once you call it for the 7th time, you get a 502.


hwuiwon commented on August 12, 2024

@dsinghvi Yes, you're correct.


dsinghvi commented on August 12, 2024

@hwuiwon, is it possible that the issue is with the 7th file id you are using? Can you try calling it with a single file id 10 times and check whether that works?
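
One way to run that check is sketched below. It assumes a `download` callable such as `client.filestorage.files.download_retrieve` from the call quoted earlier in this thread; `probe_repeated_download` and its parameters are hypothetical names used for illustration and are not part of the SDK.

```python
from typing import Callable, Iterable, List, Tuple


def probe_repeated_download(
    download: Callable[..., Iterable[bytes]],
    file_id: str,
    attempts: int = 10,
) -> List[Tuple[int, bool, str]]:
    """Call `download(id=file_id)` several times and record each outcome.

    Returns one (attempt_number, succeeded, detail) tuple per attempt.
    """
    outcomes = []
    for attempt in range(1, attempts + 1):
        try:
            # Consume the chunks so the whole body is actually transferred.
            size = sum(len(chunk) for chunk in download(id=file_id))
            outcomes.append((attempt, True, f"{size} bytes"))
        except Exception as exc:  # broad on purpose: this is a diagnostic probe
            # Keep only the start of the message; a 502 body can be a full HTML page.
            outcomes.append((attempt, False, str(exc)[:200]))
    return outcomes


# Hypothetical usage with the SDK client from this thread:
# client = Merge(api_key=MERGE_API_KEY, account_token=account_token)
# for row in probe_repeated_download(
#     client.filestorage.files.download_retrieve, "<file id>"
# ):
#     print(row)
```

If every attempt with one id succeeds while a different id fails on its first attempt, that points at the file rather than at the number of calls.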


rmkonnur commented on August 12, 2024

Hi @hwuiwon, were you able to try out the suggestion from @dsinghvi? If that still throws a 502 error, would you be able to provide us with your organization name, the linked account id that is returning the 502, and the estimated time frame in which you received the 502?
These details should help us find the log for your request and figure out what the issue is!
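
The 502 body pasted earlier in this thread already carries two details that can narrow the time frame: the UTC timestamp and the Cloudflare Ray ID. A minimal sketch for pulling them out of the error body is shown below; `summarize_cloudflare_error` is a hypothetical helper, and the regular expressions assume the markup of the error page shown in this thread.

```python
import re
from typing import Dict, Optional

# Patterns assume the Cloudflare error page layout pasted in this thread.
RAY_ID_RE = re.compile(r"Cloudflare Ray ID:\s*<strong[^>]*>([0-9a-f]+)</strong>")
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC")


def summarize_cloudflare_error(body: str) -> Dict[str, Optional[str]]:
    """Extract the Ray ID and UTC timestamp from a Cloudflare error page."""
    ray = RAY_ID_RE.search(body)
    ts = TIMESTAMP_RE.search(body)
    return {
        "ray_id": ray.group(1) if ray else None,
        "timestamp_utc": ts.group(1) if ts else None,
    }
```

Logging this summary instead of the full HTML body would also keep the error line short.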


hwuiwon commented on August 12, 2024

@dsinghvi @rmkonnur I just tried again and got this error. I believe that the file ID I used is valid because every time I run this function, it retrieves the up-to-date list of files from the merge endpoint.

The file id that's causing the problem is aed40b72-ab5f-477c-8062-9e55a2317bcc.

For context:

  • Organization Name: Prism AI
  • Linked Account ID: 75df5146-686f-4d44-8dbc-53453e1487f0

Thanks!

2023-08-21 17:46:25.726 | INFO  | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:46:25.727 | INFO  | MergeService.py:250 download_file - file_id=60651a52-949a-4bee-baf4-cd4ddca3e41d, file_name=bain_report_2021-global-private-equity-report.pdf, in_bytes=True
2023-08-21 17:47:47.183 | INFO  | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:47:47.184 | INFO  | MergeService.py:250 download_file - file_id=dd95be7a-919a-45f6-b25a-9cd073b77ef7, file_name=2022-esg_report.pdf, in_bytes=True
2023-08-21 17:48:02.332 | INFO  | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:48:02.332 | INFO  | MergeService.py:250 download_file - file_id=ce94ef7b-7c7d-4c8d-8dda-147342c4235f, file_name=bain_report_global-private-equity-report-2023.pdf, in_bytes=True
2023-08-21 17:48:24.601 | INFO  | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:48:24.602 | INFO  | MergeService.py:250 download_file - file_id=69e4d7b9-7a29-40c0-bf45-5e72ecc6a220, file_name=2023-global-pe-report---roadshow-deck.pdf, in_bytes=True
2023-08-21 17:48:42.285 | INFO  | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:48:42.285 | INFO  | MergeService.py:250 download_file - file_id=404ab2ba-1ae5-4d4d-998e-67f3c34b5479, file_name=bain-report_the-working-future.pdf, in_bytes=True
2023-08-21 17:49:18.282 | INFO  | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:49:18.282 | INFO  | MergeService.py:250 download_file - file_id=aed40b72-ab5f-477c-8062-9e55a2317bcc, file_name=bain_report_private_equity_report_2020.pdf, in_bytes=True
2023-08-21 17:49:23.330 | ERROR | IntegrationTask.py:50 initiate_file_processing - integration_request=public_token='string' organization_id='p_4fc4bae9_eddb_4950_b2b9_c7074e2678c1' organization_name='Prism AI' organization_admin_id='0b0bba07-e34b-4fe4-9e2b-d9a2f8533968', account_token=[TRUNCATED], error=status_code: 502, body: <!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>


<title>api.merge.dev | 502: Bad gateway</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/main.css" />


</head>
<body>
<div id="cf-wrapper">
    <div id="cf-error-details" class="p-0">
        <header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-8">
            <h1 class="inline-block sm:block sm:mb-2 font-light text-60 lg:text-4xl text-black-dark leading-tight mr-2">
              <span class="inline-block">Bad gateway</span>
              <span class="code-label">Error code 502</span>
            </h1>
            <div>
               Visit <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">cloudflare.com</a> for more information.
            </div>
            <div class="mt-3">2023-08-21 17:49:23 UTC</div>
        </header>
        <div class="my-8 bg-gradient-gray">
            <div class="w-240 lg:w-full mx-auto">
                <div class="clearfix md:px-8">
                  
<div id="cf-browser-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">
    
    <span class="cf-icon-browser block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
    
  </div>
  <span class="md:block w-full truncate">You</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
    
    Browser
    
  </h3>
  <span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>

<div id="cf-cloudflare-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">
    <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
    <span class="cf-icon-cloud block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
    </a>
  </div>
  <span class="md:block w-full truncate">Atlanta</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
    <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
    Cloudflare
    </a>
  </h3>
  <span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>

<div id="cf-host-status" class="cf-error-source relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">
    
    <span class="cf-icon-server block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-error w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
    
  </div>
  <span class="md:block w-full truncate">api.merge.dev</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
    
    Host
    
  </h3>
  <span class="leading-1.3 text-2xl text-red-error">Error</span>
</div>

                </div>
            </div>
        </div>

        <div class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
            <div class="clearfix">
                <div class="w-1/2 md:w-full float-left pr-6 md:pb-10 md:pr-0 leading-relaxed">
                    <h2 class="text-3xl font-normal leading-1.3 mb-4">What happened?</h2>
                    <p>The web server reported a bad gateway error.</p>
                </div>
                <div class="w-1/2 md:w-full float-left leading-relaxed">
                    <h2 class="text-3xl font-normal leading-1.3 mb-4">What can I do?</h2>
                    <p class="mb-6">Please try again in a few minutes.</p>
                </div>
            </div>
        </div>

        <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
  <p class="text-13">
    <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7fa4b63e4a1e458e</strong></span>
    <span class="cf-footer-separator sm:hidden">&bull;</span>
    <span id="cf-footer-item-ip" class="cf-footer-item hidden sm:block sm:mb-1">
      Your IP:
      <button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
      <span class="hidden" id="cf-footer-ip">2610:148:1f00:0:c5ab:a697:6be:7f7a</span>
      <span class="cf-footer-separator sm:hidden">&bull;</span>
    </span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" id="brand_link" target="_blank">Cloudflare</a></span>
    
  </p>
  <script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->


    </div>
</div>
<script>(function(){var js = "window['__CF$cv$params']={r:'7fa4b63e4a1e458e',t:'MTY5MjY0MDE2My4xNjkwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script><script>(function(){var js = "window['__CF$cv$params']={r:'7fa4b63e4a1e458e',t:'MTY5MjY0MDE2My4xNzAwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = 
document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script></body>
</html>


rmkonnur commented on August 12, 2024

Hi @hwuiwon, thanks for sending that information. Based on the error we're seeing, I would expect something specific to that file to be causing the issue, since you're able to download other PDF files. Is there anything about that file that sets it apart from the other PDFs you're downloading?
I tried downloading a few files (of all types) from one of our test accounts and have not been able to reproduce your issue.
There are a couple of options to move forward:

  1. Can you send over the SDK call code that you're running to generate the above output?
  2. Given permission, I could access your Merge account and attempt to use the SDK to download that file myself. I would recommend removing sensitive materials or linking a new account that includes a subset of the files you're trying to download.
  3. Reach out over Intercom and we can continue debugging in a chat setting for a faster turnaround time!


hwuiwon commented on August 12, 2024

Hi @rmkonnur,

Below is the source code of MergeService.py that I use to list and download the files.

import io
import time
import uuid
from typing import IO

from constants import MERGE_API_KEY, SUPPORTED_EXTENSIONS
from exceptions import PrismMergeException, PrismMergeExceptionCode
from loguru import logger
from merge.client import Merge
from merge.core.api_error import ApiError
from merge.resources.filestorage.types import (
    AccountDetails,
    CategoriesEnum,
    File,
    PaginatedFileList,
    PaginatedFolderList,
    SyncStatusStatusEnum,
)


class MergeService:
    """https://github.com/merge-api/merge-python-client"""

    def __init__(self, account_token: str | None = None):
        self.account_token = account_token
        self.client = Merge(api_key=MERGE_API_KEY, account_token=account_token)

    def generate_link_token(self, org_id: str, org_name: str, org_email: str) -> str:
        logger.info("org_id={}, org_name={}, org_email={}", org_id, org_name, org_email)

        try:
            link_token_response = self.client.filestorage.link_token.create(
                end_user_origin_id=str(uuid.uuid4()),
                end_user_organization_name=org_name,
                end_user_email_address=org_email,
                categories=[CategoriesEnum.FILESTORAGE],
            )
        except Exception as e:
            logger.error(
                "org_id={}, org_name={}, org_email={}, error={}",
                org_id,
                org_name,
                org_email,
                str(e),
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_GENERATE_LINK_TOKEN,
                message="Could not generate link token",
            )

        return link_token_response.link_token

    def generate_account_token(self, public_token: str) -> str:
        logger.info("public_token={}", public_token)

        try:
            account_token_response = self.client.filestorage.account_token.retrieve(
                public_token=public_token
            )
        except Exception as e:
            logger.error("public_token={}, error={}", public_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_GENERATE_ACCOUNT_TOKEN,
                message="Could not generate account token",
            )

        return account_token_response.account_token

    def get_integration_provider(self) -> AccountDetails:
        logger.info("account_token={}", self.account_token)

        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )

        try:
            integration_provider = self.client.filestorage.account_details.retrieve()
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_FETCH_INTEGRATION_DETAILS,
                message="Could not get integration provider details",
            )

        return integration_provider

    def get_account_owner(self) -> str:
        logger.info("account_token={}", self.account_token)

        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )

        try:
            response = self.client.filestorage.users.list(is_me=True)
            owner = response.results[0].email_address
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_FETCH_INTEGRATION_DETAILS,
                message="Could not get integration provider details",
            )

        return owner

    def check_sync_status(self) -> bool:
        logger.info("account_token={}", self.account_token)

        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )

        try:
            sync_status = self.client.filestorage.sync_status.list(page_size=100)
            results = sync_status.results

            if all(r.status == SyncStatusStatusEnum.DONE for r in results):
                return True

            if any(r.status == SyncStatusStatusEnum.FAILED for r in results):
                raise PrismMergeException(
                    code=PrismMergeExceptionCode.FAILED_TO_SYNC,
                    message="Failed to sync",
                )
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))

            if isinstance(e, PrismMergeException):
                raise

            raise PrismMergeException(
                code=PrismMergeExceptionCode.UNKNOWN,
                message=str(e),
            )

        return False

    def list_folders_in_folder(
        self,
        folder_id: str | None = None,
        drive_id: str | None = None,
        next: str | None = None,
    ) -> PaginatedFolderList:
        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )

        if not folder_id and not drive_id:
            logger.error(
                "account_token={}, folder_id={}, drive_id={}, next={}, error={}",
                self.account_token,
                folder_id,
                drive_id,
                next,
                "Either drive id or folder id is required",
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.REQUIRES_DRIVE_ID,
                message="Either drive id or folder id is required",
            )

        logger.info(
            "account_token={}, folder_id={}, drive_id={}, next={}",
            self.account_token,
            folder_id,
            drive_id,
            next,
        )

        try:
            folder_list = self.client.filestorage.folders.list(
                page_size=100, folder_id=folder_id, drive_id=drive_id, cursor=next
            )
        except Exception as e:
            logger.error(
                "account_token={}, folder_id={}, drive_id={}, next={}, error={}",
                self.account_token,
                folder_id,
                drive_id,
                next,
                str(e),
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_LIST_FOLDERS,
                message="Could not fetch folders",
            )

        return folder_list

    def list_all_files(self, next: str | None = None) -> PaginatedFileList:
        logger.info("account_token={}", self.account_token)

        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )

        try:
            file_list = self.client.filestorage.files.list(page_size=100, cursor=next)
        except Exception as e:
            logger.error(
                "account_token={}, next={}, error={}", self.account_token, next, str(e)
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_LIST_FILES,
                message="Could not fetch files",
            )

        return file_list

    def generate_file_list(self) -> list[File]:
        logger.info("account_token={}", self.account_token)

        file_list: list[File] = []
        response = self.list_all_files()

        file_list.extend(response.results)

        while response.next is not None:
            try:
                response = self.list_all_files(next=response.next)
                file_list.extend(response.results)
            except ApiError as e:
                logger.info(
                    "account_token={}, error={}, Too many requests. Waiting for 1 min to resume..",
                    self.account_token,
                    e,
                )
                time.sleep(60)

        return file_list

    def download_file(
        self, file: File, in_bytes: bool | None = False
    ) -> IO[bytes] | str:
        logger.info(
            "file_id={}, file_name={}, in_bytes={}", file.id, file.name, in_bytes
        )

        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )

        file_extension = file.name.split(".")[-1]

        if file_extension not in SUPPORTED_EXTENSIONS:
            logger.error("File type not supported: .{}", file_extension)
            raise PrismMergeException(
                code=PrismMergeExceptionCode.FILE_TYPE_NOT_SUPPORTED,
                message="File type not supported",
            )

        try:
            chunks = self.client.filestorage.files.download_retrieve(id=file.id)
        except Exception as e:
            logger.error(
                "account_token={}, file_id={}, error={}",
                self.account_token,
                file.id,
                str(e),
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_DOWNLOAD_FILE,
                message="Could not download file",
            )

        if in_bytes:
            response = b"".join(chunks)
            return io.BytesIO(response)

        tmp_uuid = str(uuid.uuid4())

        with open(tmp_uuid, "wb") as f:
            for chunk in chunks:
                f.write(chunk)

        return tmp_uuid

    def remove_integration(self) -> None:
        try:
            self.client.filestorage.delete_account.delete()
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_DELETE_INTEGRATION,
                message="Could not delete integration",
            )
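
A note on the pagination loop in generate_file_list above: as pasted, list_all_files catches every Exception and re-raises it as PrismMergeException, so the `except ApiError` branch would only fire if PrismMergeException derives from ApiError, and a page that keeps failing would be retried with no upper bound. Below is a minimal sketch of a cursor loop with a bounded number of retries. `RateLimitError`, `Page`, `collect_all_pages`, and the injectable `sleep` are all hypothetical names made up for illustration; they are not part of merge-python-client.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the client raises on a rate limit."""


@dataclass
class Page:
    """Hypothetical stand-in for one paginated response."""
    results: list = field(default_factory=list)
    next: Optional[str] = None


def collect_all_pages(
    fetch_page: Callable[[Optional[str]], Page],
    max_retries: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Follow `next` cursors, retrying a failed page a bounded number of times."""
    items: list = []
    cursor: Optional[str] = None
    retries = 0
    while True:
        try:
            page = fetch_page(cursor)
        except RateLimitError:
            retries += 1
            if retries > max_retries:
                raise  # give up instead of looping forever
            sleep(wait_seconds)
            continue  # retry the same cursor
        retries = 0  # a successful page resets the retry budget
        items.extend(page.results)
        if page.next is None:
            return items
        cursor = page.next
```

For this to apply to the code above, list_all_files would have to let the rate-limit error through (or re-raise it as a distinct type) rather than folding it into a generic PrismMergeException.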

These are the load_data() and load_and_parse_files() functions (part of DataPipelineServiceLocal.py):

def __init__(self, org_id: str, account_token: str):
        self.org_id = org_id
        self.account_token = account_token
        self.merge_service = MergeService(account_token=account_token)
        ...

def load_data(self, all_files: list[File]) -> list:
        logger.info("Started loading data. account_token={}", self.account_token)

        all_items = [{"data": file} for file in all_files]
        all_file_ids = [file.id for file in all_files]

        loaded_docs = []
        for file in all_items:
            file_docs = self.load_and_parse_files(file)
            loaded_docs.extend(file_docs)

        logger.info("Finished loading data. account_token={}", self.account_token)

        return loaded_docs

def load_and_parse_files(
        self, file_row: dict[str, File]
    ) -> list[dict[str, Document]]:
        logger.info(
            "Started loading and parsing files. account_token={}", self.account_token
        )
        documents = []

        try:
            file_in_bytes: IO[bytes] = self.merge_service.download_file(
                file=file_row["data"], in_bytes=True
            )
            loaded_doc = self.loader.load_data(
                file=file_in_bytes, split_documents=False
            )
            loaded_doc[0].doc_id = file_row["data"].id
            loaded_doc[0].metadata = {
                "file_id": file_row["data"].id,
                "process_date": self.process_date,
            }

            documents.extend(loaded_doc)
        except PrismException as e:
            logger.error("file_row={}, error={}", file_row, e)
            self.not_processed_file_ids.append(file_row["data"].id)

        return [{"doc": doc} for doc in documents]
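
Since load_and_parse_files gives up on a file after the first failed download, a transient error such as the 502 above ends up in not_processed_file_ids. One possible mitigation, sketched under assumptions, is to wrap the call in a small exponential backoff. `TransientError` and `call_with_backoff` are hypothetical names for illustration only; the real code would need its own way to tell retryable failures apart from permanent ones.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a 502."""


def call_with_backoff(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, waiting base_delay * 2**n seconds after the n-th failure."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Usage would look roughly like `call_with_backoff(lambda: self.merge_service.download_file(file=file_row["data"], in_bytes=True))`, provided download_file raises something that can be mapped to the retryable type.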

And this is how I call it:

file_list = self.generate_file_list()
loaded_docs = self.load_data(file_list)

Feel free to access any file in this account. It contains no sensitive information.

Also, where can I access Intercom? Thanks!

dsinghvi commented on August 12, 2024

@hwuiwon Since we could not find a consistently reproducible error, we're going to close this issue. If you are still encountering issues, feel free to reopen!
