Comments (10)
@dsinghvi Here's the body content. Thanks for taking a look at it.
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>api.merge.dev | 502: Bad gateway</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/main.css" />
</head>
<body>
<div id="cf-wrapper">
<div id="cf-error-details" class="p-0">
<header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-8">
<h1 class="inline-block sm:block sm:mb-2 font-light text-60 lg:text-4xl text-black-dark leading-tight mr-2">
<span class="inline-block">Bad gateway</span>
<span class="code-label">Error code 502</span>
</h1>
<div>
Visit <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">cloudflare.com</a> for more information.
</div>
<div class="mt-3">2023-08-20 13:32:07 UTC</div>
</header>
<div class="my-8 bg-gradient-gray">
<div class="w-240 lg:w-full mx-auto">
<div class="clearfix md:px-8">
<div id="cf-browser-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
<div class="relative mb-10 md:m-0">
<span class="cf-icon-browser block md:hidden h-20 bg-center bg-no-repeat"></span>
<span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
</div>
<span class="md:block w-full truncate">You</span>
<h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
Browser
</h3>
<span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>
<div id="cf-cloudflare-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
<div class="relative mb-10 md:m-0">
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
<span class="cf-icon-cloud block md:hidden h-20 bg-center bg-no-repeat"></span>
<span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
</a>
</div>
<span class="md:block w-full truncate">Atlanta</span>
<h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
Cloudflare
</a>
</h3>
<span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>
<div id="cf-host-status" class="cf-error-source relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
<div class="relative mb-10 md:m-0">
<span class="cf-icon-server block md:hidden h-20 bg-center bg-no-repeat"></span>
<span class="cf-icon-error w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
</div>
<span class="md:block w-full truncate">api.merge.dev</span>
<h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
Host
</h3>
<span class="leading-1.3 text-2xl text-red-error">Error</span>
</div>
</div>
</div>
</div>
<div class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
<div class="clearfix">
<div class="w-1/2 md:w-full float-left pr-6 md:pb-10 md:pr-0 leading-relaxed">
<h2 class="text-3xl font-normal leading-1.3 mb-4">What happened?</h2>
<p>The web server reported a bad gateway error.</p>
</div>
<div class="w-1/2 md:w-full float-left leading-relaxed">
<h2 class="text-3xl font-normal leading-1.3 mb-4">What can I do?</h2>
<p class="mb-6">Please try again in a few minutes.</p>
</div>
</div>
</div>
<div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
<p class="text-13">
<span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7f9afff08bfe07d2</strong></span>
<span class="cf-footer-separator sm:hidden">•</span>
<span id="cf-footer-item-ip" class="cf-footer-item hidden sm:block sm:mb-1">
Your IP:
<button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
<span class="hidden" id="cf-footer-ip">50.168.177.77</span>
<span class="cf-footer-separator sm:hidden">•</span>
</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Performance & security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" id="brand_link" target="_blank">Cloudflare</a></span>
</p>
<script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->
</div>
</div>
<script>(function(){var js = "window['__CF$cv$params']={r:'7f9afff08bfe07d2',t:'MTY5MjUzODMyNy41MzUwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script><script>(function(){var js = "window['__CF$cv$params']={r:'7f9afff08bfe07d2',t:'MTY5MjUzODMyNy41MzUwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = 
document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script></body>
</html>
from merge-python-client.
@hwuiwon thanks for reporting this issue. Are there any additional details in the body of the 502 error?
@hwuiwon is this an accurate description of the error: you call chunks = self.client.filestorage.files.download_retrieve(id=file.id) 6 times with different file ids, and on the 7th call you get a 502?
@dsinghvi Yes you're correct
@hwuiwon is it possible the issue is with the 7th file id you are using? can you try calling it with a single file id 10 times and checking if that works?
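One way to run this check, as a minimal sketch: the helper below calls a download function repeatedly with the same file id and records which attempts fail. `probe_download` and its parameters are hypothetical names, not part of the Merge SDK; the `download` argument stands in for whatever performs the real request, for example a lambda wrapping the `client.filestorage.files.download_retrieve(id=...)` call quoted above.

```python
from typing import Callable, Iterable, List, Tuple


def probe_download(
    download: Callable[[str], Iterable[bytes]],
    file_id: str,
    attempts: int = 10,
) -> List[Tuple[int, str]]:
    """Call `download(file_id)` `attempts` times and collect failures.

    Returns a list of (attempt_number, error_text) pairs; an empty list
    means every attempt succeeded.
    """
    failures: List[Tuple[int, str]] = []
    for attempt in range(1, attempts + 1):
        try:
            # Consume the chunks so a lazily streamed download actually runs.
            for _ in download(file_id):
                pass
        except Exception as exc:  # broad on purpose: this is a diagnostic probe
            failures.append((attempt, str(exc)))
    return failures
```

If the same id fails on every attempt, the file itself is the likely cause; if failures only begin after several successful calls, the number of requests is the more likely factor.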
Hi @hwuiwon, were you able to try out the suggestion from @dsinghvi? If that still continues to throw a 502 error, would you be able to provide us with your organization name, linked account id that is returning the 502, and the estimated time frame around when you received the 502?
These things should help us find the log for your request to try to figure out what the issue is!
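The time frame can also be read straight from the 502 body, since the Cloudflare error page embeds a UTC timestamp and a Ray ID. The sketch below pulls both out with regular expressions; `summarize_cloudflare_error` is a hypothetical helper name, and the patterns assume the page layout shown in the bodies pasted in this thread, which Cloudflare may change.

```python
import re
from typing import Dict, Optional

# Patterns matched against the Cloudflare error page markup seen in this thread.
RAY_ID_RE = re.compile(r"Cloudflare Ray ID:\s*<strong[^>]*>([0-9a-f]+)</strong>")
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC")


def summarize_cloudflare_error(body: str) -> Dict[str, Optional[str]]:
    """Extract the Ray ID and UTC timestamp from a Cloudflare error page body.

    Either value is None when the corresponding pattern is not found.
    """
    ray = RAY_ID_RE.search(body)
    timestamp = TIMESTAMP_RE.search(body)
    return {
        "ray_id": ray.group(1) if ray else None,
        "timestamp_utc": timestamp.group(1) if timestamp else None,
    }
```

Logging these two values instead of the full HTML body keeps the error line short while still giving support something to search for.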
@dsinghvi @rmkonnur I just tried again and got this error. I believe that the file ID I used is valid because every time I run this function, it retrieves the up-to-date list of files from the merge endpoint.
The file id that's causing the problem is aed40b72-ab5f-477c-8062-9e55a2317bcc.
For context:
- Organization Name: Prism AI
- Linked Account ID: 75df5146-686f-4d44-8dbc-53453e1487f0
Thanks!
2023-08-21 17:46:25.726 | INFO | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:46:25.727 | INFO | MergeService.py:250 download_file - file_id=60651a52-949a-4bee-baf4-cd4ddca3e41d, file_name=bain_report_2021-global-private-equity-report.pdf, in_bytes=True
2023-08-21 17:47:47.183 | INFO | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:47:47.184 | INFO | MergeService.py:250 download_file - file_id=dd95be7a-919a-45f6-b25a-9cd073b77ef7, file_name=2022-esg_report.pdf, in_bytes=True
2023-08-21 17:48:02.332 | INFO | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:48:02.332 | INFO | MergeService.py:250 download_file - file_id=ce94ef7b-7c7d-4c8d-8dda-147342c4235f, file_name=bain_report_global-private-equity-report-2023.pdf, in_bytes=True
2023-08-21 17:48:24.601 | INFO | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:48:24.602 | INFO | MergeService.py:250 download_file - file_id=69e4d7b9-7a29-40c0-bf45-5e72ecc6a220, file_name=2023-global-pe-report---roadshow-deck.pdf, in_bytes=True
2023-08-21 17:48:42.285 | INFO | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:48:42.285 | INFO | MergeService.py:250 download_file - file_id=404ab2ba-1ae5-4d4d-998e-67f3c34b5479, file_name=bain-report_the-working-future.pdf, in_bytes=True
2023-08-21 17:49:18.282 | INFO | DataPipelineServiceLocal.py:64 load_and_parse_files - Started loading and parsing files. account_token=[TRUNCATED]
2023-08-21 17:49:18.282 | INFO | MergeService.py:250 download_file - file_id=aed40b72-ab5f-477c-8062-9e55a2317bcc, file_name=bain_report_private_equity_report_2020.pdf, in_bytes=True
2023-08-21 17:49:23.330 | ERROR | IntegrationTask.py:50 initiate_file_processing - integration_request=public_token='string' organization_id='p_4fc4bae9_eddb_4950_b2b9_c7074e2678c1' organization_name='Prism AI' organization_admin_id='0b0bba07-e34b-4fe4-9e2b-d9a2f8533968', account_token=[TRUNCATED], error=status_code: 502, body: <!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>api.merge.dev | 502: Bad gateway</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/main.css" />
</head>
<body>
<div id="cf-wrapper">
<div id="cf-error-details" class="p-0">
<header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-8">
<h1 class="inline-block sm:block sm:mb-2 font-light text-60 lg:text-4xl text-black-dark leading-tight mr-2">
<span class="inline-block">Bad gateway</span>
<span class="code-label">Error code 502</span>
</h1>
<div>
Visit <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">cloudflare.com</a> for more information.
</div>
<div class="mt-3">2023-08-21 17:49:23 UTC</div>
</header>
<div class="my-8 bg-gradient-gray">
<div class="w-240 lg:w-full mx-auto">
<div class="clearfix md:px-8">
<div id="cf-browser-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
<div class="relative mb-10 md:m-0">
<span class="cf-icon-browser block md:hidden h-20 bg-center bg-no-repeat"></span>
<span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
</div>
<span class="md:block w-full truncate">You</span>
<h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
Browser
</h3>
<span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>
<div id="cf-cloudflare-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
<div class="relative mb-10 md:m-0">
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
<span class="cf-icon-cloud block md:hidden h-20 bg-center bg-no-repeat"></span>
<span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
</a>
</div>
<span class="md:block w-full truncate">Atlanta</span>
<h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" target="_blank" rel="noopener noreferrer">
Cloudflare
</a>
</h3>
<span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>
<div id="cf-host-status" class="cf-error-source relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
<div class="relative mb-10 md:m-0">
<span class="cf-icon-server block md:hidden h-20 bg-center bg-no-repeat"></span>
<span class="cf-icon-error w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
</div>
<span class="md:block w-full truncate">api.merge.dev</span>
<h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
Host
</h3>
<span class="leading-1.3 text-2xl text-red-error">Error</span>
</div>
</div>
</div>
</div>
<div class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
<div class="clearfix">
<div class="w-1/2 md:w-full float-left pr-6 md:pb-10 md:pr-0 leading-relaxed">
<h2 class="text-3xl font-normal leading-1.3 mb-4">What happened?</h2>
<p>The web server reported a bad gateway error.</p>
</div>
<div class="w-1/2 md:w-full float-left leading-relaxed">
<h2 class="text-3xl font-normal leading-1.3 mb-4">What can I do?</h2>
<p class="mb-6">Please try again in a few minutes.</p>
</div>
</div>
</div>
<div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
<p class="text-13">
<span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7fa4b63e4a1e458e</strong></span>
<span class="cf-footer-separator sm:hidden">•</span>
<span id="cf-footer-item-ip" class="cf-footer-item hidden sm:block sm:mb-1">
Your IP:
<button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
<span class="hidden" id="cf-footer-ip">2610:148:1f00:0:c5ab:a697:6be:7f7a</span>
<span class="cf-footer-separator sm:hidden">•</span>
</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Performance & security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_502&utm_campaign=api.merge.dev" id="brand_link" target="_blank">Cloudflare</a></span>
</p>
<script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->
</div>
</div>
<script>(function(){var js = "window['__CF$cv$params']={r:'7fa4b63e4a1e458e',t:'MTY5MjY0MDE2My4xNjkwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script><script>(function(){var js = "window['__CF$cv$params']={r:'7fa4b63e4a1e458e',t:'MTY5MjY0MDE2My4xNzAwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/invisible.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = 
document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script></body>
</html>
Hi @hwuiwon, thanks for sending that information. Based on the error we're seeing, I would expect something specific to that file to be causing the issue, since you're able to download other PDF files. Is there anything special about that file that sets it apart from the other PDFs you're downloading?
I tried downloading a few files from one of our test accounts (of all types) and have not been able to recreate your issue.
There are a couple of options to move forward:
- Can you send over the SDK call code that you're running to generate the above output?
- Given permission, I could access your Merge account and attempt to use the SDK to download that file myself. I would recommend removing sensitive materials, or linking a new account that includes a subset of the files you're trying to download.
- Reach out over Intercom and we can continue debugging in a chat setting for faster turnaround time!
Hi @rmkonnur,
Below is the source code of MergeService.py that I use to list and download the files.
import io
import time
import uuid
from typing import IO

from constants import MERGE_API_KEY, SUPPORTED_EXTENSIONS
from exceptions import PrismMergeException, PrismMergeExceptionCode
from loguru import logger
from merge.client import Merge
from merge.core.api_error import ApiError
from merge.resources.filestorage.types import (
    AccountDetails,
    CategoriesEnum,
    File,
    PaginatedFileList,
    PaginatedFolderList,
    SyncStatusStatusEnum,
)


class MergeService:
    """https://github.com/merge-api/merge-python-client"""

    def __init__(self, account_token: str | None = None):
        self.account_token = account_token
        self.client = Merge(api_key=MERGE_API_KEY, account_token=account_token)

    def generate_link_token(self, org_id: str, org_name: str, org_email: str) -> str:
        logger.info("org_id={}, org_name={}, org_email={}", org_id, org_name, org_email)
        try:
            link_token_response = self.client.filestorage.link_token.create(
                end_user_origin_id=str(uuid.uuid4()),
                end_user_organization_name=org_name,
                end_user_email_address=org_email,
                categories=[CategoriesEnum.FILESTORAGE],
            )
        except Exception as e:
            logger.error(
                "org_id={}, org_name={}, org_email={}, error={}",
                org_id,
                org_name,
                org_email,
                str(e),
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_GENERATE_LINK_TOKEN,
                message="Could not generate link token",
            )
        return link_token_response.link_token

    def generate_account_token(self, public_token: str) -> str:
        logger.info("public_token={}", public_token)
        try:
            account_token_response = self.client.filestorage.account_token.retrieve(
                public_token=public_token
            )
        except Exception as e:
            logger.error("public_token={}, error={}", public_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_GENERATE_ACCOUNT_TOKEN,
                message="Could not generate account token",
            )
        return account_token_response.account_token

    def get_integration_provider(self) -> AccountDetails:
        logger.info("account_token={}", self.account_token)
        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )
        try:
            integration_provider = self.client.filestorage.account_details.retrieve()
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_FETCH_INTEGRATION_DETAILS,
                message="Could not get integration provider details",
            )
        return integration_provider

    def get_account_owner(self) -> str:
        logger.info("account_token={}", self.account_token)
        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )
        try:
            response = self.client.filestorage.users.list(is_me=True)
            owner = response.results[0].email_address
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_FETCH_INTEGRATION_DETAILS,
                message="Could not get integration provider details",
            )
        return owner

    def check_sync_status(self) -> bool:
        logger.info("account_token={}", self.account_token)
        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )
        try:
            sync_status = self.client.filestorage.sync_status.list(page_size=100)
            results = sync_status.results
            if all(r.status == SyncStatusStatusEnum.DONE for r in results):
                return True
            if any(r.status == SyncStatusStatusEnum.FAILED for r in results):
                raise PrismMergeException(
                    code=PrismMergeExceptionCode.FAILED_TO_SYNC,
                    message="Failed to sync",
                )
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))
            if isinstance(e, PrismMergeException):
                raise
            raise PrismMergeException(
                code=PrismMergeExceptionCode.UNKNOWN,
                message=str(e),
            )
        return False

    def list_folders_in_folder(
        self,
        folder_id: str | None = None,
        drive_id: str | None = None,
        next: str | None = None,
    ) -> PaginatedFolderList:
        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )
        if not folder_id and not drive_id:
            logger.error(
                "account_token={}, folder_id={}, drive_id={}, next={}, error={}",
                self.account_token,
                folder_id,
                drive_id,
                next,
                "Either drive id or folder id is required",
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.REQUIRES_DRIVE_ID,
                message="Either drive id or folder id is required",
            )
        logger.info(
            "account_token={}, folder_id={}, drive_id={}, next={}",
            self.account_token,
            folder_id,
            drive_id,
            next,
        )
        try:
            folder_list = self.client.filestorage.folders.list(
                page_size=100, folder_id=folder_id, drive_id=drive_id, cursor=next
            )
        except Exception as e:
            logger.error(
                "account_token={}, folder_id={}, drive_id={}, next={}, error={}",
                self.account_token,
                folder_id,
                drive_id,
                next,
                str(e),
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_LIST_FOLDERS,
                message="Could not fetch folders",
            )
        return folder_list

    def list_all_files(self, next: str | None = None) -> PaginatedFileList:
        logger.info("account_token={}", self.account_token)
        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )
        try:
            file_list = self.client.filestorage.files.list(page_size=100, cursor=next)
        except Exception as e:
            logger.error(
                "account_token={}, next={}, error={}", self.account_token, next, str(e)
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_LIST_FILES,
                message="Could not fetch files",
            )
        return file_list

    def generate_file_list(self) -> list[File]:
        logger.info("account_token={}", self.account_token)
        file_list: list[File] = []
        response = self.list_all_files()
        file_list.extend(response.results)
        while response.next is not None:
            try:
                response = self.list_all_files(next=response.next)
                file_list.extend(response.results)
            except ApiError as e:
                logger.info(
                    "account_token={}, error={}, Too many requests. Waiting for 1 min to resume..",
                    self.account_token,
                    e,
                )
                time.sleep(60)
        return file_list

    def download_file(
        self, file: File, in_bytes: bool | None = False
    ) -> IO[bytes] | str:
        logger.info(
            "file_id={}, file_name={}, in_bytes={}", file.id, file.name, in_bytes
        )
        if not self.account_token:
            logger.error("Account token can't be null")
            raise PrismMergeException(
                code=PrismMergeExceptionCode.INVALID_ACCOUNT_TOKEN,
                message="Account token can't be null",
            )
        file_extension = file.name.split(".")[-1]
        if file_extension not in SUPPORTED_EXTENSIONS:
            logger.error("File type not supported: .{}", file_extension)
            raise PrismMergeException(
                code=PrismMergeExceptionCode.FILE_TYPE_NOT_SUPPORTED,
                message="File type not supported",
            )
        try:
            chunks = self.client.filestorage.files.download_retrieve(id=file.id)
        except Exception as e:
            logger.error(
                "account_token={}, file_id={}, error={}",
                self.account_token,
                file.id,
                str(e),
            )
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_DOWNLOAD_FILE,
                message="Could not download file",
            )
        if in_bytes:
            response = b"".join(chunks)
            return io.BytesIO(response)
        tmp_uuid = str(uuid.uuid4())
        with open(tmp_uuid, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        return tmp_uuid

    def remove_integration(self) -> None:
        try:
            self.client.filestorage.delete_account.delete()
        except Exception as e:
            logger.error("account_token={}, error={}", self.account_token, str(e))
            raise PrismMergeException(
                code=PrismMergeExceptionCode.COULD_NOT_DELETE_INTEGRATION,
                message="Could not delete integration",
            )
This is the load_and_parse_files() function (part of DataPipelineServiceLocal.py):
def __init__(self, org_id: str, account_token: str):
    self.org_id = org_id
    self.account_token = account_token
    self.merge_service = MergeService(account_token=account_token)
    ...

def load_data(self, all_files: list[File]) -> list:
    logger.info("Started loading data. account_token={}", self.account_token)
    all_items = [{"data": file} for file in all_files]
    all_file_ids = [file.id for file in all_files]
    loaded_docs = []
    for file in all_items:
        file_docs = self.load_and_parse_files(file)
        loaded_docs.extend(file_docs)
    # The "{}" placeholder is needed for loguru to interpolate the token.
    logger.info("Finished loading data. account_token={}", self.account_token)
    return loaded_docs

def load_and_parse_files(
    self, file_row: dict[str, File]
) -> list[dict[str, Document]]:
    logger.info(
        "Started loading and parsing files. account_token={}", self.account_token
    )
    documents = []
    try:
        file_in_bytes: IO[bytes] = self.merge_service.download_file(
            file=file_row["data"], in_bytes=True
        )
        loaded_doc = self.loader.load_data(
            file=file_in_bytes, split_documents=False
        )
        loaded_doc[0].doc_id = file_row["data"].id
        loaded_doc[0].metadata = {
            "file_id": file_row["data"].id,
            "process_date": self.process_date,
        }
        documents.extend(loaded_doc)
    except PrismException as e:
        logger.error("file_row={}, error={}", file_row, e)
        self.not_processed_file_ids.append(file_row["data"].id)
    return [{"doc": doc} for doc in documents]
And this is how I call it
file_list = self.generate_file_list()
loaded_docs = self.load_data(all_files)
Feel free to access any file in this account. It contains no sensitive information.
Also, where can I access Intercom? Thanks!
@hwuiwon Since a consistent reproducible error was not found, we're going to close this issue. If you are still encountering issues feel free to reopen!