sec-parser's Issues

`part2item6` common issue - exhibits with tables

Related to alphanome-ai/sec-ai#47


An AssertionError: Missing: ['part2item6'] is raised in the generalization tests when the exhibits table appears at the same level as the top-level section title.

This is a common issue in:

FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_RTX_0000101829-23-000032] - AssertionError: Missing: ['part2item6']
FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_TMO_0000097745-23-000059] - AssertionError: Missing: ['part2item6']
FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_UPS_0001090727-23-000038] - AssertionError: Missing: ['part2item6']
FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_MCD_0000063908-23-000076] - AssertionError: Missing: ['part2item6']
FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_MDT_0001613103-23-000128] - AssertionError: Missing: ['part2item6']
FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_GM_0001467858-23-000098] - AssertionError: Missing: ['part2item6'], Unexpected: ['part1item1']
FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_KHC_0001637459-23-000114] - AssertionError: Missing: ['part2item6']

Almost all of them (RTX, TMO, MDT, KHC, GM) have the following pattern:

<div ...>

    <span ...>
    Item 6. Exhibits
    </span>

    <table ...>
    ...
    </table>

</div>

UPS differs slightly, with the title split across two spans:

<div ...>

    <span ...>
    Item 6. 
    </span>
    <span ...>
    Exhibits
    </span>

    <table ...>
    ...
    </table>

</div>

MCD differs more substantially: the title "Item 6. Exhibits" is written inside the table itself:

<div>
    <table>
        <tr></tr>

        <tr>
            <td ...>
                <div ...>
                    <span ...>
                        Item 6. Exhibits
                    </span>
                </div>
            </td>
        </tr>

        ...

        <tr></tr>
        <tr></tr>
    </table>
</div>

Screenshots for two major cases:

Table at the same level as top-level section title

image

Top-level section title written inside table:

image
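A minimal sketch of detecting the "Item 6. Exhibits" title across all of the patterns above: gather the element's visible text (even when the title is split across sibling spans or nested inside a table) and match it against an "Item 6" pattern. This uses only the standard library and is a simplification of what the real classifier would need; the function names are hypothetical, not sec-parser API.

```python
import re
from html.parser import HTMLParser

ITEM6_RE = re.compile(r"item\s*6\.?\s*exhibits", re.IGNORECASE)

class TextCollector(HTMLParser):
    """Collect all text nodes, regardless of how deeply they are nested."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def contains_item6_title(html: str) -> bool:
    collector = TextCollector()
    collector.feed(html)
    # Join fragments and normalize whitespace, so split spans still match
    text = " ".join("".join(collector.parts).split())
    return bool(ITEM6_RE.search(text))
```

This handles both the div-level pattern (spans followed by a table) and the MCD-style pattern (title inside a table cell), since it ignores the tag structure entirely.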

Singular Visual Line Should Be Identified as a Single TextElement

Problem

For MSFT 0000950170-23-014423, the top section title "PART I. FINANCIAL INFORMATION " is identified as two semantic elements:
[
  {
    "cls_name": "TopSectionTitle",
    "level": 0,
    "section_type": "part1",
    "text_content": "PART I. FINANCI"
  },
  {
    "cls_name": "TitleElement",
    "level": 0,
    "text_content": "AL INFORMATION"
  }
]

This should be:
{
  "cls_name": "TopSectionTitle",
  "level": 0,
  "section_type": "part1",
  "text_content": "PART I. FINANCIAL INFORMATION"
}

Ideas about a possible solution

Adjust text element merger to keep merging elements until a new visual line.
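The merging idea could be sketched as follows. Fragment and its line_index field are hypothetical stand-ins for sec-parser's internal element and layout types, not the library's API:

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    line_index: int  # index of the visual line the fragment is rendered on

def merge_fragments_by_line(fragments: list) -> list:
    """Concatenate consecutive fragments that share the same visual line."""
    merged = []
    prev_line = None
    for frag in fragments:
        if merged and frag.line_index == prev_line:
            merged[-1] += frag.text  # same visual line: keep merging
        else:
            merged.append(frag.text)  # new visual line: start a new element
        prev_line = frag.line_index
    return merged
```

With this, the two MSFT fragments "PART I. FINANCI" and "AL INFORMATION" on the same visual line would come out as a single "PART I. FINANCIAL INFORMATION" element.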

Implement parsing accuracy end-to-end tests

Objective

Introduce an end-to-end test mechanism that allows freezing the expected parsing results, facilitating manual review and approval of changes in parsing behavior.

Tasks

  1. Freezing Mechanism: Implement a way to "lock" or "freeze" the current expected parsing output in a YAML document. This will be based on the syntax available in the parsing_plugins test modules.
  2. Output Review: Add a pre-commit check to enable manual review. This check will block the commit if there's a mismatch between parsing-result.txt and parsing-frozen.txt (names are tentative).
  3. Hashing HTML: For readability and storage efficiency, hash the HTML content for each expected semantic element.
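Task 3 could be sketched like this: store a short, stable hash of each element's HTML instead of the HTML itself, so the frozen YAML stays small and readable. The function and field names here are hypothetical:

```python
import hashlib

def hash_html(html: str) -> str:
    """Return a short, stable fingerprint of an element's HTML source."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()[:12]

# Example of what one frozen entry might look like
frozen_entry = {
    "cls_name": "TextElement",
    "html_hash": hash_html("<p>Some text</p>"),
}
```

A mismatch in html_hash between the frozen and current outputs would then flag the element for manual review without bloating the snapshot file.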

Show trailing text during `tree.render()`

I was having a look at the output of tree.render() and noticed that only the starting text is printed, as shown below.

├── TitleElement: Macroeconomic Conditions
│   ├── TextElement: Macroeconomic conditions, including inflation, cha ... 

Does the core logic guarantee that the entire text body is captured whenever the starting text is correct? If not, I would suggest printing the trailing text as well, as shown below:

├── TitleElement: Macroeconomic Conditions
│   ├── TextElement: Macroeconomic conditions, including inflation, cha ... y’s results of operations and financial condition.

This would be easy to implement by replacing

title = f"{title[:max_line_length]}..."

with

title = f"{title[:max_line_length]} ... {title[-max_line_length:]}" 
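One caveat with that replacement: for short titles, the naive slices would overlap and print the same text twice. A guarded variant (a sketch; max_line_length mirrors the existing variable name):

```python
def truncate_middle(title: str, max_line_length: int) -> str:
    """Show the leading and trailing text, truncating only the middle."""
    if len(title) <= 2 * max_line_length + 5:
        return title  # short enough that truncation would gain nothing
    return f"{title[:max_line_length]} ... {title[-max_line_length:]}"
```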

Request for Feedback: Architectural Design Proposal for Standardized Parsing of SEC EDGAR Tables

After careful consideration of the challenges presented, I've developed a proposal I'm eager to present. However, before we delve into the specifics, let's take a moment to review the data landscape we're navigating.

Exploring the Data Landscape

Let's take a look at a curated selection representing the variance in structure and content across different 10-Q filings:

10-Q/CAT/0000018230-03-000208

image

10-Q/BEN/0000038777-22-000138

image

10-Q/BSX/0001072613-08-001558

image

10-Q/CBOE/0001558370-20-012101

image

10-Q/AAPL/0000320193-23-000077

image

Tree-oriented representation

Let's take a few examples of the design:

10-Q/CAT/0000018230-03-000208

image
from typing import Dict

table_element: TableElement = ...
tree: Dict = table_element.parse()
assert tree['2007']['Long-Term Debt']['Machinery and Engines'].value == 275
assert tree['2007']['Long-Term Debt']['Machinery and Engines'].scale == 'millions'
assert tree['2007']['Long-Term Debt']['Machinery and Engines'].unit == '$'

10-Q/CBOE/0001558370-20-012101

image
assert tree['Percentage of Total Revenues'] \
           ['Three Months Ended, September 30']['2020']['Operating Expenses'].value == 32.2
assert tree['Percentage of Total Revenues'] \
           ['Three Months Ended, September 30']['2020']['Operating Expenses'].unit == "%"

assert tree['Three Months Ended, September 30']['2020']['Operating Expenses'].value == 12.6
assert tree['Three Months Ended, September 30']['2020']['Operating Expenses'].scale == "millions"
assert tree['Three Months Ended, September 30']['2020']['Operating Expenses'].unit == "$"

Tabular-oriented representation

| Year | Period End Date | Period Description | Category | Subcategory | Value | Scale | Unit |
|------|-----------------|--------------------|----------|-------------|-------|-------|------|
| 2007 | - | - | Long-Term Debt | Machinery and Engines | 275 | millions | $ |
| 2020 | September 30 | Three Months Ended | Operating Expenses | - | 12.6 | millions | $ |
| 2020 | September 30 | Three Months Ended | Percentage of Total Revenues | Operating Expenses | 32.2 | - | % |

Parsing process

The parsing process could be organized as HTML -> Tabular -> Dict. This would involve parsing the HTML directly into a table, for example with pandas.read_html(), and then converting it to a Python dictionary.

The other way would be HTML -> Dict -> Tabular: traverse the HTML tags (the DOM tree) directly with a tool like BeautifulSoup4, and then construct the table from the resulting tree (a Python dictionary).
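The Tabular -> Dict step could be sketched as follows: each flattened row carries its key path (e.g. year, category, subcategory) plus value/scale/unit, and the rows are folded into a nested dictionary. The Cell fields mirror the proposal above; everything else (names, row format) is a hypothetical illustration:

```python
from dataclasses import dataclass

@dataclass
class Cell:
    value: float
    scale: str
    unit: str

def rows_to_tree(rows):
    """Fold flattened rows (key..., value, scale, unit) into a nested dict."""
    tree = {}
    for *path, value, scale, unit in rows:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = Cell(value, scale, unit)
    return tree

tree = rows_to_tree([
    ("2007", "Long-Term Debt", "Machinery and Engines", 275, "millions", "$"),
])
```

This reproduces the assertions from the tree-oriented examples, e.g. tree['2007']['Long-Term Debt']['Machinery and Engines'].value == 275.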

Your turn

We're excited to share our proposal for a new standardized table parsing method using a tree structure, designed to streamline the representation of data from SEC EDGAR reports. We'd greatly appreciate your professional insights to refine this approach. Please share your thoughts!

Raise Code Unit Test Coverage to 90-100%

codecov

Current state

Code coverage is at 72%.

Rationale

  • Increases robustness against regressions.
  • Increases community trust and encourages more contributions.

Goal

Raise it to 90-100%.

  • Important: Please ensure that tests are meaningful and cover edge cases, rather than just inflating the coverage percentage.

Resources

  1. Codecov.io

ModuleNotFoundError: No module named 'sec_downloader'

The import statement for sec_downloader in the "How to use" section will not work because instructions for installing sec_downloader are not provided. This results in:

ModuleNotFoundError: No module named 'sec_downloader'

Two ways to handle this:

Either add pip install sec_downloader to the README file's "Getting Started" section,

(or)

add a line with sec-downloader = "^0.2.3" to pyproject.toml's [tool.poetry.dependencies] instead of only adding it to the dev dependencies. This will ensure that the import statement works with just pip install sec-ai.

Fix the TopSectionTitle being split in MSFT filing

Context

MSFT accuracy-test (permalink at the time of posting)

Problem

Titles come out as two separate title elements

[
    {
        "text_content": "PART I. FINANCI"
    },
    {
        "text_content": "AL INFORMATION"
    }
]

This is because MSFT splits the section titles into two pieces for some reason.

Ideas about a possible solution

One idea is to incorporate line information into the solution: if two elements of the same type (and level) are on the same visual line, they should probably be merged into a single element.

small change in doc

image

Change "blue" to "green" in the sentence "For example, we can get a percentage of blue text:".

Adjusting Top Section Title Regular Expression to Handle Accented Characters

Problem

The regular expression employed in the top section manager for 10q needs modification, specifically to eliminate accented characters from both the regular expression itself and the input being matched.

Ideas about a possible solution

Remove the comma characters from the additional_title used in the regular expression.

The additional_title before:

additional_title =  ("Unregistered Sales of Equity Securities, Use of Proceeds, and "
                    "Issuer Purchases of Equity Securities")

and after:

additional_title =  ("Unregistered Sales of Equity Securities Use of Proceeds and "
                    "Issuer Purchases of Equity Securities")
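An alternative to maintaining comma-free copies of every title by hand would be to normalize both the expected title and the input before matching. A sketch (the function name and the exact punctuation set are assumptions):

```python
import re

def normalize_title(text: str) -> str:
    """Strip punctuation and collapse whitespace for punctuation-insensitive matching."""
    text = re.sub(r"[,.;:]", "", text)          # drop punctuation such as commas
    return re.sub(r"\s+", " ", text).strip().lower()

expected = ("Unregistered Sales of Equity Securities, Use of Proceeds, and "
            "Issuer Purchases of Equity Securities")
candidate = ("Unregistered Sales of Equity Securities Use of Proceeds and "
             "Issuer Purchases of Equity Securities")
```

With this normalization, the "before" and "after" titles above compare equal, so the regular expression source text would not need to be edited at all.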

Page headers should be identified as PageHeaderElement

Context

MSFT accuracy-test (permalink at the time of posting)

Problem

The header "PART I" is identified as a top section title element, when it should be identified as a page header element. Because of this, the actual top section title element is incorrectly identified as a title element.

image

Ideas about a possible solution

One possible solution: identify page header elements first; then the top section title classification will start working correctly, as it will no longer be confused by the header elements.

Manually find bugs in various 10-Q documents and create Github issues

Objective

To enhance sec-parser by manually reviewing 10-Q documents from key companies and addressing issues.

Process once the Github issues are created:

  1. Issue List: Manually review 10-Qs from selected companies and list parsing issues.
  2. Triage: Categorize issues as Critical, High, Medium, or Low impact.
  3. Plan: Develop a plan to fix these issues.
  4. Fix: Start implementing fixes based on priority.

Caching for .parse_latest

As noted in the readme.md

The parser utilizes caching, so multiple calls to retrieve the same data will not consume your API calls limit.

After running the code several times, I found that the API call quota is still being consumed. The .parse_latest() method currently doesn't seem to have any caching yet, or perhaps I am missing something. Can someone help confirm this?

Thanks

Add "Open in" badge(s) to two jupyter notebooks

As discussed in this thread, we are looking to enhance the accessibility of our guide notebooks by adding an "Open in SageMaker Studio Lab" badge. This badge will allow users to directly open and run the notebooks in a cloud environment, improving the user experience.

Tasks for Contributors:

  1. Update the links in the following badges to point to the guide notebooks (see links below).

    Open In SageMaker Studio Lab

    Kaggle

  2. Verify that clicking the updated link opens the notebook in the cloud environment.

  3. Add the updated badge to the following notebooks, positioning it after the existing badges:

  4. Ensure that the badge functions correctly when integrated into the notebook.

  5. Create a GitHub Pull Request with your changes.


Special thanks to @mahimairaja for the excellent idea! 🙌 🙌

This issue is ideal for new contributors, offering a great opportunity to make a significant impact on our project. We appreciate your contributions!

Welcome to sec-parser! Start Here for Contributing

Contribution Workflow

We're excited about your interest in contributing to Alphanome AI's projects! To ensure a smooth and efficient process for all contributors, we've established this workflow. Please follow these steps to contribute effectively and avoid overlapping efforts.

Step 1: Select a Task

  1. Option A: Explore Open Issues:

    • Check out our Request For Contributions board for tasks that are ready for contributions.
    • Alternatively, browse through the GitHub Issues page of a specific project, such as sec-parser Issues or sec-ai Issues.
    • Tips:
      • Look for tasks labeled contributions-welcome. These tasks align with the project goals.
      • If you're new to the project, look for tasks labeled good-first-issue.
      • Be sure to check if a task is already tagged in-progress to avoid duplicate efforts.
  2. Option B: Propose a New Task:

    • Go through our Short-Term Roadmap to understand our focus areas and upcoming projects.
    • If you discover an issue or have a novel idea, feel free to propose it. Initiate a conversation either in the Discussions forum or on our Discord server.

Step 2: Prepare for Contribution

  1. Read CONTRIBUTING.md:

    • Before you begin, read the CONTRIBUTING.md file of the project for guidelines on setup, coding standards, and codebase understanding.
  2. Fork the Project:

    • Fork the project on GitHub to create your own workspace.
  3. Communicate Your Plan:

    • We recommend commenting on the issue you're tackling to discuss your approach and seek guidance. This also allows us to tag the issue as in-progress.
  4. Continuously Sync Your Fork:

    • Follow this GitHub Guide to synchronize your fork with the main repository.

Step 3: Begin Your Contribution

  1. Submit a Pull Request:

    • Create a pull request with your changes, clearly explaining your contributions.
  2. Check for Errors:

    • Run our automated checks and your local tests to catch and fix any issues before final submission.

We're grateful for your contributions and look forward to your valuable input in our project!

Seeking Assistance and Asking Questions

If you have any questions, or concerns, or need further clarification, feel free to reach out. Please use our Discussions page for more detailed queries and Discord for quick, conversational questions. For questions specific to a GitHub issue or pull request, kindly post them directly in the respective issue or PR thread.


Note
For Maintainers: Content above is taken from our Common Contributing Guide. If any updates or changes are made here, please ensure they are also reflected in the original guide for consistency.

Additional Instructions to add API Key & Developer setup

Following @INF800's addition to the readme.md in issue #10, I personally think it would be better to add a .env tutorial for setting up the SECAPIO_API_KEY and to add these lines of code to the readme.md file.

from dotenv import load_dotenv
import os

env_path = ".env"
load_dotenv(dotenv_path=env_path)
api_key = os.environ.get('SECAPIO_API_KEY')

By the way, I just noticed that the changes merged for the previous issue were reverted by this commit: 6e2d1a8

Create a visualisation tool that overlays parsed elements with semi-transparent boxes

Background

In the example below, a Semantic Segmentation model consolidates multiple individual pixels into a single, coherent Semantic Element within an image.

image

Similarly, sec-parser features a Semantic Segmentation algorithm that consolidates multiple HTML tags into a single, coherent Semantic Element within a page.

Task

We'd like to have a new Python function that takes one input (a list of semantic elements) and produces one output (a string of HTML source code with all the semi-transparent overlays applied), for example:

image

The coloring should be based on the type of the semantic element (the following is just example code to clearly convey the idea):

colored_html = ""
for element in parsed_elements:
    if isinstance(element, TextElement): 
        html = element.get_source_code()
        colored_html = colored_html + add_color('yellow', html)
    ...

Parsing into a SemanticTree is most likely not needed; you can just use the list of semantic elements.

Notes

  • You can retrieve the HTML to be modified directly from the semantic elements themselves. Therefore, the input of your function could be just ingesting a list of semantic elements (or a semantic tree).
  • It can be either part of the debug_dashboard or it can be a completely separate tool in the dev_utils folder.
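The hypothetical add_color helper from the example above could be sketched like this. The color mapping and the wrapper markup are assumptions for illustration, not sec-parser API; the real tool would likely key the mapping off isinstance checks as in the pseudocode:

```python
# Assumed mapping from element class name to a semi-transparent overlay color
COLORS = {
    "TextElement": "rgba(255, 255, 0, 0.3)",
    "TitleElement": "rgba(0, 128, 255, 0.3)",
}

def add_color(color: str, html: str) -> str:
    """Wrap the element's HTML in a div with a semi-transparent background."""
    return f'<div style="background-color: {color};">{html}</div>'

def overlay(elements) -> str:
    """elements: iterable of (cls_name, html_source) pairs."""
    return "".join(
        add_color(COLORS.get(cls_name, "rgba(128, 128, 128, 0.2)"), html)
        for cls_name, html in elements
    )
```

Unknown element types fall back to a neutral gray overlay, so the tool degrades gracefully as new semantic element classes are added.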

The task snapshot-verify fails

Error
FileNotFoundError: [Errno 2] No such file or directory: '/home/deenaawny/issue-66/sec-parser-test-data/10-Q/AA/0001193125-18-236766/expected-semantic-elements-list.json'
task: Failed to run task "snapshot-verify": exit status 1

Solution
Comment out the accession number in selected-filings.yaml:

#- 0001193125-18-236766 # 10-Q AA Alcoa Corp 2018-06-30

Pull Request
To do

Write missing unit tests

Philosophy

Working with Complex Data: Unit Testing Approach

When dealing with complex data, a common and effective strategy is to encapsulate the complexity within a unit test. This approach involves defining the various scenarios you anticipate and then focusing on testing these scenarios rather than the entire document or using extensive debugging tools.
This method significantly reduces the time required to verify if your modifications are working as expected. Here's how you can do it:

1. Isolate the complexity: Identify the complex part of your data and isolate it as a unit test. This could be a function, a class, or any other component that you find complex.

2. Define the scenarios: Determine what you want to happen for different inputs or states of your program. These scenarios will form the basis of your unit tests.

3. Work with the unit test: Once you have your unit test set up, you can make changes and run the test to see if your changes are working as expected. This is much quicker and more efficient than working with the full document or using full debugging tools.

Remember, the goal here is to make your testing process more efficient and manageable. By isolating complexity and focusing on unit tests, you can achieve this goal and ensure your changes work as intended.

source

Technical details

The goal is to make the command task c finish successfully with 100% unit test coverage (it's currently at 98%).

Note
It's just a shortcut for task pre-commit-checks, which itself includes task unit-tests among other things; task unit-tests is in turn a shortcut for calling pytest tests/unit/.

Some helpful tips:

  1. You can get more detailed output by passing -vv to the pytest command:

task c -- -vv

  2. It's convenient to have the tests re-run automatically right after you save any file:

task monitor-unit-tests

or (as described previously)

task monitor-unit-tests -- -vv

  3. Install IDE extensions that can highlight which lines are covered by unit tests and which are not. For Cursor (or VSCode) users I recommend Coverage Gutters, which works out of the box with no additional configuration. Enable it as described here.

  4. Consider using the following GPT-4 prompt to get a first draft and continue from there:

Create a pytest test suite for a given functionality in my codebase. The tests should be structured according to the Arrange-Act-Assert (AAA) pattern for clarity and maintainability. Include comments within each test to clearly delineate the Arrange, Act, and Assert stages. Include tests for both normal and edge cases. Provide descriptive test function names. All code output must be full and complete to be pasted, don't leave it to me to finish it off (you can exclude imports though if you want). Use test tables (pytest parameterized) where appropriate. Here is my code:
{{PASTE CODE HERE}}
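As a concrete illustration of the Arrange-Act-Assert pattern and test tables mentioned in the prompt, here is a plain-Python sketch (the function under test is a made-up example, and in practice you would express the table with pytest.mark.parametrize):

```python
def classify_severity(score: int) -> str:
    """Made-up function under test: map a numeric score to a severity label."""
    if score >= 80:
        return "critical"
    if score >= 50:
        return "high"
    return "low"

# Arrange: a test table of (input, expected) pairs, including edge cases
CASES = [
    (100, "critical"),
    (80, "critical"),  # boundary
    (79, "high"),
    (50, "high"),      # boundary
    (49, "low"),
    (0, "low"),
]

def test_classify_severity():
    for score, expected in CASES:
        # Act
        result = classify_severity(score)
        # Assert
        assert result == expected, f"score={score}"
```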

Parsers for other types of docs

Hi, I'm impressed with your work. I'm expecting this work will enable us to extract valuable insights from SEC filings. Really appreciate it.

But I'm wondering if you are planning to add more parsers for other types of documents, such as 10-K, 8-K, etc.
It would be very useful to have more parsers.

Thanks.

Extracting Text Elements Using sec_parser

I am currently using sec_parser to parse SEC filings, and I've encountered issues while trying to extract specific text elements such as TopSectionTitle, TitleElement, and TextElement. The output includes ANSI escape codes which seem to interfere with straightforward text extraction.

Is there any subfunction to do this?

FAILED tests/generalization/processing_steps/test_top_level_section_title_classifier.py::test_top_level_section_title_classifier[10-Q_MMM_0000066740-23-000058] - AssertionError: Missing: ['part2item3']

Related to alphanome-ai/sec-ai#47


Currently, TextElement can become HighlightedTextElement only if 80% of the text content is bold with some font weight. This is not an ideal scenario in cases such as part2item3 of 10-Q_MMM_0000066740-23-000058 which looks like:

image

where ~55% of text content is bold with some font weight. The style_string default dict for it looks like:

image

Because of this, TextElement cannot become HighlightedTextElement, which in turn cannot become TitleElement, which in turn cannot become TopLevelSectionElement.

Solution:

Change PERCENTAGE_THRESHOLD from 80 to a value less than 54.9.

In order to find a sweet spot, I checked whether there were more failures like these where the threshold needs to be decreased, but could not find any in the dataset. So I think the best value as of now would be 50 percent, assuming neither the trailing description nor the bold heading has more text content.
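The threshold logic under discussion amounts to something like the following sketch (PERCENTAGE_THRESHOLD and the 50% value mirror the proposal; the function name and signature are hypothetical):

```python
PERCENTAGE_THRESHOLD = 50  # proposed value, lowered from 80

def is_highlighted(bold_chars: int, total_chars: int) -> bool:
    """True if the bold fraction of the text meets the threshold."""
    if total_chars == 0:
        return False
    return 100 * bold_chars / total_chars >= PERCENTAGE_THRESHOLD
```

With the old threshold of 80, the MMM case (~55% bold) fails; with 50, it passes while mostly-plain text stays unhighlighted.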

Make HighlightedTextClassifier work with `<b>` tags

Discussed in https://github.com/orgs/alphanome-ai/discussions/56

Originally posted by Elijas November 24, 2023

Example document

https://www.sec.gov/Archives/edgar/data/1675149/000119312518236766/d828236d10q.htm

image
 <p style="margin-top:9pt; margin-bottom:0pt; text-indent:4%; font-size:10pt; font-family:Times New Roman">
  Options to purchase 1 million shares of common stock at a weighted average exercise price of $36.28 were
outstanding as of June 30, 2017, but were not included in the computation of diluted EPS because they were anti-dilutive, as the exercise prices of the options were greater than the average market price of Alcoa Corporation's common stock.
 </p>
 <p style="margin-top:13pt; margin-bottom:0pt; font-size:10pt; font-family:Times New Roman">
  <b>
   G. Accumulated Other Comprehensive Loss
  </b>
 </p>
 <p style="margin-top:6pt; margin-bottom:0pt; text-indent:4%; font-size:10pt; font-family:Times New Roman">
  The following table details the activity of the three components that comprise Accumulated other comprehensive loss for both Alcoa
Corporation's shareholders and Noncontrolling interest:
 </p>

Goal

The "G. Accumulated Other Comprehensive Loss" should be recognized as HighlightedTextElement (and therefore, TitleElement).

Most likely, you will have to compute the percentage of text that is covered by the <b> tag, reusing the parts already implemented for HighlightedTextElement. This will help you avoid situations where text text text <b>bold</b> text text is recognized as highlighted.
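Measuring how much of an element's text sits inside <b> tags could be sketched with only the standard library (the class and function names are hypothetical; the real implementation would plug into the existing style-based machinery):

```python
from html.parser import HTMLParser

class BoldCoverage(HTMLParser):
    """Count characters of text inside vs. outside <b> tags."""
    def __init__(self):
        super().__init__()
        self.depth = 0        # current <b> nesting depth
        self.bold_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        self.total_chars += n
        if self.depth:
            self.bold_chars += n

def bold_fraction(html: str) -> float:
    parser = BoldCoverage()
    parser.feed(html)
    return parser.bold_chars / parser.total_chars if parser.total_chars else 0.0
```

The fully-bold "G. Accumulated Other Comprehensive Loss" paragraph scores 1.0, while a paragraph with only a few bold words scores well below any sensible threshold.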

Download documents directly from SEC EDGAR instead of using sec-api.io API

We're currently using sec-api.io

  • To remove title pages;
  • To find separation points between top-level sections;

The goal of this issue is to implement logic for identifying the separation points between the title page and the first root section, and between different root sections in SEC EDGAR HTML documents.

  1. Implement logic to identify the separation point between the title page and the first root section.
  2. Implement logic to identify separation points between root sections.
  3. The logic should be robust, able to handle edge cases.
  4. Confirm the implementation's accuracy across a large dataset using 3rd party APIs.
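A starting-point sketch for tasks 1 and 2: locate "PART I" / "PART II" headings by scanning the document text with a regular expression. Real filings need far more robustness (which is exactly what tasks 3 and 4 call for), and the pattern here is an assumption, not the final logic:

```python
import re

# Matches "PART I", "PART II", etc., followed by a period or whitespace
PART_RE = re.compile(r"\bPART\s+(I{1,3}|IV)\b[.\s]", re.IGNORECASE)

def find_separation_points(text: str) -> list:
    """Return (offset, matched heading) pairs, in document order."""
    return [(m.start(), m.group(0).strip()) for m in PART_RE.finditer(text)]
```

Everything before the first match would be the title page; the spans between consecutive matches would be the root sections.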

Parse Page Numbers and Page Separators

image
  • It would probably be best to create new classes such as PageElement, inheriting from IrrelevantElement, to allow easy removal of all such elements in one sweep
  • Include unit tests with a variety of real-life samples. Use pytest test tables. Include the source document ticker and accession number in the test name.
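One candidate detection rule for the proposed PageElement class could be a check for short, standalone numeric lines (the class design is as proposed in the issue; the heuristic itself is an assumption and would need tuning against real filings):

```python
import re

# Matches bare page numbers such as "12", "- 12 -", or " 7 "
PAGE_NUMBER_RE = re.compile(r"^\s*(?:-\s*)?\d{1,3}(?:\s*-)?\s*$")

def looks_like_page_number(text: str) -> bool:
    """True for short standalone numerics that are likely page numbers."""
    return bool(PAGE_NUMBER_RE.match(text))
```

Elements whose text matches would be classified as PageElement and dropped along with the other IrrelevantElement subclasses.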
