Comments (23)

ayanamists commented on May 28, 2024

I noticed an issue with the regex and fixed it. It is now:

/(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[.,;!?]\s*\B|\s+[a-zA-Z]|\s+\$)(?!`)/g
    // Regex explanation:
    // (?<!`)                 # Negative lookbehind to ensure the '$' is not preceded by a backtick (`)
    // \$                     # Match a literal '$' character
    // (\d+(?:[.,]\d+)*)      # Capture group 1: Match one or more digits, optionally followed by a decimal part (e.g., 123.45)
    // (?=                    # Positive lookahead to ensure the following conditions are met:
    //   \s*[.,;!?]\s*\B      #   The number is followed by a punctuation mark (.,;!?) and a non-word boundary
    //   |                    #   OR
    //   \s+[a-zA-Z]          #   The number is followed by one or more whitespace characters and a letter
    //   |                    #   OR
    //   \s+\$                #   The number is followed by one or more whitespace characters and a '$' sign
    // )
    // (?!`)                  # Negative lookahead to ensure the matched amount is not followed by a backtick (`)
    // /g                     # Global flag to replace all occurrences

Thanks for your explanation. I tried some examples:

// Escape each '$' that the regex classifies as a currency amount, then print the result.
function check(line) {
  console.log(line.replace(/(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[.,;!?]\s*\B|\s+[a-zA-Z]|\s+\$)(?!`)/g, '\\$&'));
}

check('The price of xxx is $1');
check('The price of xxx is $1. You can buy it for $0.95 or lower');
check('例如,在表达式$\lambda . \lambda . 1$中,最内层的$1$是封闭的,因为它的索引值$1$等于它在表达式中的深度$1$。同样,在表达式$\lambda . \lambda . 2$中,最内层的$2$也是封闭的,因为它的索引值$2$等于它在表达式中的深度$2$');
check('$1 + 1 = 2$');

The output:

The price of xxx is $1
The price of xxx is \$1. You can buy it for \$0.95 or lower
例如,在表达式$lambda . lambda . 1$中,最内层的$1$是封闭的,因为它的索引值$1$等于它在表达式中的深度$1$。同样,在表达式$lambda . lambda . 2$中,最内层的$2$也是封闭的,因为它的索引值$2$等于它在表达式中的深度$2$
$1 + 1 = 2$

It seems your solution works fine in these examples.

from chatgpt-next-web.

H0llyW00dzZ commented on May 28, 2024

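I know; this is related to these issues:

* [[Bug] LaTeX 渲染异常 #4155](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/4155)

* [[Bug] latex 公式渲染 问题 #3964](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/3964)

* [[Bug] LaTeX Syntax still bug #3239](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/3239)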

daiaji commented on May 28, 2024

I know this related to this

* [[Bug] LaTeX 渲染异常 #4155](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/4155)

* [[Bug] latex 公式渲染 问题 #3964](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/3964)

* [[Bug] LaTeX Syntax still bug #3239](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/3239)

Has the final solution still not been confirmed? Honestly, until the pull request that fixes the dollar sign issue is merged, further improvements are out of the question.

H0llyW00dzZ commented on May 28, 2024

This issue is challenging to resolve. I'm not convinced it's feasible to fix given its complexity, particularly for the frontend and the React Markdown. It might be more practical to create a simpler, standalone package rather than dealing with the complexities of this issue.

daiaji commented on May 28, 2024

Regardless of how the code is encapsulated, it seems that there is no way to avoid using complex logic and regular expressions to address this issue. I conducted a brief search for Markdown rendering packages in Node.js, and it appears that almost all packages have given up on properly handling the rendering of the dollar sign. The maintainers seem to have chosen a rather passive approach of not addressing such rendering issues.

The issue might be the only valuable thing there; markdown-it doesn't support LaTeX at all, as for react-markdown, you know.

Frankly, if everyone continues to handle this issue with a negative attitude, it may eventually be left to LLM for maintenance.

H0llyW00dzZ commented on May 28, 2024

I believe there's always a way to resolve this without resorting to complex logic and excessive use of regular expressions. It's just that I currently don't have the time to do it.

daiaji commented on May 28, 2024

I took a quick look at the example of markdown-to-jsx, and it seems that it requires writing LaTeX rendering conditions. This task seems a bit simpler compared to what we are currently working on, at least we don't have to replace dollar signs. However, the question is whether it's worth refactoring the code.

Honestly, if there are no existing solutions available, our choices might be limited.

ClConstantine commented on May 28, 2024

I've run into this problem too. Could we use a bare $ to mark math syntax and a backtick-wrapped $ to mark a price or something similar?

The backticks around $ xxx could be added by the LLM through the prompt.

Just some simple ideas.

ayanamists commented on May 28, 2024

Adding this to the prompt may not be a good idea, since it would cost extra tokens on every request.

Algorithm5838 commented on May 28, 2024

I found a solution: #4354
I tested it with different examples, and it worked.

ayanamists commented on May 28, 2024

Could you share some insight into your PR? I can't follow the complex regex used in your code:

/(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[a-zA-Z.,;!?]?\s*$|\s+[a-zA-Z]|\s+\$)(?!`)/g

Algorithm5838 commented on May 28, 2024

I came up with it with the help of LLMs.

Here is an explanation:

  1. Ensure that the dollar sign ($) is not preceded by a backtick (`).
  2. Match the dollar sign ($).
  3. Match one or more digits (\d+).
  4. Optionally match a separator (. or ,) followed by one or more digits, repeated any number of times ((?:[.,]\d+)*).
  5. Ensure that the matched dollar amount is followed by:
    • Either the end of the input ($; without the m flag this is the end of the whole string, not of each line), or
    • At most one letter or punctuation mark (., ,, ;, !, ?), with optional whitespace around it, and then the end of the input, or
    • A letter preceded by one or more whitespace characters, or
    • Another dollar sign ($) preceded by one or more whitespace characters.
  6. Ensure that the matched amount is not followed by a backtick (`).
  7. The g flag at the end makes the regular expression global, meaning it will match all occurrences in the text.

I think it still can be improved upon.
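
For anyone who wants to try the steps above, here is a minimal sketch that applies the regex quoted earlier in this thread to a few inputs. The helper name escapePrices and the sample strings are made up for illustration; only the regex itself comes from the discussion.

```javascript
// The regex quoted earlier in this thread, unchanged.
const PRICE = /(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[a-zA-Z.,;!?]?\s*$|\s+[a-zA-Z]|\s+\$)(?!`)/g;

// Hypothetical helper: prefix each matched amount with a backslash so the
// Markdown renderer treats the '$' as text instead of a math delimiter.
function escapePrices(text) {
  return text.replace(PRICE, '\\$&');
}

console.log(escapePrices('The price is $1'));             // The price is \$1
console.log(escapePrices('It costs $5 or $10.50 today')); // It costs \$5 or \$10.50 today
console.log(escapePrices('$1 + 1 = 2$'));                 // $1 + 1 = 2$ (left alone)
console.log(escapePrices('`$5` is code'));                // `$5` is code (left alone)
```

The last two lines show the intent of steps 1 and 5: an amount right after a backtick, or one followed by an operator rather than a letter, is not treated as a price.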

Algorithm5838 commented on May 28, 2024

I noticed an issue with the regex and fixed it. It is now:

/(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[.,;!?]\s*\B|\s+[a-zA-Z]|\s+\$)(?!`)/g
    // Regex explanation:
    // (?<!`)                 # Negative lookbehind to ensure the '$' is not preceded by a backtick (`)
    // \$                     # Match a literal '$' character
    // (\d+(?:[.,]\d+)*)      # Capture group 1: Match one or more digits, optionally followed by a decimal part (e.g., 123.45)
    // (?=                    # Positive lookahead to ensure the following conditions are met:
    //   \s*[.,;!?]\s*\B      #   The number is followed by a punctuation mark (.,;!?) and a non-word boundary
    //   |                    #   OR
    //   \s+[a-zA-Z]          #   The number is followed by one or more whitespace characters and a letter
    //   |                    #   OR
    //   \s+\$                #   The number is followed by one or more whitespace characters and a '$' sign
    // )
    // (?!`)                  # Negative lookahead to ensure the matched amount is not followed by a backtick (`)
    // /g                     # Global flag to replace all occurrences

Algorithm5838 commented on May 28, 2024

It is now:

/(?<!`|\\)\$(\d+(?:[.,]\d+)*)(?=\s*[.,;!?]\s*\B|\s+[a-zA-Z]|\s+\$|$)(?!`)/g

I noticed the issues in your output and fixed them.
Update: fixing other scenarios and rare use cases.

/(?<!`|\\)\$(\d+(\w+)?(?:[.,]\d+(\w+)?)*)(?=\s*[.,;?]\s*\B|!?\s+[a-zA-Z]|!?\s+\$|!?\s*[-=+\/]\s*\$\b|$)(?!`)/g

Algorithm5838 commented on May 28, 2024

Update: I went about it the wrong way.
The new PR is the one to use; it has a better, shorter regex that covers all cases: #4363

/(?<!`|\\)\$\d+([,.](\d+[,.])?\d+)?(?!.*\$\B)(?!`)/g
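
A quick way to see what this shorter regex does is to run it over a few strings. The sketch below is only an illustration: escapeAmounts and the sample inputs are hypothetical, and the last sample was picked to show one case where the lookahead (?!.*\$\B) declines to escape a price because math follows on the same line.

```javascript
// The regex from the comment above, unchanged.
const AMOUNT = /(?<!`|\\)\$\d+([,.](\d+[,.])?\d+)?(?!.*\$\B)(?!`)/g;

// Hypothetical helper: put a backslash in front of every matched amount.
const escapeAmounts = (text) => text.replace(AMOUNT, '\\$&');

console.log(escapeAmounts('The price of xxx is $1'));
// The price of xxx is \$1
console.log(escapeAmounts('The price of xxx is $1. You can buy it for $0.95 or lower'));
// The price of xxx is \$1. You can buy it for \$0.95 or lower
console.log(escapeAmounts('$1 + 1 = 2$'));
// $1 + 1 = 2$ (a later '$' followed by a non-word position blocks the match)
console.log(escapeAmounts('It costs $5 and $x$ is a variable'));
// It costs $5 and $x$ is a variable (the price is not escaped here)
```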

Dean-YZG commented on May 28, 2024

Thanks to the contribution of Algorithm5838, this problem has been solved.
image

Algorithm5838 commented on May 28, 2024

Unfortunately, there are still some issues:

  1. If you did not use the Inject System Prompt, the issues will persist, as the LLM might still use single dollar signs for inline LaTeX.
  2. Similarly, the same problem is present in block LaTeX, where if the double dollar signs are followed by a number, the LaTeX rendering would break.
    The first two issues are related because the dollar sign(s) is followed by a number.
  3. Another issue is that if the dollar sign and number are inside a code block or inline code, a backslash would be rendered incorrectly.

And here is a related issue: #4537

My workaround has fixed these three issues.

Current implementation:
Screenshot 2024-04-18 at 13 01 58

My workaround:
Screenshot 2024-04-18 at 13 00 26

You can try it yourself, here is the instance of my fork https://github.com/Algorithm5838/NextChat/tree/dollar-sign:
https://nextchat-git-dollar-sign-algorithm5838s-projects.vercel.app/
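
For readers following along, one common way to handle the third issue (dollar amounts inside code) is to split the text on code spans first and escape only what lies outside them. The sketch below is not the workaround from the fork above, whose details are not shown in this thread; the names FENCE, CODE_SPAN, and escapeOutsideCode are hypothetical, and it assumes every fenced block and inline code span is properly closed.

```javascript
// Three backticks, built at runtime so this snippet can sit inside a fenced block.
const FENCE = '`'.repeat(3);

// One capturing group matching a fenced block or an inline code span.
// Because of the group, String.prototype.split keeps the code spans
// in the result at the odd indices.
const CODE_SPAN = new RegExp('(' + FENCE + '[\\s\\S]*?' + FENCE + '|`[^`\\n]*`)');

// The short regex quoted earlier in this thread, unchanged.
const AMOUNT = /(?<!`|\\)\$\d+([,.](\d+[,.])?\d+)?(?!.*\$\B)(?!`)/g;

// Hypothetical helper: escape dollar amounts only outside code.
function escapeOutsideCode(markdown) {
  return markdown
    .split(CODE_SPAN)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(AMOUNT, '\\$&')))
    .join('');
}

console.log(escapeOutsideCode('Run `echo $1 $2` for $3'));
// Run `echo $1 $2` for \$3
```

Splitting first means the escaping regex never sees the code, so no backslash can end up inside a code block or inline code.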

daiaji commented on May 28, 2024

Perhaps this problem will never be solved.

Algorithm5838 commented on May 28, 2024

@daiaji Did you try my workaround? If so, how did you find it?

daiaji commented on May 28, 2024

Sorry, I just feel very frustrated.

As you can see, I submitted this PR. Honestly, even though GPT has provided a lot of help and it has taken up a significant amount of my time, it seems that the problem is still far from being solved.

That's all for now.😔

ayanamists commented on May 28, 2024

I understand your frustration. It can be disheartening when you've put in a significant amount of time and effort into a pull request and the problem still remains unsolved.

In my opinion, this issue should definitely be addressed in the remark parser. The parser should correctly identify what is math and what is a US dollar symbol. Interestingly, I have never encountered such a problem when using pandoc (for converting and blogging, see My Blog Project). This is because pandoc uses a stronger rule for markdown math, as documented in pandoc's user guide:

Extension: tex_math_dollars
Anything between two $ characters will be treated as TeX math. The opening $ must have a non-space character immediately to its right, while the closing $ must have a non-space character immediately to its left, and must not be followed immediately by a digit. Thus, $20,000 and $30,000 won’t parse as math. If for some reason you need to enclose text in literal $ characters, backslash-escape them and they won’t be treated as math delimiters.

I have tested my inputs, and all of them are handled correctly by pandoc. Most of the time, the output of ChatGPT follows this guideline. So I'd like to figure out why remark doesn't use this rule.
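
To make the quoted rule concrete, here is a rough sketch of it as a single regex. This is my own simplification for illustration, not pandoc's actual parser and not anything remark provides: findInlineMath is a hypothetical name, it only handles inline math on one line, and it assumes that the first unescaped $ after an opening $ is the only candidate for the closing delimiter.

```javascript
// Opening '$': not backslash-escaped, and a non-space character follows.
// Body: one or more characters that are not '$' (backslash escapes allowed).
// Closing '$': a non-space character precedes it and no digit follows it.
const INLINE_MATH = /(?<!\\)\$(?=\S)((?:\\.|[^$\\])+?)(?<=\S)\$(?!\d)/g;

// Hypothetical helper: return the contents of every span treated as math.
function findInlineMath(text) {
  return [...text.matchAll(INLINE_MATH)].map((m) => m[1]);
}

console.log(findInlineMath('$20,000 and $30,000'));        // []
console.log(findInlineMath('$1 + 1 = 2$'));                // [ '1 + 1 = 2' ]
console.log(findInlineMath('index $1$ equals depth $1$')); // [ '1', '1' ]
console.log(findInlineMath('a $5 and $x$'));               // [ 'x' ]
```

Under these assumptions the price examples from earlier in the thread produce no math spans, so nothing would need to be escaped in the first place.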

H0llyW00dzZ commented on May 28, 2024

Anything related to LaTeX can't be fixed anyway, because the module conflicts with the front-end CSS and UI/UX.
