
hocr-tools's People

Contributors

amitdo, codacy-badger, dependabot[bot], edsu, hellocatfood, jbreiden, jbreiden2, jronallo, kba, lanjkn, skylord123, smijo149, stweil, tmbdev, tmbnv, zuphilip

hocr-tools's Issues

Release v1.0.1

We fixed some bugs, did a lot of cleanup and started some other work. Thus I think we should create a new release soon. Here is my draft:


Fixed bugs

  • hocr-split: Duplicate content in <html> #58
  • hocr-pdf: ocr_line does not have to be a span (e.g. also a div is possible) #57
  • hocr-pdf: empty rawtext caused index error #57
  • hocr-check: Fix containment checks and metadata checks, add tests #52 #61 #62

Ongoing work

  • Check handling of non ASCII characters in hOCR files #53
  • Make hocr-tools fit for Python 3 #37

See details: v1.0.0...master


Any comments or approvals are appreciated.

More options for hocr-wordfreq

We discussed more options for hocr-wordfreq:

  1. An option for splitting on spaces only, which will then also count words containing punctuation. This is actually what is used for tesseract, so there is a use case for it as well.
  2. An option to undo hyphenation at line ends. This also requires deleting the newline characters before counting the frequencies. Moreover, any blank lines should also be deleted.
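
The two options above could be sketched roughly as follows. This is only an illustration in Python 3, not code from hocr-wordfreq; the function names `dehyphenate` and `word_frequencies` are made up for this example.

```python
from collections import Counter


def dehyphenate(text):
    """Join words hyphenated at line ends; drop newlines and blank lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    joined = ""
    for line in lines:
        if joined.endswith("-"):
            # Undo the end-of-line hyphen by gluing the next line on directly.
            joined = joined[:-1] + line
        elif joined:
            joined += " " + line
        else:
            joined = line
    return joined


def word_frequencies(text):
    """Count words split on whitespace only, so punctuation stays attached."""
    return Counter(text.split())
```

Note that this naive version also joins genuine hyphens that happen to fall at a line end (e.g. "Rabbit-" followed by "Hole"), so a real option would probably need a dictionary check or similar.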

Release management

For one, having users use specific versions makes debugging easier.

The tools could be uploaded to PyPI, so users can install it with pip install hocr-tools, or included in distros like Debian.

Possible course of action:

  • Release a 0.2 version soon (i.e. tag a git commit v0.2) to have a starting point
  • Consider reorganizing the module (issue #42)
  • Make the tools compatible with PyPI
  • Try to adhere to semantic versioning

The CLI of the tools has not changed, or at least not much, over the last few years. However, this could (and should) change in the future, possibly breaking backwards compatibility if it cannot be avoided.

Trouble merging with hocr-pdf

I'm having issues merging hOCR and JPEG files into a searchable PDF. I generated the hOCR with ocropy; hopefully that is correct. I used run-test as an example workflow. Neither the run-test output nor my own source generates a working PDF.

I attempt to merge with hocr-pdf . > test.pdf but only get an image in the PDF. Preview & the PDF.js tool in Firefox have no searchable content. Are there better tools to see what is happening?

I am using hocr-tools 1.1.1 & Python 2.7.12 on OS X 10.9.5.
I also tried hocr-tools on a debian system & got the same result.

Are there any sample files to try merging to see if it can work?

My test directory looks like…
ls temp-output
temp.hocr
temp.jpg
test.pdf

Here is a zip of that directory in case testing helps.
https://www.dropbox.com/s/j6lpx2kgfao6143/temp-output.zip

Make hocr-tools a proper module

The README currently states:

Each command line program is self contained; if you have Python 2.7 with the required packages installed, it should just work. (Unfortunately, that means some code duplication; we may revisit this issue in later revisions.)

I would like to revisit this issue 😄

The advantage of striving to make the programs self-contained is that there is no need to install the whole project to run an individual script, provided the requirements were installed by some other means (e.g. apt-get). For simple scripts like hocr-check this is really neat.

The disadvantages of self-contained commands are IMHO:

  • Code redundancy (assoc, get_text etc.). These are small functions but it's considerable boilerplate and keeping them consistent is a hassle. This also makes it hard to spot that e.g. get_text has not been needed for a while.
  • Embedding resources in the source code, such as the invisible font in hocr-pdf, makes it hard to add changes.
  • It makes it harder to keep consistent interfaces. Some commands use optparse, others parse CLI arguments themselves, some read from STDIN on no args, some show the help page on no args, some exit with an error etc. A shared hocrlib module could help reduce boilerplate, though consistent use of argparse could also remedy this situation.

In summary, I would argue for an approach with a shared library and resources in the file system, and for requiring users to install the tools properly (setup.py).

What do you think?

In particular, is anyone relying on the scripts being self-contained?

UnboundLocalError: local variable 'rawtext' referenced before assignment

With some hocr files created by tesseract I get the following error when using hocr-pdf. Here's an example hocr file that causes this error:
https://drive.google.com/file/d/0ByUq6R632zOwU2ZrOTVMT0ZtdG8/view?usp=sharing

Traceback (most recent call last):
  File "/usr/bin/hocr-pdf", line 5, in <module>
    pkg_resources.run_script('hocr-tools==0.1', 'hocr-pdf')
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1462, in run_script
    exec_(script_code, namespace, namespace)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 41, in exec_
    exec("""exec code in globs, locs""")
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/hocr_tools-0.1-py2.7.egg/EGG-INFO/scripts/hocr-pdf", line 137, in <module>

  File "/usr/lib/python2.7/site-packages/hocr_tools-0.1-py2.7.egg/EGG-INFO/scripts/hocr-pdf", line 53, in export_pdf

  File "/usr/lib/python2.7/site-packages/hocr_tools-0.1-py2.7.egg/EGG-INFO/scripts/hocr-pdf", line 86, in add_text_layer

UnboundLocalError: local variable 'rawtext' referenced before assignment

If rawtext is set to an empty string at some point before line 86, then the script completes creating a searchable PDF--yes, a rather poorly OCR'd one, but a PDF nevertheless.

Error while using hocr-pdf file

When I run the command below, I get a character encoding error.
Please help.

hocr-pdf . > out.pdf
Traceback (most recent call last):
  File "C:\Python36\Scripts\hocr-pdf.py", line 143, in <module>
    export_pdf(args.imgdir, 300)
  File "C:\Python36\Scripts\hocr-pdf.py", line 70, in export_pdf
    pdf.save()
  File "c:\python36\lib\site-packages\reportlab\pdfgen\canvas.py", line 1237, in save
    self._doc.SaveToFile(self._filename, self)
  File "c:\python36\lib\site-packages\reportlab\pdfbase\pdfdoc.py", line 224, in SaveToFile
    f.write(data)
  File "C:\Python36\Scripts\hocr-pdf.py", line 47, in write
    sys.stdout.write(data)
  File "c:\python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 11-14: character maps to <undefined>
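
The traceback suggests that PDF data is being written through the text-mode `sys.stdout`, which on Windows encodes with cp1252. One possible way around this, sketched here under assumptions and not taken from hocr-pdf itself, is to write raw bytes to the underlying binary stream. The helper name `write_pdf_data` and the Latin-1 fallback for text input are assumptions made for this example.

```python
import sys


def write_pdf_data(data, stream=None):
    """Write PDF data as raw bytes, bypassing the console text encoding."""
    if stream is None:
        # Python 3: sys.stdout is a text stream, sys.stdout.buffer is binary.
        # Python 2: sys.stdout has no .buffer and already accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    if not isinstance(data, bytes):
        # Assumption: text handed over by the PDF library maps one character
        # to one byte, so Latin-1 round-trips it unchanged.
        data = data.encode("latin-1")
    stream.write(data)
    stream.flush()
```

Writing the PDF to a named file opened in binary mode, instead of redirecting standard output, would avoid the console encoding as well.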

hocr-pdf could recalculate word positions for resized image

If I want to make a PDF from an image that is exactly the same dimensions as were used during OCR, then hocr-pdf can do that. But if I want my PDFs to be smaller in file size then one way is to use images that are resized smaller than were used for OCR. Currently using an image that is a different size from that used during OCR puts the words in the wrong place. As long as the aspect ratio of the image is maintained even if it is a different size it ought to be possible to recalculate where to place words in the PDF so that they show up in the correct location.
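
The recalculation described above amounts to scaling each bounding box by the ratio of the two image sizes. A minimal sketch in Python 3, assuming the pixel size of the image used for OCR is known (for example from the bbox of the ocr_page element); `scale_bbox` is a hypothetical helper, not an existing hocr-pdf function:

```python
def scale_bbox(bbox, ocr_size, image_size):
    """Map a bbox from OCR-image pixels to pixels of a resized image.

    bbox is (x0, y0, x1, y1); ocr_size and image_size are (width, height).
    """
    x0, y0, x1, y1 = bbox
    sx = image_size[0] / ocr_size[0]  # horizontal scale factor
    sy = image_size[1] / ocr_size[1]  # vertical scale factor
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
```

If the aspect ratio is preserved, `sx` and `sy` are equal up to rounding of the resized dimensions, so a single factor would do; keeping both also tolerates small rounding differences.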

Is this feature of interest? Is this an issue anyone else has?

Extract HOCR from searchable PDF

Thank you so much with your great works!

But I wonder if it is possible to extract HOCR from searchable PDF, I mean, PDFs that are already combined with HOCR, I haven't find any tools to do that for me...

TODOs from hocr-check

These are commented as FIXME at the end of hocr-check, I'll put them here for discussion.

  • containment of paragraphs, columns, etc.
  • ocr-recognized vs. actual tags
  • warn about text outside ocr_ elements
  • check title= attribute format
  • check that only the right attributes are present on the right elements
  • check for unrecognized ocr_ elements
  • check for significant overlaps
  • check that image files are not repeated

Keep this in check with hocr-spec (cross-reference maybe) and consider creating an XSD schema for use in ocr-fileformats (though these tend to be inflexible).

hocr-lines outputs byte strings in python3

E.g. ./hocr-lines test/testdata/sample.html

b'1 Down the Rabbit-Hole'
b'Alice was beginning to get very tired of sitting by her sister on the bank,'
b'and of having nothing to do: once or twice she had peeped into the book her'
b'sister was reading, but it had no pictures or conversations in it, `and what is'
b"the use of a book,' thought Alice `without pictures or conversation?"
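
The `b'...'` output is what Python 3 prints when a bytes object is passed to `print`. One possible fix, sketched here without reference to the actual hocr-lines source, is to decode before printing. The helper name `to_text` and the UTF-8 default are assumptions.

```python
def to_text(raw, encoding="utf-8"):
    """Return a text string, decoding first if given bytes."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# print(to_text(b'1 Down the Rabbit-Hole')) prints the line without the b'' wrapper.
```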

hocr-clean

Go through all ocr-elements and delete empty elements and possibly also elements containing only spaces. Either do this recursively or start with the top elements and look at the textContent.
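
A rough sketch of the bottom-up variant, using only the standard library's ElementTree on an already parsed tree. The function name and the rule "class starts with ocr" are assumptions for illustration; a real hocr-clean would likely use lxml and handle XHTML namespaces.

```python
import xml.etree.ElementTree as ET


def remove_empty_ocr_elements(root):
    """Remove ocr* elements whose text content is empty or whitespace only.

    Returns the number of removed elements. The root itself is never removed.
    """
    removed = 0
    # ElementTree elements do not know their parent, so build a lookup.
    parents = {child: parent for parent in root.iter() for child in parent}
    # Reverse document order visits descendants before their ancestors,
    # so a parent emptied by a removal is caught in the same pass.
    for elem in reversed(list(root.iter())):
        if not elem.get("class", "").startswith("ocr"):
            continue
        if "".join(elem.itertext()).strip():
            continue  # element still contains visible text
        parent = parents.get(elem)
        if parent is None:
            continue
        # Keep the text that follows the element (its tail) in the tree.
        tail = elem.tail or ""
        index = list(parent).index(elem)
        if index > 0:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
        parent.remove(elem)
        removed += 1
    return removed
```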

All tools should have a -h/--help option

Not all the tools support a help flag. We should add this as a baseline so users can get at least minimal usage info on the command line.

Tools without -h/--help (cf. smoke.tsht):

  • hocr-check
  • hocr-combine
  • hocr-extract-images
  • hocr-lines
  • hocr-merge-dc

Plus possibly those that do not run at the moment because of PyXML/lxml #9
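
For reference, argparse provides -h/--help automatically, so a baseline could be as small as the sketch below. The positional argument name `file` and the help texts are placeholders, not the tools' actual interfaces.

```python
import argparse


def build_parser(prog, description):
    """Create a parser; argparse adds -h/--help without further code."""
    parser = argparse.ArgumentParser(prog=prog, description=description)
    parser.add_argument(
        "file", nargs="?", default=None,
        help="hOCR file to read (default: standard input)")
    return parser


if __name__ == "__main__":
    args = build_parser("hocr-lines", "Print the text lines of an hOCR file.").parse_args()
```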

hocr-pdf output failed

Thank you for the update of hocr-pdf.

I am trying to convert images to searchable PDFs using hocr-pdf and my gcv2hocr.
hocr-pdf sometimes fails to produce the PDF.

The attached scan0002.jpg and scan0002.hocr convert to PDF without problems.

But the attached scan0003.jpg and scan0003.hocr fail to convert.
The error messages are below.

Traceback (most recent call last):
File "hocr-pdf", line 139, in <module>
export_pdf(sys.argv[1], 300)
File "hocr-pdf", line 64, in export_pdf
add_text_layer(pdf, image, height, dpi)
File "hocr-pdf", line 73, in add_text_layer
hocr = etree.parse(hocrfile, html.XHTMLParser())
File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src/lxml/lxml.etree.c:85131)
File "src/lxml/parser.pxi", line 1782, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:124005)
File "src/lxml/parser.pxi", line 1808, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:124374)
File "src/lxml/parser.pxi", line 1712, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:123169)
File "src/lxml/parser.pxi", line 1115, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:117533)
File "src/lxml/parser.pxi", line 573, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:110510)
File "src/lxml/parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:112276)
File "src/lxml/parser.pxi", line 613, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:111124)
lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 469, column 234

I cannot tell whether the problem comes from hocr-pdf or from my gcv2hocr.
If you find problems in my gcv2hocr, please let me know.
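
The libxml2 message "xmlParseEntityRef: no name" is typically caused by a bare `&` in the input; whether that is the case at line 469 of the attached file has not been verified here. Assuming it is, the hOCR producer could escape such characters before writing. A minimal Python 3 sketch (the function name is made up for this example):

```python
import re

# An ampersand that does NOT start a named or numeric character reference.
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace stray '&' with '&amp;' while leaving existing entities alone."""
    return _BARE_AMP.sub("&amp;", text)
```

When the text is generated programmatically, escaping it at the source (for example with `xml.sax.saxutils.escape`) is usually cleaner than patching the finished file.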

scan0002.hocr.txt
scan0002.jpg
scan0003.jpg
scan0003.hocr.txt

Make hocr-tools fit for Python 3

We should consider making the tools ready for Python 3. Not so much because 2.7 is going away soon (it will be officially developed until 2020) but because all the problems 2to3 reports are easily fixable, e.g. throwing strings is bad practice in all versions.

The changes to the standard libraries (e.g. cStringIO is now io.StringIO) make it tedious to use the same code for both versions (and it's generally not recommended). Instead, I propose that we fix the code problems 2to3 reports and ensure (by testing in CI) that the code can be automatically ported to Python 3 and run the tests. At some point in the future, we can then just run 2to3 for good.

hocr-pdf: issue with search and copy/paste in macOS Preview.app

The Preview.app is the default PDF reader for macOS.
When hocr-pdf generates a PDF from an image plus hOCR file, search and copy/paste work well in Acrobat, PDF.js and others, but not in Preview. You can't search in Preview; you can select text and copy/paste it into another document, but the pasted characters are just blanks.

Does anyone know of a specific reason for this?

3 unittests failed

(venv) ubuntu@tesseract-ocr:~/hocr-tools$ ./test/tsht
# Testing ./hocr-check/test-hocr-check.tsht
1..11
ok 1 - Check from filename
ok 2 - Check from stdin
# ./ancestor: valid examples
ok 3 - 'hocr-check ./ancestor/ok-par.html' (failed: 0)
ok 4 - 'hocr-check ./ancestor/ok-line.html' (failed: 0)
ok 5 - 'hocr-check ./ancestor/ok-carea.html' (failed: 0)
# ./ancestor: invalid examples
ok 3 - 'hocr-check ./ancestor/notok-line.html' (failed: 1)
ok 4 - 'hocr-check ./ancestor/notok-carea.html' (failed: 1)
ok 5 - 'hocr-check ./ancestor/notok-par.html' (failed: 1)
# ./meta: valid examples
ok 3 - 'hocr-check ./meta/ok-system.html' (failed: 0)
# ./meta: invalid examples
ok 3 - 'hocr-check ./meta/notok-typo.html' (failed: 1)
ok 4 - 'hocr-check ./meta/notok-system.html' (failed: 1)
# Testing ./hocr-combine/test-hocr-combine.tsht
1..2
ok 1 - Executed: hocr-combine ../testdata/sample.html ../testdata/sample.html
ok 2 - check whether number ocr_lines in self-combined result is doubled
# Testing ./hocr-eval-geom/hocr-eval-geom.tsht
1..3
ok 1 - Executed: hocr-eval-geom ../testdata/sample.html ../testdata/sample.html
ok 2 - Executed: hocr-eval-geom -e ocr_line -o 0.05 -c 0.88 ../testdata/tess.hocr ../testdata/sample.html
ok 3 - Matches '\(0, 0, 0.0, ': '(0, 0, 0.0, 37) (0, 0, 0.0, 37)'
# Testing ./hocr-eval/hocr-eval.tsht
1..2
ok 1 - Executed: hocr-eval ../testdata/sample.html ../testdata/sample.html
not ok 2 - Failed: hocr-eval -d -v ../testdata/tess.hocr ../testdata/sample.html
---
diag: |
overlap 52041 true_bbox (470, 528, 1383, 585)
1 Down the Rabbit-Hole
1 Down the Rabbit-Hole
overlap 85330 true_bbox (464, 651, 2074, 704)
Alice was beginning to get very tired of sitting by her sister on the bank,
Alice was beginning to get very tired of sitting by her sister on the bank,
overlap 83824 true_bbox (464, 711, 2076, 763)
and of having nothing to do: once or twice she had peeped into the book her
and of having nothing to do: once or twice she had peeped into the book her
overlap 80600 true_bbox (463, 773, 2075, 823)
Traceback (most recent call last):
File "/usr/local/bin/hocr-eval", line 4, in <module>
__import__('pkg_resources').run_script('hocr-tools==1.2.0', 'hocr-eval')
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/hocr_tools-1.2.0-py2.7.egg/EGG-INFO/scripts/hocr-eval", line 227, in <module>

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 67: ordinal not in range(128)
...

# Testing ./hocr-eval-lines/hocr-eval-lines.tsht
1..4
ok 1 - Executed: hocr-eval-lines -v ../testdata/sample.txt ../testdata/sample.html
ok 2 - Matches 'ocr_errors 7': 'segmentation_errors 0<LF>ocr_errors 7'
ok 3 - Matches 'segmentation_errors 0': 'segmentation_errors 0<LF>ocr_errors 7'
ok 4 - Not like '\('segmentation_errors'': 'string'
# Testing ./hocr-extract-images/test-hocr-extract-images.tsht
1..10
# ocr_page argument
ok 1 - Executed: hocr-extract-images -p page-%03d.png -b ../testdata -e ocr_page ../testdata/tess.hocr
ok 2 - ocr_page: number of images == 1
ok 3 - ocr_page: number of texts == 1
# ocr_page stdin
ok 4 - ocr_page: number of images == 1
ok 5 - ocr_page: number of texts == 1
# ocr_line argument
ok 6 - Executed: hocr-extract-images -p line-%03d.png -b ../testdata -e ocr_line ../testdata/tess.hocr
ok 7 - ocr_line: number of images == 37
ok 8 - ocr_line: number of texts == 37
# ocr_line stdin
ok 9 - ocr_line: number of images == 37
ok 10 - ocr_line: number of texts == 37
# ocrx_word argument
ok 11 - Executed: hocr-extract-images -p word-%03d.png -b ../testdata -e ocrx_word ../testdata/tess.hocr
ok 12 - ocrx_word: number of images == 503
ok 13 - ocrx_word: number of texts == 503
# ocrx_word stdin
ok 14 - ocrx_word: number of images == 503
ok 15 - ocrx_word: number of texts == 503
ok 16 - Indeed 503 words in sample
# Testing ./hocr-lines/hocr-lines.tsht
1..3
Traceback (most recent call last):
  File "/usr/local/bin/hocr-lines", line 4, in <module>
    __import__('pkg_resources').run_script('hocr-tools==1.2.0', 'hocr-lines')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/hocr_tools-1.2.0-py2.7.egg/EGG-INFO/scripts/hocr-lines", line 22, in <module>

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 67: ordinal not in range(128)
not ok 1 - hocr-lines
not ok 2 - ./tess.lines ('37' != '3')
ok 3 - check first line
after
# Running function after for hocr-lines.tsht
# Testing ./hocr-merge-dc/hocr-merge-dc.tsht
1..4
ok 1 - Command succeeded
ok 2 - Matches 'name='DC.title' content='Alice im Wonderland'': '<?xml version="1.0" encoding="UTF-8"?><LF><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" <LF>    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <LF><html xmlns="http://www.'
ok 3 - Not like 'name='DC.title' content='UKOLN'': 'string'
ok 4 - Matches 'name="DC.title" content="UKOLN"': '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><LF><?xml version="1.0" encoding="UTF-8"??><LF><html xmlns="http://www.w3.org/1'
# Testing ./hocr-pdf/test-hocr-pdf.tsht
ok 1 - Executed: wget --quiet http://digi.bib.uni-mannheim.de/fileadmin/digi/445442158/tess/445442158_0126.hocr
ok 2 - Executed: wget --quiet http://digi.bib.uni-mannheim.de/fileadmin/digi/445442158/max/445442158_0126.jpg
ok 3 - Not empty file: 445442158_0126.pdf
ok 4 - Executed: pdfgrep tribunali 445442158_0126.pdf
1..4
# Testing ./hocr-split/test-hocr-split.tsht
1..4
ok 1 - Executed: hocr-split test.hocr test-%003d.hocr
ok 2 - two files were produced
ok 3 - one page in test-001.hocr
ok 4 - one page in test-002.hocr
ok 5 - one xml:lang= only, #58
ok 6 - one xmlns= only, #58
# Testing ./hocr-wordfreq/hocr-wordfreq.tsht
1..4
ok 1 - Executed: hocr-wordfreq ../testdata/sample.html
ok 2 - Executed: hocr-wordfreq -i -n 30 ../testdata/sample.html
ok 3 - Matches '23\s*the': '23          the'
ok 4 - Matches '24\s*the': '24          the'
# Testing ./smoke.tsht
ok 1 - Executed: hocr-check --help
ok 2 - Executed: hocr-check -h
ok 3 - Executed: hocr-combine --help
ok 4 - Executed: hocr-combine -h
ok 5 - Executed: hocr-eval --help
ok 6 - Executed: hocr-eval -h
ok 7 - Executed: hocr-eval-geom --help
ok 8 - Executed: hocr-eval-geom -h
ok 9 - Executed: hocr-eval-lines --help
ok 10 - Executed: hocr-eval-lines -h
ok 11 - Executed: hocr-extract-g1000 --help
ok 12 - Executed: hocr-extract-g1000 -h
ok 13 - Executed: hocr-extract-images --help
ok 14 - Executed: hocr-extract-images -h
ok 15 - Executed: hocr-lines --help
ok 16 - Executed: hocr-lines -h
ok 17 - Executed: hocr-merge-dc --help
ok 18 - Executed: hocr-merge-dc -h
ok 19 - Executed: hocr-pdf --help
ok 20 - Executed: hocr-pdf -h
ok 21 - Executed: hocr-split --help
ok 22 - Executed: hocr-split -h
1..22
# Failed 3 tests

Switch from PyXML to BeautifulSoup

PyXML hasn't been updated in 10+ years. Would you consider moving these tools over to something like BeautifulSoup which is on PyPI and easier to install?

Here's what hocr-lines looks like using BeautifulSoup:

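#!/usr/bin/env python
"""Print the text of every ocr_line element, one line per element."""

import re
import sys
from bs4 import BeautifulSoup

# Read from the file given as first argument, otherwise from standard input.
if len(sys.argv) > 1:
    stream = open(sys.argv[1])
else:
    stream = sys.stdin

# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(stream, 'html.parser')
for line in soup.select('.ocr_line'):
    # Collapse runs of whitespace so each ocr_line prints on a single line.
    print(re.sub(r'\s+', ' ', line.text))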

Improve error handling in hocr-pdf

I would like to be able to make the following calls (with expected output):

  • hocr-pdf -h --> Help text
  • hocr-pdf --help --> Help text
  • hocr-pdf filename.hocr --> Wrong argument ...
  • hocr-pdf filename.jpg --> Wrong argument ...
  • hocr-pdf filename1.hocr filename2.jpg --> Wrong argument ...
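
One way to get this behaviour would be an argparse type check on the directory argument. The sketch below is an illustration only; the argument name `imgdir` appears in a traceback elsewhere on this page, while the description, help text and exact error wording are assumptions.

```python
import argparse
import os


def existing_directory(path):
    """argparse type: accept only paths that are existing directories."""
    if not os.path.isdir(path):
        raise argparse.ArgumentTypeError(
            "Wrong argument: %r is not a directory" % path)
    return path


def build_parser():
    parser = argparse.ArgumentParser(
        prog="hocr-pdf",
        description="Create a searchable PDF from images and hOCR files.")
    parser.add_argument(
        "imgdir", type=existing_directory,
        help="directory containing the image and hOCR files")
    return parser
```

With this, -h and --help print the help text, a file name instead of a directory is rejected with the "Wrong argument" message, and a second positional argument is rejected by argparse as unrecognized; in both error cases the program exits with status 2.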

Does hocr-pdf support Japanese?

I am trying to make a searchable PDF with hocr-pdf.
I wrote gcv2hocr to produce hOCR from Google Cloud Vision OCR output.
https://github.com/dinosauria123/gcv2hocr
Conversion of English images seems to work well, so I tried converting Japanese images.

I found that in the PDF that hocr-pdf produces from hOCR containing Japanese, the text position is badly displaced.
I cannot tell whether the problem comes from my gcv2hocr or from hocr-pdf, so I would like to ask for your opinion.

I know the baseline value is important for the text position in the PDF file.
Could you tell me which other parameters are important for the text position in the PDF file?

<span class='ocr_line' id='line_1_1' title="bbox 96 79 127 144 ; baseline 0 -10; x_size 89; x_descenders 20; x_ascenders 21"><span class='ocrx_word' id='word_1_1' title='bbox 96 79 127 144 ; x_wconf 85' lang='jpn' dir='ltr'> 光学 </span>
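
To compare the positioning parameters of different lines, it can help to split the title attribute into its properties. A small Python sketch, assuming simple space-separated values as in the line above (quoted file names containing spaces or semicolons are not handled); `parse_title` is a hypothetical helper:

```python
def parse_title(title):
    """Split an hOCR title attribute into {property name: [values]}."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


props = parse_title("bbox 96 79 127 144 ; baseline 0 -10; x_size 89")
x0, y0, x1, y1 = (int(v) for v in props["bbox"])
width, height = x1 - x0, y1 - y0  # 31 and 65 for the line above
```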

jptest.hocr.txt
jptest2.pdf
jptest2

HTML exporter

The hOCR files are already HTML files and can be displayed in any browser. However, they will display just the text, without any layout or format information. What do you think about writing an HTML exporter that also displays some of the layout or format information? With the bbox we can show the text at the correct position, see also ocropus-archive/DUP-ocropy#80 (comment)
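
As a rough idea of what such an exporter could emit, the sketch below turns one piece of text and its bbox into an absolutely positioned span. It is only an illustration in Python 3; the function name and the choice of inline CSS are assumptions, and the spans would need a container with `position: relative` sized like the page.

```python
from html import escape


def positioned_span(text, bbox):
    """Return an HTML span placed at the bbox (x0, y0, x1, y1) in pixels."""
    x0, y0, x1, y1 = bbox
    style = "position:absolute;left:%dpx;top:%dpx;width:%dpx;height:%dpx" % (
        x0, y0, x1 - x0, y1 - y0)
    return '<span style="%s">%s</span>' % (style, escape(text))
```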

Duplicate content in <html>

Using hocr-split, the produced files contain duplicate content:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" xml:lang="en">

The xmlns and xml:lang attributes are duplicated.
The W3C validator then reports errors; see https://validator.w3.org/
