txt2html — Text to HTML converter
txt2html’s original website was at http://www.aigeek.com/txt2html/
What is txt2html?
txt2html is a Perl program that converts plain text to HTML. It uses the HTML::TextToHTML Perl module to do so. Look at the README file.
What is HTML::TextToHTML?
HTML::TextToHTML is a Perl module that converts plain text to HTML.
It supports headings, lists, some tables, simple character markup, and hyperlinking, and is highly customizable. It recognizes some of the apparent structure of the source document (mostly whitespace and typographic layout), and attempts to mark that structure explicitly using HTML.
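The idea of recognizing structure from layout can be illustrated with a small Python sketch. This is not how HTML::TextToHTML works internally; the function name `text_to_html` and the three heuristics (blank lines separate blocks, an underlined line is a heading, lines starting with `-` or `*` form a list) are assumptions chosen only for illustration.

```python
import html
import re


def text_to_html(text):
    """Mark up plain text using a few layout heuristics (illustrative only)."""
    out = []
    # Blank lines separate blocks.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        # A line "underlined" with === or --- is treated as a heading.
        if len(lines) == 2 and re.fullmatch(r"=+|-+", lines[1].strip()):
            level = 1 if lines[1].strip().startswith("=") else 2
            out.append(f"<h{level}>{html.escape(lines[0].strip())}</h{level}>")
        # A block whose lines all start with "- " or "* " is treated as a list.
        elif all(re.match(r"\s*[-*]\s+", ln) for ln in lines):
            items = [re.sub(r"^\s*[-*]\s+", "", ln) for ln in lines]
            body = "".join(f"<li>{html.escape(i)}</li>" for i in items)
            out.append(f"<ul>{body}</ul>")
        # Anything else becomes a paragraph with its lines joined.
        else:
            joined = " ".join(ln.strip() for ln in lines)
            out.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(out)
```

Calling `text_to_html` on a string with an underlined title, a two-line paragraph, and a dashed list yields one `<h1>`, one `<p>`, and one `<ul>` element.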
Our intent in writing this tool is to provide an easier way of converting existing text documents to HTML format.
What is txt2html not?
txt2html is not a program to convert word-processor files or other marked-up document formats. It is also not a program to convert HTML to text. Most HTML browsers do that.
If you need to convert something other than plain text to HTML, or you need to convert from HTML, you should look for a more appropriate tool.
txt2html is not a program for automatically generating a table of contents from a file. If you want that, then use txt2html to generate an HTML file, and then use htmltoc or hypertoc on the HTML file.
What’s the current version?
The current version is v2.01. A list of changes can be found here and in the tar file.
Obtaining and installing txt2html
It’s a Perl module with the script and everything bundled in the tarball, so installation is pretty easy. You don’t need to compile anything.
- Get the current version of txt2html/HTML::TextToHTML from SourceForge (download txt2html).
- Untar the tarball somewhere.
- Follow the instructions in the INSTALL file. Look out for the dependencies!
This will install the script in /usr/local/bin by default, but note that the Perl install stuff is clever enough to figure out where your copy of Perl is installed, and alter the script accordingly. It will also install the standard links dictionary in /usr/share/txt2html.
If you want to override the standard links dictionary, then add your own version to /.txt2html.dict so it will be used automatically.
If you prefer an RPM install, look for various RPMs, made by other people. There is also an official Debian package, though naturally all RPMs and Debs will lag behind the official tarball here.
How to Use txt2html
What can txt2html handle?
Look at this sample document for a basic idea. txt2html is more flexible than it looks though. You don’t need to make your documents look just like this for txt2html to work.
Hyperlinks
If you just want to make obvious URL references into hyperlinks, you just have to install the standard links dictionary.
For learning how to configure your own hyperlinks dictionary, take a look at the sample links dictionary. If you have improvements for this file, please mail them to me so everyone can benefit.
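As a rough illustration of turning obvious URL references into hyperlinks, here is a minimal Python sketch. It does not use the txt2html links dictionary format; the regular expression `URL_RE` and the function name `link_urls` are assumptions made for this example.

```python
import html
import re

# Assumed pattern for "obvious" URLs: a scheme followed by non-space characters.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"]+")


def link_urls(text):
    """Escape text for HTML and wrap each URL in an <a> element."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        end = match.start() + len(url)
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f'<a href="{html.escape(url)}">{html.escape(url)}</a>')
        last = end
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

A dictionary-driven approach, as txt2html uses, is more flexible because it can also link plain words or phrases to fixed targets; the sketch above only covers literal URLs.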
How is txt2html licensed?
Look at the LICENSE or the plain text version. Basically this is licensed under the Artistic License and the GPL.
What platforms will it run on?
If you can get a copy of Perl (version 5 or later), then you can use txt2html. Don’t ask me how, because I’ve never tried it. Other people tell me it works fine.
What is the future of txt2html?
- TODO List
- txt2html mailing list for discussion, feature requests, bug reports, chatting with other users, etc. (low volume)
Thanks
Thanks to all the people who have given us ideas, patches, bug reports, wish lists, and moral support.
Txt to html linux
README.md
txt2html is a program to convert plain text files to the hypertext markup language (HTML).
- Clean and compliant HTML5
- Option to include your own CSS
- HTML entity replacement (optional; you can also edit the entity list). Note that critical characters (such as <, > and &) are always replaced, so generally you don’t need to enable this option
- Detects and fixes _italic_ and *bold* (optional)
- Detects and marks URLs (optional)
- Optional paragraph joining. Attempts to merge hard-coded line-breaks into coherent paragraphs. Lines shorter than specified length, which don’t end with characters marking end of line (‘. «) are joined.
- Fast (v2 is more than 50x faster than v1!)
- Supports converting multiple files at once
- Drag’n’drop
- Both graphical and command-line interface
- Free and open source
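Two of the features listed above, escaping of critical characters and detection of `_italic_` and `*bold*`, can be sketched in a few lines of Python. This is not the program's own code; the function name `mark_emphasis` and the exact matching rules (markers must not touch a word character on the outside) are assumptions for illustration.

```python
import html
import re


def mark_emphasis(text):
    """Escape &, < and >, then convert *bold* and _italic_ markers to tags."""
    escaped = html.escape(text, quote=False)
    # The lookarounds keep identifiers such as snake_case_name untouched.
    escaped = re.sub(r"(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)", r"<b>\1</b>", escaped)
    escaped = re.sub(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)", r"<i>\1</i>", escaped)
    return escaped
```

Escaping happens first so that the inserted `<b>` and `<i>` tags are not themselves escaped.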
To install, just extract all files from the archive to any folder and double-click on txt2html.exe to start using the program.
License and home page
txt2html is licensed under the Mozilla Public License.
The home page has an older (non-.NET) version of txt2html, which you can use on Windows Vista and other older versions of Windows.
converting text file to html file with python
I have a text file that contains:
I wrote this code to convert the text file to HTML:
but the problem is that in the HTML file there is no space between the two columns on each line:
What should I do to keep the same content, with the two columns laid out as in the text file?
5 Answers
Just change your code to include <pre> tags to ensure that your text stays formatted the way you have formatted it in your original text file.
This is HTML — use BeautifulSoup
That is because HTML parsers collapse all whitespace. There are two ways you could do it (well probably many more).
One would be to flag it as “preformatted text” by putting it in <pre> tags.
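A minimal sketch of the preformatted-text approach, assuming the whole file should be shown exactly as typed. The function name `text_to_pre_html` and the default title are made up for this example.

```python
import html


def text_to_pre_html(text, title="Converted text"):
    """Wrap plain text in <pre> so the browser keeps spacing and line breaks."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        # html.escape protects any <, > or & that occur in the text itself.
        f"<body><pre>{html.escape(text)}</pre></body>\n"
        "</html>\n"
    )
```

To use it on files, read the input with `open(path).read()`, pass the string to `text_to_pre_html`, and write the result to an `.html` file.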
The other would be a table (and this is what a table is made for):
Fairly tedious to type out by hand, but easy to generate from your script. Something like this should work:
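One way to generate the table from a script is sketched below. It assumes the columns are separated by whitespace and that no value contains a space; `text_to_table_html` is a hypothetical name.

```python
import html


def text_to_table_html(text):
    """Turn whitespace-separated columns into an HTML table, one row per line."""
    rows = []
    for line in text.splitlines():
        cells = line.split()  # split on any run of spaces or tabs
        if not cells:
            continue  # skip blank lines
        tds = "".join(f"<td>{html.escape(cell)}</td>" for cell in cells)
        rows.append(f"  <tr>{tds}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

If values can contain spaces, split on a fixed delimiter such as a tab (`line.split("\t")`) instead.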
You can use a standalone template library like mako or jinja . Here is an example with jinja:
If you can’t install jinja , then here is an alternative:
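A template-based approach that needs no third-party package could look like the following sketch, which uses `string.Template` from the standard library. The template text and the names `PAGE`, `ROW` and `render_page` are assumptions for illustration and are not taken from the answer above.

```python
import html
from string import Template

# Page skeleton; $title and $rows are filled in by render_page.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<table>
$rows
</table>
</body>
</html>
""")

ROW = Template("<tr><td>$col1</td><td>$col2</td></tr>")


def render_page(title, pairs):
    """Render (col1, col2) pairs as table rows inside the page template."""
    rows = "\n".join(
        ROW.substitute(col1=html.escape(a), col2=html.escape(b))
        for a, b in pairs
    )
    return PAGE.substitute(title=html.escape(title), rows=rows)
```

Unlike jinja, `string.Template` has no loops or conditionals, so the row loop stays in Python code.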
I have added a title, and I loop over the file line by line, wrapping each line in <tr> and <td> tags; it should work as a single table without columns. There is no need to use separate tags for col1 and col2.
Log snippet:
2019/08/19 19:59:25 MUTHUKUMAR_TIME_DATE,line: 118 INFO | Logger object created for: MUTHUKUMAR_APP_USER_SIGNUP_LOG
2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 48 INFO | ***** User SIGNUP page start *****
2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 49 INFO | Enter first name: [Alphabet character only allowed, minimum 3 character to maximum 20 chracter]
html2text(1) — Linux man page
html2text — an advanced HTML-to-text converter
Synopsis
html2text -help
html2text -version
html2text [ -unparse | -check ] [ -debug-scanner ] [ -debug-parser ] [ -rcfile path ] [ -style ( compact | pretty ) ] [ -width width ] [ -o output-file ] [ -nobs ] [ -ascii ] [ input-url ... ]
Description
html2text reads HTML documents from the input-urls, formats each of them into a stream of plain text characters, and writes the result to standard output (or into output-file, if the -o command line option is used).
Documents that are specified by a URL (RFC 1738) that begins with “http:” are retrieved with the Hypertext Transfer Protocol (RFC 1945). URLs that begin with “file:” and URLs that do not contain a colon specify local files. All other URLs are invalid.
If no input-urls are specified on the command line, html2text reads from standard input. A dash as the input-url is an alternate way to specify standard input.
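The input rules above can be restated as a small Python sketch. It is a literal reading of the description, not html2text's source code; the function names `classify_input` and `resolve_inputs` are hypothetical.

```python
def classify_input(arg):
    """Classify one input-url argument according to the rules described above."""
    if arg == "-":
        return "stdin"  # a dash means standard input
    if arg.startswith("http:"):
        return "http"  # retrieved over HTTP
    if arg.startswith("file:") or ":" not in arg:
        return "local"  # local file
    return "invalid"  # every other URL is rejected


def resolve_inputs(args):
    """With no input-urls at all, html2text reads standard input."""
    return [classify_input(a) for a in args] or ["stdin"]
```

Read literally, the rule also rejects URLs such as `https://...` or `ftp://...`, because they contain a colon but start with neither `http:` nor `file:`.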
html2text understands all HTML 3.2 constructs, but can render only part of them due to the limitations of the text output format. However, the program attempts to provide good substitutes for the elements it cannot render. html2text parses HTML 4 input, too, but not always as successfully as other HTML processors. It also accepts syntactically incorrect input, and attempts to interpret it “reasonably”.
The way html2text formats the HTML documents is controlled by formatting properties read from an RC file. html2text attempts to read $HOME/.html2textrc (or the file specified by the -rcfile command line option); if that file cannot be read, html2text attempts to read /etc/html2textrc. If no RC file can be read (or if the RC file does not override all formatting properties), then “reasonable” defaults are assumed. The RC file format is described in the html2textrc(5) manual page.
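The RC file lookup order can be sketched as follows. This follows the wording of the paragraph above and is not taken from the html2text source; `pick_rc_file` and its parameters are hypothetical names, and the `is_readable` hook exists only so the logic can be tried without touching real files.

```python
import os


def pick_rc_file(rcfile_option=None, environ=None, is_readable=None):
    """Return the RC file that would be read, or None for built-in defaults."""
    environ = os.environ if environ is None else environ
    if is_readable is None:
        is_readable = lambda p: os.path.isfile(p) and os.access(p, os.R_OK)
    # First choice: the -rcfile path if given, otherwise $HOME/.html2textrc.
    if rcfile_option is not None:
        first = rcfile_option
    else:
        first = environ.get("HOME", "") + "/.html2textrc"
    # Fallback: the system-wide file.
    for candidate in (first, "/etc/html2textrc"):
        if is_readable(candidate):
            return candidate
    return None  # no RC file readable: "reasonable" defaults apply
```

Properties that the chosen RC file does not set still fall back to defaults, so returning a path here does not mean every formatting property comes from that file.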
Options
-ascii
By default, html2text uses ISO 8859-1 for the output. Specifying this option, plain ASCII is used instead. To find out how non-ASCII characters are rendered, refer to the file “ascii.substitutes”.
-check
This option is for diagnostic purposes: The HTML document is only parsed and not processed otherwise. In this mode of operation, html2text will report on parse errors and scan errors, which it does not in other modes of operation. Note that parse and scan errors are not fatal for html2text, but may cause mis-interpretation of the HTML code and/or portions of the document being swallowed.
-debug-parser
Let html2text report on the tokens being shifted, rules being applied, etc., while scanning the HTML document. This option is for diagnostic purposes.
-debug-scanner
Let html2text report on each lexical token scanned, while scanning the HTML document. This option is for diagnostic purposes.
-help
Print command line summary and exit.
-nobs
By default, html2text renders underlined letters with sequences like “underscore-backspace-character” and boldface letters like “character-backspace-character”, which works fine when the output is piped into more(1), less(1), or similar. For other applications, or when redirecting the output into a file, it may be desirable not to render character attributes with such backspace sequences, which can be accomplished with this command line option.
-o output-file
Write the output to output-file instead of standard output. A dash as the output-file is an alternate way to specify the standard output.
-rcfile path
Attempt to read the file specified in path as RC file.
-style ( compact | pretty )
Style pretty changes some of the default values of the formatting parameters documented in html2textrc(5). To find out which and how the formatting parameter defaults are changed, check the file “pretty.style”. If this option is omitted, style compact is assumed as default.
-unparse
This option is for diagnostic purposes: Instead of formatting the parsed document, generate HTML code that is guaranteed to be syntactically correct. If html2text has problems parsing a syntactically incorrect HTML document, this option may help you to understand what html2text thinks that the original HTML code means.
-version
Print program version and exit.
-width width
By default, html2text formats the HTML documents for a screen width of 79 characters. If redirecting the output into a file, or if your terminal has a width other than 80 characters, or if you just want to get an idea how html2text deals with large tables and different terminal widths, you may want to specify a different width.
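The backspace sequences described for underlined and boldface letters can be reproduced in a short Python sketch, which also shows how such output could be cleaned up after the fact. The function names are hypothetical; this illustrates the encoding only and is not code from html2text.

```python
import re


def underline(text):
    """Encode each character as underscore, backspace, character."""
    return "".join(f"_\b{c}" for c in text)


def bold(text):
    """Encode each character as character, backspace, character."""
    return "".join(f"{c}\b{c}" for c in text)


def strip_overstrike(text):
    """Remove every 'character + backspace' pair, leaving plain text."""
    return re.sub(r".\x08", "", text)
```

Pagers such as less(1) interpret these pairs as underline or bold, whereas a file viewed in an editor shows the raw backspace characters, which is why suppressing them is useful when redirecting output to a file.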
Files
/etc/html2textrc
System wide parser configuration file.
$HOME/.html2textrc
Personal parser configuration file, overrides the system wide values.
Conforming To
HTML 3.2 (HTML 3.2 Reference Specification — http://www.w3.org/TR/REC-html32),
RFC 1945 (Hypertext Transfer Protocol — HTTP).
Restrictions
html2text provides only a basic implementation of the Hypertext Transfer Protocol (HTTP). It requires the complete and exactly matching URL to be given as an argument and will not follow redirections (HTTP 301/307).
html2text was written to convert HTML 3.2 documents. When using it with HTML 4 or even XHTML 1 documents, some constructs present only in these HTML versions might not be rendered.
Author
html2text was written up to version 1.2.2 by Arno Unkrig for GMRS Software GmbH, Unterschleissheim.