rss2maildir.git
Brett Parker [Tue, 27 Feb 2018 10:21:46 +0000 (10:21 +0000)]
Patch from MJ Ray for items without link

  - link might not always exist, md5sum is always present, don't fail on no link.

Brett Parker [Sun, 25 Aug 2013 09:26:16 +0000 (10:26 +0100)]
Make sure that we feed the parser unicode data

    - Check if we get a unicode string back from feedparser
    - If not, re-encode so that it is
    - Remove explicit casting to utf-8 before it goes to the parser

Brett Parker [Tue, 30 Apr 2013 20:17:40 +0000 (21:17 +0100)]
More utf-8 handling for images

    - iff we get a string object rather than a unicode one, decode it from utf-8

Brett Parker [Mon, 11 Feb 2013 09:41:39 +0000 (09:41 +0000)]
Stop trying to decode image tags to utf-8

    - If the title/url contains a utf-8 character and we try to decode it it
      will fail due to not being present in the ascii set. Feedparser has already
      made sure that everything is utf-8 before we get it.

Brett Parker [Sun, 2 Oct 2011 18:28:45 +0000 (19:28 +0100)]
Change header encoding for From/To address to make sure they're utf-8 and so they are not invalidly encoded later.

Bug reported by Andre Klärner with pointers to what was going wrong - many thanks!

Brett Parker [Sat, 23 Jul 2011 12:06:47 +0000 (13:06 +0100)]
* Add https support (thanks to Andre Klärner)

Brett Parker [Sat, 5 Mar 2011 18:44:07 +0000 (18:44 +0000)]
Update so that we don't get a warning in python 2.6 and above, where the md5 module has been deprecated
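
A minimal sketch of the kind of change this commit describes, assuming the fix was to switch from the standalone md5 module to hashlib. The helper name item_digest is hypothetical and not taken from the rss2maildir source; the sketch is written for Python 3, whereas the original code targeted Python 2.

```python
import hashlib


def item_digest(content):
    """Return the hex MD5 digest of an item's content.

    Hypothetical helper: hashlib.md5 stands in for the deprecated
    standalone md5 module.
    """
    if isinstance(content, str):
        # hashlib works on bytes, so encode text first.
        content = content.encode("utf-8")
    return hashlib.md5(content).hexdigest()
```

The digest is used here only to detect changed content, not for anything security-sensitive.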

Brett Parker [Mon, 14 Sep 2009 11:47:14 +0000 (12:47 +0100)]
Fix silly typo

Brett Parker [Mon, 14 Sep 2009 11:15:04 +0000 (12:15 +0100)]
Add item fetched date header (X-rss2maildir-rundate)

Brett Parker [Fri, 12 Jun 2009 10:00:33 +0000 (11:00 +0100)]
guid might not always exist, link is always present, don't fail on no guid.

Brett Parker [Fri, 12 Jun 2009 09:55:18 +0000 (10:55 +0100)]
Fix bug when link/guid contains characters not in ascii by encoding the keys as utf-8

Brett Parker [Tue, 17 Mar 2009 12:08:18 +0000 (12:08 +0000)]
Small fix to title handling code to deal with unicode better

Brett Parker [Sat, 7 Jun 2008 11:59:47 +0000 (12:59 +0100)]
Fix for items that actually have no content.

Brett Parker [Thu, 17 Apr 2008 07:20:55 +0000 (08:20 +0100)]
Fix typo in previous charref fix

Brett Parker [Wed, 16 Apr 2008 22:45:07 +0000 (23:45 +0100)]
Fix bug in character reference handling code

Brett Parker [Wed, 5 Mar 2008 10:05:16 +0000 (10:05 +0000)]
Fix for title parsing

Brett Parker [Mon, 3 Mar 2008 15:08:11 +0000 (15:08 +0000)]
Fix some entity handling

    * fixes handling of numeric entities
    * fixes unittest for entities.

Brett Parker [Mon, 3 Mar 2008 13:51:15 +0000 (13:51 +0000)]
Another blockquote fix

Brett Parker [Mon, 3 Mar 2008 13:28:26 +0000 (13:28 +0000)]
Fix blockquote support

Brett Parker [Mon, 3 Mar 2008 11:58:08 +0000 (11:58 +0000)]
Fix issue with images having the same alt value but different urls

Brett Parker [Sun, 2 Mar 2008 21:22:25 +0000 (21:22 +0000)]
Update changelog

Brett Parker [Sun, 2 Mar 2008 19:41:50 +0000 (19:41 +0000)]
More entities

Brett Parker [Sun, 2 Mar 2008 19:17:37 +0000 (19:17 +0000)]
small fix to put images on separate lines

Brett Parker [Sun, 2 Mar 2008 19:02:23 +0000 (19:02 +0000)]
Small fixes to list handling code

Brett Parker [Sun, 2 Mar 2008 15:29:33 +0000 (15:29 +0000)]
simple typo fix

Brett Parker [Sun, 2 Mar 2008 13:21:48 +0000 (13:21 +0000)]
Update TODO list

Brett Parker [Sun, 2 Mar 2008 12:27:13 +0000 (12:27 +0000)]
Entity handling fixes

* Make entities case sensitive
* Add unittest for simple check of entities
* Add escaping of subject line

Brett Parker [Sun, 2 Mar 2008 12:11:25 +0000 (12:11 +0000)]
Add (lots) more basic HTML entities.

Brett Parker [Sun, 2 Mar 2008 01:12:39 +0000 (01:12 +0000)]
fix silly regression on pre formatting

Brett Parker [Sat, 1 Mar 2008 22:16:46 +0000 (22:16 +0000)]
change images to ReST format

Brett Parker [Sat, 1 Mar 2008 20:57:10 +0000 (20:57 +0000)]
Normalise spaces where they should be.

Brett Parker [Fri, 25 Jan 2008 08:31:38 +0000 (08:31 +0000)]
Unicode handling of URLs fix

Brett Parker [Wed, 16 Jan 2008 21:40:43 +0000 (21:40 +0000)]
More unicode fixes

Brett Parker [Sun, 13 Jan 2008 21:47:27 +0000 (21:47 +0000)]
* Fix bad check on state directory

Brett Parker [Sun, 13 Jan 2008 21:02:24 +0000 (21:02 +0000)]
* Begin fixes to list handling code - there's 2 unittests that are failing due
  to this. (Previous revision had 7 unittests fail - bother)

Brett Parker [Sun, 13 Jan 2008 16:12:16 +0000 (16:12 +0000)]
* Small Unicode fix for img tags.

Brett Parker [Sat, 12 Jan 2008 17:08:03 +0000 (17:08 +0000)]
* Change all entity refs into unicode strings
* Update <br> handling to be more effective
* Ignore unknown tags and just pretend they're part of the flow
* Add <img> support (very basic!)

Brett Parker [Thu, 10 Jan 2008 20:27:31 +0000 (20:27 +0000)]
Update TODO list

Brett Parker [Thu, 10 Jan 2008 20:12:52 +0000 (20:12 +0000)]
Rudimentary <a href="...">bleep</a> support.

Brett Parker [Thu, 10 Jan 2008 18:23:17 +0000 (18:23 +0000)]
* Handle unicode data more effectively.

Brett Parker [Thu, 10 Jan 2008 18:08:23 +0000 (18:08 +0000)]
Fix typo/thinko in handle_startendtag

Brett Parker [Mon, 7 Jan 2008 01:03:37 +0000 (01:03 +0000)]
* Update TODO list

Brett Parker [Mon, 7 Jan 2008 01:01:38 +0000 (01:01 +0000)]
* Update list handling code to deal with nested lists better and badly formed
  html

Brett Parker [Mon, 7 Jan 2008 01:00:44 +0000 (01:00 +0000)]
* unittest for mixture of different types of lists

Brett Parker [Sun, 6 Jan 2008 22:39:04 +0000 (22:39 +0000)]
* Serious reworking of HTML2Text to handle nested lists reasonably
* Adding more unittests for the nested lists

Brett Parker [Sun, 6 Jan 2008 11:43:44 +0000 (11:43 +0000)]
* Small improvements to the HTML2Text code
* Reorganize unittests for parsing to make it easier to add more tests later

Brett Parker [Sat, 5 Jan 2008 21:06:27 +0000 (21:06 +0000)]
* serious reworking of the HTML2Text parser

Brett Parker [Sat, 5 Jan 2008 17:00:57 +0000 (17:00 +0000)]
* fix README to have a more complete config example
* stop text width from being hardcoded

Brett Parker [Sat, 5 Jan 2008 15:49:44 +0000 (15:49 +0000)]
* add missing source files for unit tests
* small fix to paragraph handling

Brett Parker [Sat, 5 Jan 2008 13:00:48 +0000 (13:00 +0000)]
* add (first draft of) full test suite runner
* add test for well formed paragraph handling
* update UnorderedListTests to have better test naming scheme
* add suite function to UnorderedListTests

Brett Parker [Sat, 5 Jan 2008 10:06:32 +0000 (10:06 +0000)]
Update li handling a bit, and make the expected test results be what we'd
actually want (previous version might have been a bit of a work around)

Brett Parker [Mon, 31 Dec 2007 03:08:57 +0000 (03:08 +0000)]
* Move some of the list handling above the paragraph handling so that it
  doesn't get confused (bless it!)
* Make expected output match actual output, unittest now passes

Brett Parker [Mon, 31 Dec 2007 02:56:28 +0000 (02:56 +0000)]
* Add unit test for some of the badly formed lists that we get after the
  feedparser "sanitizer" has a word with the HTML (currently fails)

Brett Parker [Mon, 24 Dec 2007 11:38:12 +0000 (11:38 +0000)]
* Add unittest for unordered list
* make sure that the string that we use for plain text always ends in a new
  line character

Brett Parker [Mon, 24 Dec 2007 08:15:34 +0000 (08:15 +0000)]
Reformat code ready for adding test suite

Brett Parker [Sat, 22 Dec 2007 22:08:55 +0000 (22:08 +0000)]
* Update TODO list with further escaping needs

Brett Parker [Sat, 22 Dec 2007 20:27:32 +0000 (20:27 +0000)]
* Add item url to html parts

Brett Parker [Sat, 22 Dec 2007 19:33:11 +0000 (19:33 +0000)]
* Add item url to bottom of text only part

Brett Parker [Sat, 22 Dec 2007 18:33:09 +0000 (18:33 +0000)]
Update TODO list

Brett Parker [Sat, 22 Dec 2007 18:32:52 +0000 (18:32 +0000)]
* multiple posts with the same link but different guid support - still
  threaded, but don't keep delivering the same messages every time until they
  leave the feed

Brett Parker [Sat, 22 Dec 2007 01:02:37 +0000 (01:02 +0000)]
* Add redirect support
* Try to get a URL 3 times (redirects are included in the count...)
* Refactor connection creation into its own function to lower duplication of
  code

Brett Parker [Sat, 22 Dec 2007 00:09:45 +0000 (00:09 +0000)]
Update TODO list

Brett Parker [Fri, 21 Dec 2007 22:05:23 +0000 (22:05 +0000)]
* fix typo for a particular entity

Brett Parker [Fri, 21 Dec 2007 21:29:38 +0000 (21:29 +0000)]
* Add a preliminary todo list

Brett Parker [Fri, 21 Dec 2007 21:29:17 +0000 (21:29 +0000)]
* be slightly more forgiving on connection resets
* if there's no date in the feed, use today's date/time

Brett Parker [Fri, 21 Dec 2007 20:52:11 +0000 (20:52 +0000)]
Only download feeds that have changed (or that don't give us enough data to
work out if they've changed without downloading it all anyways)
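
The commit message does not say how a change is detected. One plausible approach, sketched here purely as an assumption, is to compare response headers from a cheap request against the ones stored on the previous run, and to fall back to a full download when no comparable header exists. The function name feed_changed and the choice of headers are hypothetical.

```python
# Headers assumed (not confirmed by the source) to indicate a changed feed.
VALIDATORS = ("etag", "last-modified", "content-length")


def feed_changed(previous, current):
    """Decide whether a feed needs downloading.

    previous and current are dicts of lower-cased response headers.
    Returns True when the feed looks changed, or when there is not
    enough data to tell without downloading it.
    """
    comparable = [h for h in VALIDATORS if h in previous and h in current]
    if not comparable:
        # Nothing to compare: download to be safe.
        return True
    return any(previous[h] != current[h] for h in comparable)
```

A first run, with no stored headers, always reports a change, so the feed is fetched once and its headers recorded for the next comparison.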

Brett Parker [Fri, 21 Dec 2007 19:14:29 +0000 (19:14 +0000)]
Refactor <br /> handling code so that there's no duplication

Brett Parker [Fri, 21 Dec 2007 18:51:40 +0000 (18:51 +0000)]
* updated posts are now "threaded" - adds a References header with the previous
  message-id in it, then adds the previous message id to the current message-id
  so that further updates can reference that properly
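
The threading described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: build_update, the rss2maildir.invalid domain, and keeping the chain of earlier ids in a list are all hypothetical, and the sketch targets Python 3.

```python
from email.message import Message
from email.utils import make_msgid


def build_update(subject, body, previous_ids):
    """Build a message that threads under earlier versions of a post.

    previous_ids is the list of Message-IDs already delivered for this
    post (hypothetical storage format). Returns the message and the
    extended list, so that a later update can reference this one too.
    """
    msg = Message()
    msg["Subject"] = subject
    msg_id = make_msgid(domain="rss2maildir.invalid")
    msg["Message-ID"] = msg_id
    if previous_ids:
        # Mail clients use References to place the update in a thread.
        msg["References"] = " ".join(previous_ids)
    msg.set_payload(body)
    return msg, previous_ids + [msg_id]
```

The first delivery of a post carries no References header; each later update lists every earlier Message-ID.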

Brett Parker [Fri, 21 Dec 2007 16:03:40 +0000 (16:03 +0000)]
Further reformatting to < 80 chars per line

Brett Parker [Fri, 21 Dec 2007 15:40:51 +0000 (15:40 +0000)]
* improve handling of unicode data

Brett Parker [Fri, 21 Dec 2007 15:14:22 +0000 (15:14 +0000)]
* tidy code to be mostly < 80 chars per line
* add unordered list support
* tidy paragraph handling code to work better

Brett Parker [Fri, 21 Dec 2007 13:29:17 +0000 (13:29 +0000)]
Remove references to mailbox module (doesn't let you write to maildir, which is
what we want, until python 2.5)

Brett Parker [Fri, 21 Dec 2007 13:26:13 +0000 (13:26 +0000)]
better utf-8 handling (though, we currently don't take into account what
encoding we should be handling, so this could be "interesting" at best)

Brett Parker [Fri, 21 Dec 2007 00:31:37 +0000 (00:31 +0000)]
* improved entity handling

Brett Parker [Thu, 20 Dec 2007 23:55:32 +0000 (23:55 +0000)]
* fix blockquote support
* improve headings support
* add pre support

Brett Parker [Thu, 20 Dec 2007 23:10:04 +0000 (23:10 +0000)]
Fix documentation up a bit

Brett Parker [Thu, 20 Dec 2007 23:05:54 +0000 (23:05 +0000)]
* blockquote support - indents a blockquote with a "> "
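
As an illustration of the quoting style this commit names, the sketch below wraps text and prefixes each line with "> ". The function quote_block and the default width of 72 are assumptions for the example, not values from the rss2maildir source.

```python
import textwrap


def quote_block(text, width=72):
    """Wrap text and prefix every line with '> ' (hypothetical helper)."""
    # Reserve two columns for the '> ' prefix so lines stay within width.
    wrapped = textwrap.wrap(text, width=width - 2)
    return "\n".join("> " + line for line in wrapped)
```

Wrapping before prefixing keeps the quoted lines within the requested width.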

Brett Parker [Thu, 20 Dec 2007 22:13:08 +0000 (22:13 +0000)]
* make db key actually unique for feed url + link url

Brett Parker [Thu, 20 Dec 2007 22:03:01 +0000 (22:03 +0000)]
* add support for &nbsp;
* add text wrapping for paragraphs (this is going to need more work, really)

Brett Parker [Thu, 20 Dec 2007 21:16:31 +0000 (21:16 +0000)]
Add licence information

Brett Parker [Thu, 20 Dec 2007 21:03:09 +0000 (21:03 +0000)]
Add basic HTML -> plain text parser

Brett Parker [Thu, 20 Dec 2007 19:30:17 +0000 (19:30 +0000)]
Update example file with planet alug and planet debian

Brett Parker [Thu, 20 Dec 2007 19:29:54 +0000 (19:29 +0000)]
* make mail messages multipart/alternative messages with a text/plain and
  text/html part
* create a seen database that logs whether or not we've seen an item before by
  using the url of the item as the key, then check the md5sum to see if we need
  to see that item anyways
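
The seen-database check described above might look something like this sketch. It assumes a dict-like store mapping item URL to an MD5 hex digest; a plain dict is used here so the example is self-contained, and the function name should_deliver is hypothetical.

```python
import hashlib


def should_deliver(seen, url, content):
    """Return True if the item is new or its content has changed.

    seen is any dict-like store mapping item URL to the MD5 hex digest
    of the content last delivered (assumed layout, for illustration).
    """
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    if seen.get(url) == digest:
        # Same URL and same content: already delivered.
        return False
    seen[url] = digest
    return True
```

For example, with seen = {}, a first call for a URL returns True, a repeat with identical content returns False, and a call with edited content returns True again.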

Brett Parker [Thu, 20 Dec 2007 14:14:51 +0000 (14:14 +0000)]
Add a Message-ID header and set the type to the type of the content in the rss
feed.

Brett Parker [Thu, 20 Dec 2007 01:14:02 +0000 (01:14 +0000)]
Much better filename creation for the tmp file

Brett Parker [Wed, 19 Dec 2007 20:09:30 +0000 (20:09 +0000)]
* Parsing of the RSS feed using feedparser
* Creation of files for the maildir

Brett Parker [Wed, 19 Dec 2007 14:40:35 +0000 (14:40 +0000)]
Create Maildirs and Maildir root if possible for the feeds

Brett Parker [Wed, 19 Dec 2007 10:58:03 +0000 (10:58 +0000)]
Starting point of rss2maildir:
    * Config parser
    * Options parser