git.sommitrealweird.co.uk Git - rss2maildir.git/log
Brett Parker [Tue, 27 Feb 2018 10:21:46 +0000 (10:21 +0000)]
Patch from MJ Ray for items without link
- link might not always exist, md5sum is always present, don't fail on no link.
Brett Parker [Sun, 25 Aug 2013 09:26:16 +0000 (10:26 +0100)]
Make sure that we feed the parser unicode data
- Check if we get a unicode string back from feedparser
- If not, re-encode so that it is
- Remove explicit casting to utf-8 before it goes to the parser
Brett Parker [Tue, 30 Apr 2013 20:17:40 +0000 (21:17 +0100)]
More utf-8 handling for images
- iff we get a string object rather than a unicode one, decode it from utf-8
Brett Parker [Mon, 11 Feb 2013 09:41:39 +0000 (09:41 +0000)]
Stop trying to decode image tags to utf-8
- If the title/url contains a utf-8 character and we try to decode it it
will fail due to not being present in the ascii set. Feedparser has already
made sure that everything is utf-8 before we get it.
Brett Parker [Sun, 2 Oct 2011 18:28:45 +0000 (19:28 +0100)]
Change header encoding for From/To address to make sure they're utf-8 and so they are not invalidly encoded later.
Bug reported by Andre Klärner with pointers to what was going wrong - many thanks!
Brett Parker [Sat, 23 Jul 2011 12:06:47 +0000 (13:06 +0100)]
* Add https support (thanks to Andre Klärner)
Brett Parker [Sat, 5 Mar 2011 18:44:07 +0000 (18:44 +0000)]
Update so that we don't get a warning in python 2.6 and above, where the md5 module has been deprecated
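For illustration, a minimal sketch of the usual replacement for the old md5 module, using hashlib from the standard library. The helper name item_digest is hypothetical and not taken from rss2maildir.

```python
import hashlib


def item_digest(content: bytes) -> str:
    """Return the hex MD5 digest of an item's content.

    hashlib.md5 replaces the standalone md5 module, which triggers a
    DeprecationWarning from Python 2.6 onwards. The function name is an
    assumption made for this example.
    """
    return hashlib.md5(content).hexdigest()
```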
Brett Parker [Mon, 14 Sep 2009 11:47:14 +0000 (12:47 +0100)]
Fix silly typo
Brett Parker [Mon, 14 Sep 2009 11:15:04 +0000 (12:15 +0100)]
Add item fetched date header (X-rss2maildir-rundate)
Brett Parker [Fri, 12 Jun 2009 10:00:33 +0000 (11:00 +0100)]
guid might not always exist, link is always present, don't fail on no guid.
Brett Parker [Fri, 12 Jun 2009 09:55:18 +0000 (10:55 +0100)]
Fix bug when link/guid contains characters not in ascii by encoding the keys as utf-8
Brett Parker [Tue, 17 Mar 2009 12:08:18 +0000 (12:08 +0000)]
Small fix to title handling code to deal with unicode better
Brett Parker [Sat, 7 Jun 2008 11:59:47 +0000 (12:59 +0100)]
Fix for items that actually have no content.
Brett Parker [Thu, 17 Apr 2008 07:20:55 +0000 (08:20 +0100)]
Fix typo in previous charref fix
Brett Parker [Wed, 16 Apr 2008 22:45:07 +0000 (23:45 +0100)]
Fix bug in character reference handling code
Brett Parker [Wed, 5 Mar 2008 10:05:16 +0000 (10:05 +0000)]
Fix for title parsing
Brett Parker [Mon, 3 Mar 2008 15:08:11 +0000 (15:08 +0000)]
Fix some entity handling
* fixes handling of numeric entities
* fixes unittest for entities.
Brett Parker [Mon, 3 Mar 2008 13:51:15 +0000 (13:51 +0000)]
Another blockquote fix
Brett Parker [Mon, 3 Mar 2008 13:28:26 +0000 (13:28 +0000)]
Fix blockquote support
Brett Parker [Mon, 3 Mar 2008 11:58:08 +0000 (11:58 +0000)]
Fix issue with images having the same alt value but different urls
Brett Parker [Sun, 2 Mar 2008 21:22:25 +0000 (21:22 +0000)]
Update changelog
Brett Parker [Sun, 2 Mar 2008 19:41:50 +0000 (19:41 +0000)]
More entities
Brett Parker [Sun, 2 Mar 2008 19:17:37 +0000 (19:17 +0000)]
small fix to put images on separate lines
Brett Parker [Sun, 2 Mar 2008 19:02:23 +0000 (19:02 +0000)]
Small fixes to list handling code
Brett Parker [Sun, 2 Mar 2008 15:29:33 +0000 (15:29 +0000)]
simple typo fix
Brett Parker [Sun, 2 Mar 2008 13:21:48 +0000 (13:21 +0000)]
Update TODO list
Brett Parker [Sun, 2 Mar 2008 12:27:13 +0000 (12:27 +0000)]
Entity handling fixes
* Make entities case sensitive
* Add unittest for simple check of entities
* Add escaping of subject line
Brett Parker [Sun, 2 Mar 2008 12:11:25 +0000 (12:11 +0000)]
Add (lots) more basic HTML entities.
Brett Parker [Sun, 2 Mar 2008 01:12:39 +0000 (01:12 +0000)]
fix silly regression on pre formatting
Brett Parker [Sat, 1 Mar 2008 22:16:46 +0000 (22:16 +0000)]
change images to ReST format
Brett Parker [Sat, 1 Mar 2008 20:57:10 +0000 (20:57 +0000)]
Normalise spaces where they should be.
Brett Parker [Fri, 25 Jan 2008 08:31:38 +0000 (08:31 +0000)]
Unicode handling of URLs fix
Brett Parker [Wed, 16 Jan 2008 21:40:43 +0000 (21:40 +0000)]
More unicode fixes
Brett Parker [Sun, 13 Jan 2008 21:47:27 +0000 (21:47 +0000)]
* Fix bad check on state directory
Brett Parker [Sun, 13 Jan 2008 21:02:24 +0000 (21:02 +0000)]
* Begin fixes to list handling code - there's 2 unittests that are failing due
to this. (Previous revision had 7 unittests fail - bother)
Brett Parker [Sun, 13 Jan 2008 16:12:16 +0000 (16:12 +0000)]
* Small Unicode fix for img tags.
Brett Parker [Sat, 12 Jan 2008 17:08:03 +0000 (17:08 +0000)]
* Change all entity refs in to unicode strings
* Update <br> handling to be more effective
* Ignore unknown tags and just pretend they're part of the flow
* Add <img> support (very basic!)
Brett Parker [Thu, 10 Jan 2008 20:27:31 +0000 (20:27 +0000)]
Update TODO list
Brett Parker [Thu, 10 Jan 2008 20:12:52 +0000 (20:12 +0000)]
Rudimentary <a href="...">bleep</a> support.
Brett Parker [Thu, 10 Jan 2008 18:23:17 +0000 (18:23 +0000)]
* Handle unicode data more effectively.
Brett Parker [Thu, 10 Jan 2008 18:08:23 +0000 (18:08 +0000)]
Fix typo/thinko in handle_startendtag
Brett Parker [Mon, 7 Jan 2008 01:03:37 +0000 (01:03 +0000)]
* Update TODO list
Brett Parker [Mon, 7 Jan 2008 01:01:38 +0000 (01:01 +0000)]
* Update list handling code to deal with nested lists better and badly formed
html
Brett Parker [Mon, 7 Jan 2008 01:00:44 +0000 (01:00 +0000)]
* unittest for mixture of different types of lists
Brett Parker [Sun, 6 Jan 2008 22:39:04 +0000 (22:39 +0000)]
* Serious reworking of HTML2Text to handle nested lists reasonably
* Adding more unittests for the nested lists
Brett Parker [Sun, 6 Jan 2008 11:43:44 +0000 (11:43 +0000)]
* Small improvements to the HTML2Text code
* Reorganize unittests for parsing to make it easier to add more tests later
Brett Parker [Sat, 5 Jan 2008 21:06:27 +0000 (21:06 +0000)]
* serious reworking of the HTML2Text parser
Brett Parker [Sat, 5 Jan 2008 17:00:57 +0000 (17:00 +0000)]
* fix README to have a more complete config example
* stop text width from being hardcoded
Brett Parker [Sat, 5 Jan 2008 15:49:44 +0000 (15:49 +0000)]
* add missing source files for unit tests
* small fix to paragraph handling
Brett Parker [Sat, 5 Jan 2008 13:00:48 +0000 (13:00 +0000)]
* add (first draft of) full test suite runner
* add test for well formed paragraph handling
* update UnorderedListTests to have better test naming scheme
* add suite function to UnorderedListTests
Brett Parker [Sat, 5 Jan 2008 10:06:32 +0000 (10:06 +0000)]
Update li handling a bit, and make the expected test results be what we'd
actually want (previous version might have been a bit of a work around)
Brett Parker [Mon, 31 Dec 2007 03:08:57 +0000 (03:08 +0000)]
* Move some of the list handling above the paragraph handling so that it
doesn't get confused (bless it!)
* Make expected output match actual output, unittest now passes
Brett Parker [Mon, 31 Dec 2007 02:56:28 +0000 (02:56 +0000)]
* Add unit test for some of the badly formed lists that we get after the
feedparser "sanitizer" has a word with the HTML (currently fails)
Brett Parker [Mon, 24 Dec 2007 11:38:12 +0000 (11:38 +0000)]
* Add unittest for unordered list
* make sure that the string that we use for plain text always ends in a new
line character
Brett Parker [Mon, 24 Dec 2007 08:15:34 +0000 (08:15 +0000)]
Reformat code ready for adding test suite
Brett Parker [Sat, 22 Dec 2007 22:08:55 +0000 (22:08 +0000)]
* Update TODO list with further escaping needs
Brett Parker [Sat, 22 Dec 2007 20:27:32 +0000 (20:27 +0000)]
* Add item url to html parts
Brett Parker [Sat, 22 Dec 2007 19:33:11 +0000 (19:33 +0000)]
* Add item url to bottom of text only part
Brett Parker [Sat, 22 Dec 2007 18:33:09 +0000 (18:33 +0000)]
Update TODO list
Brett Parker [Sat, 22 Dec 2007 18:32:52 +0000 (18:32 +0000)]
* multiple posts with the same link but different guid support - still
threaded, but don't keep delivering the same messages every time until they
leave the feed
Brett Parker [Sat, 22 Dec 2007 01:02:37 +0000 (01:02 +0000)]
* Add redirect support
* Try to get a URL 3 times (redirects are included in the count...)
* Refactor connection creation into its own function to lower duplication of
code
Brett Parker [Sat, 22 Dec 2007 00:09:45 +0000 (00:09 +0000)]
Update TODO list
Brett Parker [Fri, 21 Dec 2007 22:05:23 +0000 (22:05 +0000)]
* fix typo for a particular entity
Brett Parker [Fri, 21 Dec 2007 21:29:38 +0000 (21:29 +0000)]
* Add a preliminary todo list
Brett Parker [Fri, 21 Dec 2007 21:29:17 +0000 (21:29 +0000)]
* be slightly more forgiving on connection resets
* if there's no date in the feed, use today's date/time
Brett Parker [Fri, 21 Dec 2007 20:52:11 +0000 (20:52 +0000)]
Only download feeds that have changed (or that don't give us enough data to
work out if they've changed without downloading it all anyways)
Brett Parker [Fri, 21 Dec 2007 19:14:29 +0000 (19:14 +0000)]
Refactor <br /> handling code so that there's no duplication
Brett Parker [Fri, 21 Dec 2007 18:51:40 +0000 (18:51 +0000)]
* updated posts are now "threaded" - adds a References header with the previous
message-id in it, then adds the previous message id to the current message-id
so that further updates can reference that properly
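One way the threading described above could look, as a rough sketch only: an updated post gets a fresh Message-ID and a References header pointing at the previous message. The function name build_update and the rss2maildir.invalid domain are assumptions for the example, and the real code may derive its message ids differently.

```python
from email.message import Message
from email.utils import make_msgid


def build_update(subject, body, previous_id=None):
    """Build a message for a feed item, threading it under previous_id.

    Hypothetical helper: when previous_id is given, it is placed in the
    References header so mail clients show the update in the same thread.
    """
    msg = Message()
    msg["Subject"] = subject
    # Placeholder domain, chosen only for this sketch.
    msg["Message-ID"] = make_msgid(domain="rss2maildir.invalid")
    if previous_id is not None:
        msg["References"] = previous_id
    msg.set_payload(body)
    return msg
```

A later update would pass the Message-ID of the message it supersedes as previous_id.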
Brett Parker [Fri, 21 Dec 2007 16:03:40 +0000 (16:03 +0000)]
Further reformatting to < 80 chars per line
Brett Parker [Fri, 21 Dec 2007 15:40:51 +0000 (15:40 +0000)]
* improve handling of unicode data
Brett Parker [Fri, 21 Dec 2007 15:14:22 +0000 (15:14 +0000)]
* tidy code to be mostly < 80 chars per line
* add unordered list support
* tidy paragraph handling code to work better
Brett Parker [Fri, 21 Dec 2007 13:29:17 +0000 (13:29 +0000)]
Remove references to mailbox module (doesn't let you write to maildir, which is
what we want, until python 2.5)
Brett Parker [Fri, 21 Dec 2007 13:26:13 +0000 (13:26 +0000)]
better utf-8 handling (though, we currently don't take into account what
encoding we should be handling, so this could be "interesting" at best)
Brett Parker [Fri, 21 Dec 2007 00:31:37 +0000 (00:31 +0000)]
* improved entity handling
Brett Parker [Thu, 20 Dec 2007 23:55:32 +0000 (23:55 +0000)]
* fix blockquote support
* improve headings support
* add pre support
Brett Parker [Thu, 20 Dec 2007 23:10:04 +0000 (23:10 +0000)]
Fix documentation up a bit
Brett Parker [Thu, 20 Dec 2007 23:05:54 +0000 (23:05 +0000)]
* blockquote support - indents a blockquote with a "> "
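A minimal sketch of that indentation, assuming the blockquote text has already been converted to plain lines. The name quote_block is hypothetical.

```python
def quote_block(text: str) -> str:
    """Prefix every line of text with "> ", mail-quote style."""
    return "\n".join("> " + line for line in text.splitlines())
```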
Brett Parker [Thu, 20 Dec 2007 22:13:08 +0000 (22:13 +0000)]
* make db key actually unique for feed url + link url
Brett Parker [Thu, 20 Dec 2007 22:03:01 +0000 (22:03 +0000)]
* add support for
* add text wrapping for paragraphs (this is going to need more work, really)
Brett Parker [Thu, 20 Dec 2007 21:16:31 +0000 (21:16 +0000)]
Add licence information
Brett Parker [Thu, 20 Dec 2007 21:03:09 +0000 (21:03 +0000)]
Add basic HTML -> plain text parser
Brett Parker [Thu, 20 Dec 2007 19:30:17 +0000 (19:30 +0000)]
Update example file with planet alug and planet debian
Brett Parker [Thu, 20 Dec 2007 19:29:54 +0000 (19:29 +0000)]
* make mail messages multipart/alternative messages with a text/plain and
text/html part
* create a seen database that logs whether or not we've seen an item before by
using the url of the item as the key, then check the md5sum to see if we need
to see that item anyways
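The seen-database check can be sketched roughly as follows, with a plain dict standing in for the on-disk database. The names needs_delivery and seen are assumptions for the example, not the project's actual API.

```python
import hashlib


def needs_delivery(seen, url, content):
    """Return True if the item at url is new or its content has changed.

    seen maps item url -> md5 hex digest of the last delivered content.
    The mapping is updated whenever an item should be delivered.
    """
    digest = hashlib.md5(content).hexdigest()
    if seen.get(url) == digest:
        # Same url and same digest: already delivered, skip it.
        return False
    seen[url] = digest
    return True
```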
Brett Parker [Thu, 20 Dec 2007 14:14:51 +0000 (14:14 +0000)]
Add a Message-ID header and set the type to the type of the content in the rss
feed.
Brett Parker [Thu, 20 Dec 2007 01:14:02 +0000 (01:14 +0000)]
Much better filename creation for the tmp file
Brett Parker [Wed, 19 Dec 2007 20:09:30 +0000 (20:09 +0000)]
* Parsing of the RSS feed using feedparser
* Creation of files for the maildir
Brett Parker [Wed, 19 Dec 2007 14:40:35 +0000 (14:40 +0000)]
Create Maildirs and Maildir root if possible for the feeds
Brett Parker [Wed, 19 Dec 2007 10:58:03 +0000 (10:58 +0000)]
Starting point of rss2maildir:
* Config parser
* Options parser