34 days ago
Unfiled. Edited by Omshivaprakash H L 34 days ago
Omshivaprakash L Unesco Heritage WikiEditing
 
  • We have the list of all Unesco heritage sites
  • We need to translate those names to Kannada
  • We need Wiki articles for these places in Kannada 
 
How can we achieve this?
 
  • Translation drive - For names alone 
  • Bring those names on Maps
  • Create Content Translation Tool Campaign - One or two articles a day
 
 
 
 
132 days ago
Unfiled. Edited by Omshivaprakash H L , Yogesh K S 132 days ago
Omshivaprakash L Uploading OUT_OF_COPYRIGHT Books to Internet Archive & Wiki Projects
Omshivaprakash L
  • This is a part of IEG Project documentation: Grants:IEG/Growing Kannada-language Wikimedia projects with a digital library - Page on Meta
 
Introduction
 
As part of our project at Pustaka Sanchaya, we started identifying the books that are OUT_OF_COPYRIGHT at the Digital Library of India (DLI) and the Osmania University Digital Library (OUDL). We wanted these books to be available to everyone with a Kannada index, which we achieved through http://pustaka.sanchaya.net. But because we did not host the books ourselves, we still depended on the government websites for access to the actual books.
 
Technical Barrier:
 Government websites, as many of you might know, work like government offices.
  1. OUDL used to be available only from 10AM to 5 or 6PM IST.
  2. The DLI website used to provide books in TIFF and DjVu formats. People who found books on Pustaka Sanchaya had difficulty working with DLI mainly for this reason. Those using Windows machines did not face this issue, as there was a browser plugin for reading DjVu and other formats. So access to the books remained limited.
  • Update: We later observed that the DLI team had started uploading books in PDF, which is the most preferred format. But only in April 2016 did we find Kannada books in PDF on the DLI website.
 
Solution:
The solution to this problem is to mirror the site on a third-party Internet portal that provides 24x7 access to these resources without any difficulty.
 
Our Preference:
We preferred Internet Archive and Wikimedia Commons as the platforms on which to keep these resources.
 
Copyright issues:
Though we had found the platforms, we knew that not all Kannada books could be kept on Internet Archive or Commons due to copyright issues. We decided to pick only the OUT_OF_COPYRIGHT books and upload them there.
 
Preparation
 
For the bulk upload to Internet Archive we used the internetarchive Python library, which provides a command-line interface to Archive.org. Details on installing and running this Python library can be found here - https://internetarchive.readthedocs.org/en/latest/cli.html
 
Uploading to Internet Archive
 
We took the DB dump of Pustaka Sanchaya and re-used it for the Internet Archive upload. The following steps describe how the data was formatted.
 
A spreadsheet feeds the values for the bulk upload, so prepare the spreadsheet in the format below before uploading -
 
identifier,file,mediatype,collection,title,creator,language,description,contributor,date,subject[0],subject[1],subject[3],licenseurl
Kirluuskara_Lakshhmanaraayaru_1921,Kirluuskara_Lakshhmanaraayaru.pdf,texts,opensource,Kirluuskara Lakshhmanaraayaru,SV Kirloskar,kan,ಕಿರ್ಲೋಸ್ಕರ ಲಕ್ಷ್ಮಣರಾಯರು -- ಶ್ರೀ ಶಂಕರರಾವ್ ವಾ. ಕಿರ್ಲೋಸ್ಕರ್,OUDL,1921,Kannada,Old Kannada Books,Scanned Kannada Books from OUDL,http://creativecommons.org/publicdomain/mark/1.0/
With the metadata specified above, this book is uploaded to the URL https://archive.org/details/Kirluuskara_Lakshhmanaraayaru_1921
 
But with OUDL books (we still need to check this for DLI), the books were uploaded as image-container PDFs instead of text PDFs. A text PDF is required for the book preview on archive.org, and only text PDF/DjVu files are recognized by IA-upload (https://tools.wmflabs.org/ia-upload).
 
 
To overcome this issue, a DjVu file can be uploaded along with the PDF file to Internet Archive, so that the DjVu file can be used on Wikisource later. Below is the spreadsheet modified to include the DjVu file -
 
identifier,file,mediatype,collection,title,creator,language,description,contributor,date,subject[0],subject[1],subject[3],licenseurl
Kirluuskara_Lakshhmanaraayaru_1921,Kirluuskara_Lakshhmanaraayaru.pdf,texts,opensource,Kirluuskara Lakshhmanaraayaru,SV Kirloskar,kan,ಕಿರ್ಲೋಸ್ಕರ ಲಕ್ಷ್ಮಣರಾಯರು -- ಶ್ರೀ ಶಂಕರರಾವ್ ವಾ. ಕಿರ್ಲೋಸ್ಕರ್,OUDL,1921,Kannada,Old Kannada Books,Scanned Kannada Books from OUDL,http://creativecommons.org/publicdomain/mark/1.0/
,Kirluuskara_Lakshhmanaraayaru.djvu,,,,,,,,,,,,
The books can now be uploaded with the following command:
 
  • $ ia upload --spreadsheet=books.csv
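The spreadsheet itself can be generated from the database dump rather than typed by hand. Below is a minimal sketch, assuming the book metadata is already available as Python dictionaries; the helper names `build_rows` and `write_spreadsheet` are hypothetical, and the assumption that a follow-up row carries only a file name mirrors the DjVu row shown earlier rather than any documented guarantee.

```python
import csv
import io
import sys

# Column order copied from the spreadsheet layout used in these notes.
COLUMNS = [
    "identifier", "file", "mediatype", "collection", "title", "creator",
    "language", "description", "contributor", "date",
    "subject[0]", "subject[1]", "subject[3]", "licenseurl",
]

# Example values taken from the sample row in these notes.
EXAMPLE_BOOK = {
    "identifier": "Kirluuskara_Lakshhmanaraayaru_1921",
    "file": "Kirluuskara_Lakshhmanaraayaru.pdf",
    "mediatype": "texts",
    "collection": "opensource",
    "title": "Kirluuskara Lakshhmanaraayaru",
    "creator": "SV Kirloskar",
    "language": "kan",
    "description": "ಕಿರ್ಲೋಸ್ಕರ ಲಕ್ಷ್ಮಣರಾಯರು -- ಶ್ರೀ ಶಂಕರರಾವ್ ವಾ. ಕಿರ್ಲೋಸ್ಕರ್",
    "contributor": "OUDL",
    "date": "1921",
    "subject[0]": "Kannada",
    "subject[1]": "Old Kannada Books",
    "subject[3]": "Scanned Kannada Books from OUDL",
    "licenseurl": "http://creativecommons.org/publicdomain/mark/1.0/",
}


def build_rows(book, extra_files=()):
    """Return one full metadata row plus one row per additional file."""
    rows = [{col: book.get(col, "") for col in COLUMNS}]
    for name in extra_files:
        # Assumption: a follow-up row only fills the "file" column.
        row = {col: "" for col in COLUMNS}
        row["file"] = name
        rows.append(row)
    return rows


def write_spreadsheet(rows, handle):
    """Write the rows as CSV with the header line first."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    rows = build_rows(EXAMPLE_BOOK, ["Kirluuskara_Lakshhmanaraayaru.djvu"])
    write_spreadsheet(rows, sys.stdout)
```

Redirecting the output to books.csv gives a file in the layout that the upload command above expects.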
 
Uploading to WikiCommons
 
Once the books are uploaded to Internet Archive, they can be uploaded to Wikimedia Commons using the IA-upload tool. All it needs is the book identifier from Internet Archive. Once the identifier (Kirluuskara_Lakshhmanaraayaru_1921 in the example above) is given to the IA-upload tool, it pulls all the metadata from archive.org and pre-fills the book template for review. If all looks good, the book can be uploaded by clicking on Upload. In a few rare cases where the book size is greater than 60MB, IA-upload may not upload the book to Commons, but it still generates the book template. In that case, the same book template can be used in the url2commons tool to do the upload (https://tools.wmflabs.org/url2commons/index.html). Unlike IA-upload, url2commons takes the DjVu file URL instead of the identifier (for example https://archive.org/stream/Kirluuskara_Lakshhmanaraayaru_1921/Kirluuskara_Lakshhmanaraayaru.djvu).
 
Once the book is uploaded to Commons, the Kannada Wikisource link for the book is autogenerated from the book template (Ex. book on Commons - https://commons.wikimedia.org/wiki/File:ವಂಗವಿಜೇತ.djvu). Although the Wikisource link is autogenerated, the cover page, author, publisher and other details have to be filled in manually on Kannada Wikisource. Example book on Wikisource uploaded from Internet Archive to Commons - https://kn.wikisource.org/wiki/ಪರಿವಿಡಿ:ವಂಗವಿಜೇತ.djvu 
 
The steps to upload books from Archive.org to Wiki Commons & Wikisource have been documented separately here: Uploading Books to Wikimedia Commons & Kn Wikisource From Internet Archive
 
In a nutshell
 
In a nutshell, these are the steps for a book to get from Internet Archive to Wikisource -
 
  • Upload the book to Internet Archive using the internetarchive Python library ($ ia upload --spreadsheet=books.csv).
  • Enter the book identifier from archive.org in the IA-upload tool to upload the book to Commons.
  • If the above step fails, copy the book template and use it in the URL2Commons tool with the direct DjVu book link to upload to Commons.
  • Go to the uploaded file on Commons and click on the Wikisource link to fill in the book details.
  • Fill in the book details on Wikisource and save the page to start proofreading.
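The choice between IA-upload and URL2Commons in the steps above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the DjVu URL simply follows the pattern of the example link in these notes, and the size limit is an assumption (these notes mention both 60MB and 50MB, so the lower figure is used here).

```python
from urllib.parse import quote

IA_UPLOAD_URL = "https://tools.wmflabs.org/ia-upload"
URL2COMMONS_URL = "https://tools.wmflabs.org/url2commons/index.html"

# Assumption: conservative limit; the notes quote both 50MB and 60MB.
SIZE_LIMIT_MB = 50


def djvu_stream_url(identifier, djvu_name):
    """Build the direct DjVu link in the same shape as the example URL."""
    return "https://archive.org/stream/{}/{}".format(
        quote(identifier), quote(djvu_name)
    )


def plan_commons_upload(identifier, djvu_name, size_mb, limit_mb=SIZE_LIMIT_MB):
    """Return which tool to open and what to paste into it."""
    if size_mb <= limit_mb:
        # IA-upload only needs the archive.org identifier.
        return {"tool": IA_UPLOAD_URL, "input": identifier}
    # Larger books go through URL2Commons with the direct DjVu link.
    return {
        "tool": URL2COMMONS_URL,
        "input": djvu_stream_url(identifier, djvu_name),
    }


if __name__ == "__main__":
    print(plan_commons_upload(
        "Kirluuskara_Lakshhmanaraayaru_1921",
        "Kirluuskara_Lakshhmanaraayaru.djvu",
        size_mb=72,
    ))
```

The book template generated by IA-upload still has to be copied by hand in the URL2Commons case; the helper only decides where to start.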
 
Current Status
As of now, around 1006 Kannada books have been uploaded to Internet Archive (215 from OUDL and 791 from DLI).
 
Conclusion
 
While it sounds like an easy process, it has its own difficulties for the people involved in it.
 
  1. The initial metadata dump we used from DLI appears to have contained a very large number of wrong entries for authors and publishers, and it has since been changed again.
  2. Also, the transliteration and review project conducted through the Samooha Sanchaya efforts will require a further review to fix the issues mentioned in #1.
  3. Uploading to Wiki Commons from Internet Archive results in an error if the book size is more than 50MB. In that case we have to use URL2Commons to complete the upload.
  4. Books uploaded to Internet Archive are by default available as image PDFs, which are not accepted by the ia-upload tool. Hence, either the metadata has to be updated to text PDF or the same book has to be uploaded in DjVu format. See here for details on this issue.
  5. Internet Archive uses 3-letter ISO codes for languages in its metadata. This causes problems when the metadata is used in the ia-upload tool, which accepts 2-letter ISO codes (Ex. Kannada was kan in IA, while the ia-upload tool took this as ka, the Georgian language). This has to be corrected before uploading to Wikimedia Commons. Please refer to the detailed steps here.
  6. The same file names are used for different books in DLI (Ex. Hariharana-Ragalegalu.pdf is used for 4 files), and this creates filename conflicts when the internetarchive Python library is used to upload the books.
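The last two issues, language codes and repeated file names, can be caught before upload. The sketch below is illustrative only: the function names and the `_2`, `_3` renaming scheme are hypothetical, and the mapping table holds just the one code discussed in these notes.

```python
from collections import Counter

# Only the code discussed in these notes; extend the table as needed.
THREE_TO_TWO_LETTER = {"kan": "kn"}


def to_two_letter(code):
    """Map a 3-letter language code to its 2-letter form.

    Truncating "kan" to "ka" would give Georgian, so unknown codes
    raise an error instead of being shortened.
    """
    if len(code) == 2:
        return code
    if code not in THREE_TO_TWO_LETTER:
        raise ValueError("no 2-letter mapping known for " + repr(code))
    return THREE_TO_TWO_LETTER[code]


def find_duplicate_names(file_names):
    """Return {name: count} for every file name used more than once."""
    counts = Counter(file_names)
    return {name: n for name, n in counts.items() if n > 1}


def make_unique(file_names):
    """Rename repeats as name_2.ext, name_3.ext, keeping the first as-is.

    Hypothetical scheme; it does not check whether name_2.ext already
    exists in the input.
    """
    seen = Counter()
    result = []
    for name in file_names:
        seen[name] += 1
        if seen[name] == 1:
            result.append(name)
            continue
        stem, dot, ext = name.rpartition(".")
        if dot:
            result.append("{}_{}.{}".format(stem, seen[name], ext))
        else:
            result.append("{}_{}".format(name, seen[name]))
    return result


if __name__ == "__main__":
    names = ["Hariharana-Ragalegalu.pdf"] * 4
    print(find_duplicate_names(names))
    print(make_unique(names))
    print(to_two_letter("kan"))
```

Running the duplicate check over the whole file column before calling ia upload shows which rows need renaming.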
Omshivaprakash L Contributions:
 
 
 
132 days ago
Unfiled. Edited by Omshivaprakash H L 132 days ago
IEG: Pustaka Sanchaya 1st Meetup- Activities
Omshivaprakash L
  • This is a part of IEG Project documentation: Grants:IEG/Growing Kannada-language Wikimedia projects with a digital library - Page on Meta
 
 
  • with Yogesh, Pavithra, Tanveer, Srividya and Rahim
 
  • For "ರತ್ನನ ಪದಗಳು"
Intro Text
 
  • '''ರತ್ನನ ಪದಗಳು''' ಪುಸ್ತಕವನ್ನು '''ಜಿ. ಪಿ. ರಾಜರತ್ನಂ''' ಅವರು ೧೯೪೫ರಲ್ಲಿ ರಚಿಸಿದರು. ಇದನ್ನು '''ಸತ್ಯಶೋಧನಾ ಪ್ರಕಟಣಾ ಮಂದಿರ''' ಪ್ರಕಟಿಸಿದೆ '''<ref name="OUDL Source URL">{{cite web | url=http://oudl.osmania.ac.in/handle/OUDL/3291 | title=ರತ್ನನ ಪದಗಳು ಪುಸ್ತಕ | publisher=OUDL}}</ref>'''.  
 
  • For coding
  • '''<<Book Name>>''' ಪುಸ್ತಕವನ್ನು '''<<Author Name>>''' ಅವರು <<Year>>ರಲ್ಲಿ ರಚಿಸಿದರು. ಇದನ್ನು '''<<Publisher>>''' ಪ್ರಕಟಿಸಿದೆ '''<ref name="<<Library>> Source URL">{{cite web | url=<<OUDL URL>> | title=<<Title>> | publisher=<<Library>>}}</ref>'''.  
 
  • {{Infobox Book
  • | name          = ರತ್ನನ ಪದಗಳು
  • | title_orig    = 
  • | translator    = 
  • | image         = 
  • | image_caption = 
  • | author        = ಜಿ. ಪಿ. ರಾಜರತ್ನಂ
  • | illustrator   = 
  • | cover_artist  = 
  • | country       = [[ಭಾರತ]]
  • | language      = [[ಕನ್ನಡ]] 
  • | series        = 
  • | subject       = 
  • | genre         = 
  • | publisher     = ಸತ್ಯಶೋಧನಾ ಪ್ರಕಟಣಾ ಮಂದಿರ
  • | pub_date      = ೧೯೪೫
  • | english_pub_date = 
  • | pages         = 
  • | isbn          = 
  • | oclc          = 
  • | preceded_by   = 
  • | followed_by   = 
  • }}
 
  • ==ಉಲ್ಲೇಖಗಳು==
  • <references />
 
  • [[ವರ್ಗ:<<category>>]] 
  • [[ವರ್ಗ:<<Author>> ಅವರ ಪುಸ್ತಕಗಳು]] 
  • [[ವರ್ಗ:ಪುಸ್ತಕಗಳು]] 
  • [[ವರ್ಗ:ಪುಸ್ತಕ ಸಂಚಯ - ಐಇಜಿ ಯೋಜನೆ]]
  • [[ವರ್ಗ:<<genre>>]]
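The "For coding" intro template above can be filled from book metadata with plain string replacement. This is a rough sketch, not the script used at the meetup: `fill_template` is a hypothetical helper, and the example values are the ones from the "ರತ್ನನ ಪದಗಳು" intro text in these notes.

```python
import re

# Intro template copied from the "For coding" line in these notes.
INTRO_TEMPLATE = (
    "'''<<Book Name>>''' ಪುಸ್ತಕವನ್ನು '''<<Author Name>>''' ಅವರು "
    "<<Year>>ರಲ್ಲಿ ರಚಿಸಿದರು. ಇದನ್ನು '''<<Publisher>>''' ಪ್ರಕಟಿಸಿದೆ "
    "'''<ref name=\"<<Library>> Source URL\">{{cite web | url=<<OUDL URL>> "
    "| title=<<Title>> | publisher=<<Library>>}}</ref>'''."
)


def fill_template(template, values):
    """Replace each <<Key>> placeholder with its value.

    Raises ValueError if any placeholder is left unfilled, so an
    incomplete metadata record does not produce a broken article.
    """
    text = template
    for key, value in values.items():
        text = text.replace("<<" + key + ">>", value)
    leftover = re.findall(r"<<[^<>]+>>", text)
    if leftover:
        raise ValueError("unfilled placeholders: " + ", ".join(leftover))
    return text


EXAMPLE_VALUES = {
    "Book Name": "ರತ್ನನ ಪದಗಳು",
    "Author Name": "ಜಿ. ಪಿ. ರಾಜರತ್ನಂ",
    "Year": "೧೯೪೫",
    "Publisher": "ಸತ್ಯಶೋಧನಾ ಪ್ರಕಟಣಾ ಮಂದಿರ",
    "Library": "OUDL",
    "OUDL URL": "http://oudl.osmania.ac.in/handle/OUDL/3291",
    "Title": "ರತ್ನನ ಪದಗಳು ಪುಸ್ತಕ",
}

if __name__ == "__main__":
    print(fill_template(INTRO_TEMPLATE, EXAMPLE_VALUES))
```

The same approach would extend to the infobox and category lines by adding their placeholders to the values dictionary.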
 
  • Deshakala - Kannada Pustaka Itihasa
 
Group Photo from Meetup
 
Photo by: Tanveer Hassan
 
 