
Thread: Concatenate content from 600 sub-websites into one document?

  1. #1 Concatenate content from 600 sub-websites into one document? 
    New Member
    Join Date
    Jan 2013
    Posts
    2
    Hi all,

    I've been asked to manually copy and paste 600 pages of text from individual pages of a website into a Word document, and I feel like there's got to be a less tedious way to do it. The text is individual learning modules for an online curriculum at the university where I teach: 15 modules, with 30 to 40 or so sub-pages each.

    I don't know much about programming or computer science. I have studied a little Unix for software update deployment, and that's it. Does anyone have any ideas?

    Let me apologize for not knowing much, being new, and possibly doing the forum thing wrong. My intention is just to find an elegant solution I can then pass on to other teachers tasked with doing this for their own online curricula.

    Best,
    Anastasia


  3. #2  
    Brassica oleracea Strange's Avatar
    Join Date
    Oct 2011
    Location
    喫茶店
    Posts
    16,670
    manually copy and paste 600 pages
    That sounds like an incredibly inefficient way of doing it. They must have access to the source files, which could be imported into Word. Even that could be painful for 600 files.

    They need someone to write a few lines of code to do this. It shouldn't take more than an hour or two. Manually, you might be at it for weeks.


    Without wishing to overstate my case, everything in the observable universe definitely has its origins in Northamptonshire -- Alan Moore

  4. #3  
    New Member
    Join Date
    Jan 2013
    Posts
    2
    Thanks for the sympathy. I think the person who will need to write this code is me, however. No idea even which language to use or where to start. Any ideas?

    I tried using a program called SiteSucker, but the pages require a login, and the version I'm using doesn't support that.

  5. #4  
    Brassica oleracea Strange's Avatar
    Join Date
    Oct 2011
    Location
    喫茶店
    Posts
    16,670
    Well, it wouldn't be too difficult, but I'm not sure it is an easy task for a complete novice. Do you have any programming experience at all?

    I would look at using VBA (Visual Basic for Applications) within Word, or alternatively Linux shell scripting and the wget command. I would love to help but I don't really have the time...
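
    As a rough illustration of the scripted-download route, here is a minimal Python sketch (Python swapped in for wget or VBA, not something the site is known to support). It assumes the site uses a simple cookie-based login form and that page addresses follow a predictable pattern. The URL layout, the form field names, and all three function names are hypothetical and would need to be replaced with whatever the real site uses.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_page_urls(base_url, module_count, pages_per_module):
    """Build the list of page addresses to download.

    The module/page naming pattern here is an assumption made for
    illustration; the real site will almost certainly differ.
    """
    return [
        f"{base_url}/module{m}/page{p}.html"
        for m in range(1, module_count + 1)
        for p in range(1, pages_per_module + 1)
    ]


def make_logged_in_opener(login_url, form_fields):
    """Log in once and return an opener that remembers the session cookie.

    Assumes a plain HTML login form submitted via POST; form_fields is a
    dict such as {"username": "...", "password": "..."} (hypothetical names).
    """
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    data = urllib.parse.urlencode(form_fields).encode("utf-8")
    opener.open(login_url, data=data, timeout=30).close()
    return opener


def fetch_pages(opener, urls):
    """Download each URL with the logged-in opener; returns {url: html}."""
    pages = {}
    for url in urls:
        with opener.open(url, timeout=30) as response:
            pages[url] = response.read().decode("utf-8", errors="replace")
    return pages
```

    With 15 modules of 40 pages each, build_page_urls yields the 600 addresses in one go. The other two functions only work once the login address and form fields have been checked against the actual site.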

  6. #5  
    mvb
    mvb is offline
    Thinker Emeritus
    Join Date
    Dec 2012
    Location
    Delaware, USA
    Posts
    195
    Is that really 600 different URLs? If it is 600 pages distributed among a smaller number of independently accessed documents, the job could be a lot simpler, even if still rather burdensome.

  7. #6  
    Iuvenis ducis Darkhorse's Avatar
    Join Date
    Mar 2011
    Posts
    105
    For a task like this, a programming language called Perl is your best friend. It was developed specifically to parse and reformat text. Post a question with specific details on PerlMonks (The Monastery Gates) and someone may have a script that you can use right away, or will help you create something that will work for you. Be sure to let them know whether you have access to the original files or need to go through the web. If you don't have access to the originals, be prepared to explain exactly why you need the material, since they take a dim view of people using other people's work.
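
    To give a feel for what "parse and reformat text" means here, below is a minimal sketch in Python rather than Perl (a swapped-in language, using only its standard library). It assumes the pages are already available as HTML strings, for example read from saved files, and it simply keeps the visible text and joins the pages under plain headings. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of one HTML page, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def concatenate_pages(pages):
    """Join (title, html) pairs into a single plain-text document."""
    sections = []
    for title, html in pages:
        sections.append(f"=== {title} ===\n{html_to_text(html)}")
    return "\n\n".join(sections)
```

    The resulting text can be written to a .txt file and opened in Word. Menus, footers, and other page furniture will come along too unless the extractor is narrowed to the part of the page that holds the module text.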
    Ignorance more frequently begets confidence, than it does knowledge. [Charles Darwin]
    Physical laws are kinda like Pringles. It is hard to break just one law. [Dr. Rocket]
