Sunday, 30 October 2011

web 1.0



Evaluating and Managing Information Technologies


When the Internet was first born as a military electronic communication project in the 1960s, no one thought that the Internet, along with the services that run over its infrastructure, would become a part of our lives. The easiest way to think of the Internet is as a network of networks that consists of millions of private, public, academic, business, and government networks (Whittaker, 2002).

Services such as the WWW, which started in the early 1990s, allowed millions of hyperlinked academic documents to be accessed remotely. The WWW has developed since then and become a way of delivering not just academic documents but all kinds of multimedia documents.

The WWW is based on a client-server architecture. In this architecture, a group of servers providing different or similar services listen for requests sent by clients and respond with the requested information or service. For a client to request a resource on the Internet, it needs a method to locate that resource; for this purpose the Uniform Resource Locator (URL) was introduced, with the following construction:

http://asma-library.blogspot.com/2011/10/school-of-informatics-msc-library.html

The above URL requests the HTML page ‘school-of-informatics-msc-library’ located in folder ‘10’ within folder ‘2011’ on the server hosting the blog ‘asma-library.blogspot.com’. For security reasons, these folders form a relative path, not a physical path on the hosting server.
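The parts of the URL described above can be pulled apart programmatically. This is a minimal sketch in Python using the standard `urllib.parse` module; the variable names are my own and are only for illustration.

```python
from urllib.parse import urlparse

url = "http://asma-library.blogspot.com/2011/10/school-of-informatics-msc-library.html"
parts = urlparse(url)

scheme = parts.scheme  # the protocol used to talk to the server
host = parts.netloc    # the server hosting the blog
# The path is a logical (relative) path, not a physical location on disk.
segments = [s for s in parts.path.split("/") if s]

print(scheme)
print(host)
print(segments)
```

The last element of `segments` is the requested page; the elements before it are the folders the page is filed under.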

A database is a means of storing and retrieving structured data that is centrally located and managed in order to provide efficient access to pertinent information (Connolly, Thomas M., 2010).

Databases are represented logically using a relational model consisting of a two-dimensional grid containing the data, with rows representing the individual records and columns representing one piece of each record; for example, a record for an employee will contain ‘Salary’ as a column name. One method of avoiding duplicate records is to use keys: a primary key uniquely identifies each record, and foreign keys link tables together to represent the relationships between them.
Designing a database starts with the ER (entity relationship) model, which can then be translated into database tables. Each database table should represent one and only one entity and have the right primary key chosen for it.

Structured Query Language (SQL) is the language used to work with the DBMS. When it is typed at a command line, each statement has to be finished with a semi-colon and the return key.

There are two main types of search model: the exact match and the best match. Both models are used to get the relevant documents related to a search made by users.
The exact match is based on Boolean logic using the AND, OR, and NOT operators; AND and OR each need two search terms to combine, while NOT excludes a term.
The best match doesn’t require any operators; it allows users to enter a natural language query, and the result of the search is a ranked list of documents.
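To make the difference between the two models concrete, here is a small Python sketch. The three documents are made up, and the best-match score (a simple count of shared words) is only a stand-in for the probabilistic ranking a real retrieval system would use.

```python
# A tiny, made-up document collection.
docs = {
    1: "relational database design with primary keys",
    2: "web search engines rank documents",
    3: "database search and web search compared",
}

def tokens(text):
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().split())

def exact_match(must=(), any_of=(), must_not=()):
    """Boolean retrieval: a document either satisfies the query or it does not."""
    hits = []
    for doc_id, text in docs.items():
        words = tokens(text)
        ok_and = all(t in words for t in must)                  # AND
        ok_or = not any_of or any(t in words for t in any_of)   # OR
        ok_not = not any(t in words for t in must_not)          # NOT
        if ok_and and ok_or and ok_not:
            hits.append(doc_id)
    return hits

def best_match(query):
    """Ranked retrieval: order documents by how many query words they share."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), doc_id) for doc_id, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

print(exact_match(must=["database", "search"]))
print(exact_match(must=["search"], must_not=["database"]))
print(best_match("how do web search engines work"))
```

The exact match returns an unordered set of documents that pass the test, while the best match returns every partly matching document, with the closest match first.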

With the addition of query modification, either manual or automatic, users can achieve better results.

Choosing the right retrieval system is important for getting relevant results, as the type of information requested varies: data, images, audio, etc.

Information retrieval, or search, plays an important role in a wide range of information management and electronic commerce tasks. On the Internet, search portals like Google, Yahoo! and MSN Search are among the most popular destinations (Belkin, Nicholas, 2004). Information retrieval is a systems-oriented view of information seeking. It deals with unstructured information and focuses on the ranked recovery of documents from archives by the probabilistic matching of document contents with natural language requests, using relevance as a test (Macfarlane 2011).

"SQL was one of the first languages for Edgar F. Codd's relational model in his influential 1970 paper, 'A Relational Model of Data for Large Shared Data Banks', and became the most widely used language for relational databases." "Today, the relational model is the dominant data model as well as the foundation for the leading DBMS products, which include IBM’s DB2 family, Informix, Oracle, Sybase, Microsoft’s Access and SQLServer, as well as Paradox. RDBMS represent close to a multibillion-dollar industry alone." (http://www.aspfree.com/c/a/Database/Introduction-to-RDBMS-OODBMS-and-ORDBMs)

We can use navigational queries, which deal with finding home pages, or transactional queries to find a service, for example to buy a bookcase in the UK.

I understand the differences between IR and DBMS. Information retrieval searches for information that is relevant to the user; another user may enter the same search but be looking for a very different result. Databases search for an exact match in an indexed archive, so the answer does not depend on a subjective judgement of relevance.


Managing data with appropriate information technologies in an efficient and professional manner that draws upon a critical knowledge of the nature and constraints of digital information

The Hypertext Mark-up Language (HTML) was the first language used to mark up the WWW; it allowed linking between documents all over the world. Pages created with HTML are viewed in a web browser that resides on the client side. The browser is responsible for sending the requested URL to the server and then interpreting the server’s response to display it in a readable format to the user.

HTML pages can be created using a simple text editor. To do that, a knowledge of HTML coding is required; it is all text-based coding, saved with the extension ‘.html’. HTML documents are built using ‘tags’; these tags typically come in pairs and are enclosed in “less than/greater than” symbols. Tags are used to add metadata, format the layout, add styles and structure the page according to the designer’s needs. Most tags are common to all browsers, but some are browser-specific.

One important component of any HTML page is images. To add an image in the .jpg or .gif format, the <img> tag is used; this tag can be customized to display the picture the way we want by using the width and height attributes or adding a border, and the image can be given a hyperlink associated with it.

Style sheets were introduced because of the need to give all pages the same look and feel without repeating styling tags, and because changing the design would otherwise require changing the tags within the HTML across the whole site. These sheets hold the format and page layout information needed by the browser; each page refers to the style sheet through a link. A style sheet can be linked to multiple pages, but the browser typically loads it only once.

Applying CSS to an HTML page created with Notepad helped me to see the changes and the way the page is presented. Some of the tutorials found on www.w3schools.com helped me to understand the basics of HTML programming.

With the vast increase in data and information, and the need to access and retrieve them quickly and efficiently while applying security at the same time, the database approach was adopted. It avoids the duplication caused by the file approach to saving data, with the DBMS handling the requests between the user and the data itself.

Once the database table is created using the ‘create table’ command, data is filled into the table using the ‘insert into’ command. Querying the database is the main action afterwards, with extensive use of the ‘select’ statement, which is considered the heart of SQL.

Various query statements can be used to select and display records. The records can come from a single table, based on specific search criteria with the help of logical operators, or from a joined select statement that shows the data from two related tables.
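The create, insert and select sequence described above can be tried out without installing a database server. The following is a minimal sketch using Python's built-in `sqlite3` module; the `employee` and `department` tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create two related tables; dept_id in employee is a foreign key.
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        salary  INTEGER,
        dept_id INTEGER REFERENCES department(dept_id)
    )
""")

# Fill the tables with made-up rows.
cur.execute("INSERT INTO department VALUES (1, 'Library')")
cur.execute("INSERT INTO department VALUES (2, 'IT')")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Amina", 30000, 1), (2, "Ben", 42000, 2), (3, "Chloe", 35000, 1)],
)

# Select from a single table with a search criterion.
single = cur.execute(
    "SELECT name, salary FROM employee WHERE salary > 32000 ORDER BY name"
).fetchall()

# Select from two related tables joined on the foreign key.
joined = cur.execute("""
    SELECT employee.name, department.name
    FROM employee JOIN department ON employee.dept_id = department.dept_id
    WHERE department.name = 'Library'
    ORDER BY employee.name
""").fetchall()

print(single)
print(joined)
```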

Information retrieval of non-structured information depends mainly on indexing, which can be achieved in four steps.

Identifying the fields, such as author, title, date, etc., to maximize accuracy.

Splitting up the text at blank spaces and removing stop words.

Stemming by removing suffixes, e.g. water, waters, watered and watering are all reduced to ‘water’ as they all lead to one meaning.

Synonyms: specifying a list of words with similar meanings will increase the relevant results.
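The indexing steps listed above can be sketched in a few lines of Python. The stop word list, suffix list and synonym table here are tiny made-up examples, and the suffix stripping is far cruder than a real stemmer; the sketch only shows how the steps fit together.

```python
STOP_WORDS = {"the", "a", "is", "of", "and", "was", "in"}  # illustrative list
SUFFIXES = ("ing", "ed", "s")                              # crude suffix list
SYNONYMS = {"h2o": "water"}                                # illustrative table

def stem(word):
    """Strip one known suffix, keeping at least three letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def index_terms(text):
    """Split on blank spaces, drop stop words, stem, then map synonyms."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    words = [stem(w) for w in words]
    return [SYNONYMS.get(w, w) for w in words]

def build_index(records):
    """Map each (field, term) pair to the set of documents containing it."""
    index = {}
    for doc_id, record in records.items():
        for field, text in record.items():
            for term in index_terms(text):
                index.setdefault((field, term), set()).add(doc_id)
    return index

records = {
    1: {"author": "Smith", "title": "Watering the garden"},
    2: {"author": "Patel", "title": "Waters of the world"},
}
index = build_index(records)
print(index[("title", "water")])
```

Because the index is keyed by field as well as term, a search for ‘water’ in titles does not pick up an author who happens to be called Water.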


Conclusions

The lectures have given me a clear introduction to the different strands of digital information technologies. Web search allows much more flexibility and access to information using natural language, and the notion of relevance highlights the key difference between information retrieval on the web and a database, where the same query will always provide the same results. I understand that the more knowledge I gain about information sources, the better my future searches will be.

References


1.   http://asma-library.blogspot.com/ [Accessed 30 Oct 2011]

2.   Whittaker, Jason. 2002. The Internet: The Basics.

3.   MacFarlane, A. Butterworth, R. 2010. Lecture 01: Introduction to Computers and Digital Information. INM348 Digital Information Technologies and Architectures [online via VLE] Available through: Moodle [Accessed 26 Sep 2011]

4.   Connolly, Thomas M. 2010. Database systems: a practical approach to design, implementation, and management.

5.   MacFarlane, A. Butterworth, R. Dykes, J. 2010. Lecture 02: The Internet and the World Wide Web. INM348 Digital Information Technologies and Architectures [online via VLE] Available through: Moodle [Accessed 3 Oct 2011]

6.   MacFarlane, A. Butterworth, R. Krause, A. 2010. Lecture 03: Structuring and querying information stored in databases. INM348 Digital Information Technologies and Architectures [online via VLE] Available through: Moodle [Accessed 10 Oct 2011]

7.   MacFarlane, A. 2010. Lecture 04: Information retrieval. INM348 Digital Information Technologies and Architectures [online via VLE] Available through: Moodle [Accessed 17 Oct 2011]

8.   Belkin, Nicholas. 2004. Evaluating Interactive Information Retrieval Systems: Opportunities and Challenges. [online] Available at http://research.microsoft.com/en-us/um/people/sdumais/chi04-sig-searcheval-final.pdf

9.   Refsnes, J. E. 2006. A Beginner's Guide to HTML. [online] Available at http://www.w3schools.com [Accessed 21 Oct 2011]




Saturday, 1 October 2011

School of Informatics MSc Library Sciences

Digital Information Technologies & Architectures









Session 01:

An Introduction to computing

Levels of Data Representation
◦ Bits

◦ Bytes

◦ Formats

◦ Files

◦ Documents

Coursework


What I learned in the first lab:

I have had experience of editing and creating HTML using programs such as Microsoft Word or Notepad.
The point of this lab was to demonstrate the relationship between binary and ASCII, as well as how different forms of media (in this instance, an image) are displayed within different document formats.
I also learned the importance of how an image is used: by choosing to "link" the image, it is inserted into the Word file as a link, and the file is then saved as an HTML file.

In the lab class we did practical exercises to show the relevance of ASCII, file extensions and the differences between saving files as HTML, rich text, etc.
We learned how to embed a link to an image rather than the image itself.
This lecture gave me an overview of the focus of the course: information rather than IT.
I can now distinguish between the different conceptual levels at which data is represented, managed and stored.
I understand the terminology of bits, bytes, files and documents.
  • A file extension is a few characters on the end of a file's name, after a full stop.
· .txt -- a text file, consisting of ASCII characters
· .doc -- a Microsoft Word document
· .gif, .jpg, .png -- three different image formats commonly used in web pages,
· .html -- a web page
· .exe -- a program on Windows computers
· .mp3 -- a sound file
· .mpeg -- a video file
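The relationship between bits, bytes and ASCII characters mentioned in the lab notes above can be seen directly in Python. This is a small sketch of my own, not part of the lab material; the text "Hi" is just an example.

```python
text = "Hi"

data = text.encode("ascii")               # characters -> bytes
codes = list(data)                        # each byte as a number
bits = [format(b, "08b") for b in data]   # each byte as eight bits

# Going back the other way: bits -> bytes -> characters.
decoded = bytes(int(b, 2) for b in bits).decode("ascii")

print(codes)
print(bits)
print(decoded)
```

Each character in a .txt file is stored as one byte like this, which is why a plain text file is so much smaller than the same words saved as a .doc file.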



Session 02:

How to create HTML code for your Website

Basic web page. Every web page must contain the following code to begin and end the page.

<html> - this specifies where the html code begins and ends
<head> - this contains information which is not displayed, and is useful for metadata
<title> - a fairly self explanatory one, this changes what you see at the top of your browser window
<body> - this begins and ends all the stuff in the web page

<HTML>
<HEAD>
<TITLE>Your Title Goes Here</TITLE>
</HEAD>
<BODY>
All text, image files, sound files for your page are placed between the start <BODY> and end </BODY> tags.
</BODY>
</HTML>
Each one of those is called a tag.
HTML Tags. In HTML, a tag tells the browser what to do. When you create an HTML page, you use tags for many reasons: to change the appearance of text, to display a graphic, or to create a link to another page. The tags you write are not visible in the browser, but their effects are noticeable.

  • An opening tag is written between the symbols < and >, and a closing tag between </ and >. Tags usually come in pairs, one to start an action and one that ends it.
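One way to check that the tags in a page really do come in pairs is to let a program count them. The sketch below uses Python's standard `html.parser` module; the `PairChecker` class and `unpaired_tags` function are names I made up, and the list of tags that have no closing partner is only partial.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (a partial list, enough for this sketch).
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

class PairChecker(HTMLParser):
    """Keeps a stack of open tags and records closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(tag)

def unpaired_tags(html):
    """Return mismatched closing tags followed by tags that were never closed."""
    checker = PairChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + checker.open_tags

page = """<HTML>
<HEAD>
<TITLE>Your Title Goes Here</TITLE>
</HEAD>
<BODY>
Some text.
</BODY>
</HTML>"""

print(unpaired_tags(page))
print(unpaired_tags("<b>bold text"))
```

The parser reports tag names in lower case, so it does not matter whether the page was written with <BODY> or <body>.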






Session 03:
Structuring and querying information stored in databases



·         Database Management System (DBMS).
·         A database can therefore be defined as an integrated collection of data shareable between users and application systems.
  • The rows in the table represent individual records containing all of the information for one thing.
·         The columns (sometimes called fields or attributes) represent one individual piece of information about a thing.
  • One of these attributes is usually chosen to uniquely identify that row in the table.
·         Database design usually starts with an 'entity relationship' model.
·         In a good information system, data should be stored in databases, then exported into spreadsheets to be turned into bar charts and the like.
     SQL stands for 'Structured Query Language'.
  • create table tablename ( column1, column2, ... ) - a comma separated list of the columns in the table.
  • Each column is defined by at least its name and the type of information to be stored in it.
·         In an insert, the list of values to be inserted into the table must be in the same order that the columns were created in the create table command, and text must appear between quotes.
  • SELECT field(s)
  • FROM table(s)
  • Multiple fields should be separated by commas in the SELECT part of the statement.
·         SELECT *
·         FROM table - selects all fields.
·         To filter, use the WHERE clause.
·         Use comparison operators:
  =  equal to
  <  less than
  >  greater than
  <= less than or equal to
  >= greater than or equal to
  <> not equal to
  • We can perform searches on multiple criteria using the following logical operators.
AND
OR
NOT
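The comparison and logical operators listed above can be tried in a WHERE clause with a short Python sketch using the built-in `sqlite3` module. The `book` table and its rows are invented for illustration and are not the biblio database from the lab; the `titles` helper is also my own, and it builds the SQL by joining strings only because the conditions are fixed examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Cataloguing Basics", "Author A", 1998),
        ("Digital Libraries", "Author B", 2005),
        ("Search Engines", "Author C", 2010),
    ],
)

def titles(where):
    """Run SELECT title FROM book with the given WHERE condition."""
    sql = "SELECT title FROM book WHERE " + where + " ORDER BY title"
    return [row[0] for row in conn.execute(sql)]

print(titles("year >= 2005"))                       # comparison
print(titles("year > 2000 AND year < 2010"))        # both must be true
print(titles("year < 2000 OR author = 'Author C'")) # either may be true
print(titles("NOT year <> 2005"))                   # reverses the test
```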
In the lab, I queried the biblio database:

I found this difficult because I have never worked with databases before; I need more practice to learn SQL.
I am aware of the key elements of the Relational Model.
I have a general overview of database design.
I know how to query databases.
SQL and HTML are both new to me; HTML seems to be easier to understand.
 
Session 04: Information Retrieval



20 queries were used to evaluate the relevance of the results returned, first using natural language searches and then combining natural language with Boolean logic operators and quotes (AND, OR, ""), first on www.google.co.uk and then on www.bing.com.



The 3 main types of information need:

Navigational queries: finding a home page.
Transactional queries: searching for a service in order to make a transaction.
Informational queries: to satisfy a need for information.



1. Find the website for Oxford University.

The words I used were simply "oxford university", and the website was the first search result that popped up (out of 696,000 results in 0.25 seconds!).

When I used Bing with the same search term, "oxford university", it also popped up (out of 142,000 results in 0.18 seconds!). This was a navigational query.



2. Where can I buy bookcases in the UK?

This was a transactional query. I went to Google and first typed the words "bookcases sale UK", and came up with "about 1,230,000 results in 0.22 seconds".

When I typed the same words into Bing, the first result was actually the advert result for Google, and it came up with 69,000 results.


3. Find sites where you can buy car insurance.

This is a transactional query. I was using google.co.uk, so Google knew that I was in the UK and only returned car insurance companies in the UK. My results were 6,690 in 0.23 sec.

When I entered the same search into Bing, I found 69,300,000 results in 20 sec.



4. Who is the president of Uruguay? Can you find a biography of them?

This is an informational query.

Google gave me Wikipedia as its first three results; there were 59,000 results in 0.22 sec.

I went off to Bing to type in the same search, "urugay president". The first result listed was also Wikipedia.

When I searched for the biography:

I went to Google and used "Uruguay president" AND "biography". I found 1,700 results in Google and 177 results in Bing on the first page.



5. Who was "Captain Swing" and what role did he play in the early 19th century in English history?

This is an informational query. When I Googled "captain swing English history", I found the answer in the second result, so I modified the query into Boolean form: "captain swing" AND "English history".

I typed the same query into Bing and found the Wikipedia article as my second result; Bing produced the same top result as Google, Wikipedia.
 

 

Overall, the average precision for all search types was:



Using natural language: Google: 88%, Bing: 88%

Boolean operators: Google: 70%, Bing: 71%



Google (no. of queries)
Navigational 90%
Transactional 85%
Informational 76.00% 78.00%

Bing (no. of queries)
Navigational 90%
Transactional 85%
Informational 88%
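Figures like the ones above come from judging each returned result as relevant or not and then averaging. The Python sketch below shows the arithmetic, assuming that "average precision" here means the plain average of each query's precision. The relevance judgements in `runs` are invented to show the calculation and are not the results from my lab queries.

```python
def precision(judgements):
    """Share of the retrieved results that were judged relevant (True)."""
    return sum(judgements) / len(judgements)

def mean_precision_by_type(runs):
    """Average the per-query precision separately for each query type."""
    per_type = {}
    for query_type, judgements in runs:
        per_type.setdefault(query_type, []).append(precision(judgements))
    return {t: sum(values) / len(values) for t, values in per_type.items()}

# Invented judgements: one list per query, one True/False per result looked at.
runs = [
    ("navigational", [True, True, True, True]),
    ("navigational", [True, False, True, True]),
    ("informational", [True, False, False, True]),
]

averages = mean_precision_by_type(runs)
print(averages)
```

Multiplying each value by 100 gives the percentages used in the tables.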



Conclusions:

In general I found Google to have better precision than Bing when using both natural language and Boolean queries.

The queries covered varying needs such as transactional and navigational.

I have an understanding of the IR points of view: users, sources and systems.